* [PATCH v16 00/51] KVM: x86: Super Mega CET
@ 2025-09-19 22:32 Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 01/51] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() Sean Christopherson
` (52 more replies)
0 siblings, 53 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
As the subject suggests, this series continues to grow, as there are an
absolutely stupid number of edge cases and interactions.
There are (a lot) more changes between v15 and v16 than I was hoping for, but
they're all fairly "minor" in the sense that it's things like disabling SHSTK
when using the shadow MMU. I.e. it's mostly "configuration" fixes, and very
few logical changes (outside of msrs_test.c, which has non-trivial changes due
to ignore_msrs, argh).
So, my plan is to still land this in 6.18. I'm going to push it to -next
today to get as much testing as possible, but in a dedicated branch so that I
can fixup as needed (e.g. even if it's just for reviews). I'll freeze the
hashes sometime next week.
I probably missed some of the changes in the log below, sorry.
P.S. I have a pile of local changes to the CET KVM-Unit-Test, I'll post them
sometime next week.
v16:
- Collect more reviews.
- Reject task switch emulation if IBT or SHSTK. [Binbin]
- Improve various comments and fix typos. [Binbin]
- Accept writes to XSS for SEV-ES guests even though that state is
"protected", as KVM needs to update its internal tracking in response to
guest changes. [John]
- Drop @ghcb from KVM's accessors so that it's harder to screw up. [Tom]
- s/KVM_X86_REG_ENCODE/KVM_X86_REG_ID. [Binbin]
- Append "cc" to cpu_has_vmx_basic_no_hw_errcode(). [Binbin]
- Use "KVM: SVM" for shortlogs. [Xin]
- Disable SHSTK if TDP is disabled (affects AMD only because Intel was already
disabling support indirectly thanks to Unrestricted Guest).
- Disable IBT and SHSTK if allow_smaller_maxphyaddr is true (Intel only
because it doesn't work on AMD with NPT).
- Rework IBT instruction detection to rely on IsBranch and the operand source
instead of having to manually inspect+tag each instruction.
- Handle the annoying #GP case for SSP[63:32] != 0 when transitioning to
compatibility mode so that FAR JMP doesn't need to be disallowed when
SHSTK is enabled (I don't think anyone would care, but special casing FAR JMP for
a very solvable problem felt lazy).
- Add a define for PFERR_SS_MASK and pretty print the missing PFERR (don't ask
me how long it took to figure out why the SHSTK KUT testcase failed when
I tried to run it with npt=0).
- Advertise LOAD_CET_STATE fields for nVMX iff one of IBT or SHSTK is
supported (being able to load non-existent CET state is all kinds of weird).
- Explicitly check for SHSTK when translating MSR_KVM_INTERNAL_GUEST_SSP to
avoid running afoul of ignore_msrs.
- Add TSC_AUX to the
- Skip negative tests in msrs_test when ignore_msrs is true (KVM's ABI, or
rather lack thereof, is truly awful).
- Remove an unnecessary round of vcpu_run() calls. [Chao]
- Use ex_str() in a few more tests. [Chao]
- Add XFEATURE_MASK_CET_ALL to simplify referencing KERNEL and USER, which
KVM always does as a pair. [Binbin]
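For reference, the SSP[63:32] rule mentioned above can be sketched as follows.
This is an illustrative stand-in, not KVM's actual emulator code; the function
name and signature are made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * When a far transfer leaves 64-bit mode with shadow stacks enabled, the
 * CPU raises #GP(0) if SSP[63:32] != 0, as SSP is truncated to 32 bits in
 * compatibility mode.  An emulator must reproduce that fault instead of
 * simply disallowing FAR JMP when SHSTK is enabled.
 */
static bool ssp_compat_transition_faults(uint64_t ssp, bool target_is_64bit)
{
	return !target_is_64bit && (ssp >> 32) != 0;
}
```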
v15:
- https://lore.kernel.org/all/20250912232319.429659-1-seanjc@google.com
- Collect reviews (hopefully I got 'em all).
- Add support for KVM_GET_REG_LIST.
- Load FPU when accessing XSTATE MSRs via ONE_REG ioctls.
- Explicitly return -EINVAL on kvm_set_one_msr() failure.
- Make is_xstate_managed_msr() more precise (check guest caps).
- Dedup guts of kvm_{g,s}et_xstate_msr() (as kvm_access_xstate_msr()).
- WARN if KVM uses kvm_access_xstate_msr() to access an MSR that isn't
managed via XSAVE.
- Document why S_CET isn't treated as an XSTATE-managed MSR.
- Mark VMCB_CET as clean/dirty as appropriate.
- Add nSVM support for the CET VMCB fields.
- Add an "msrs" selftest to cover ONE_REG and host vs. guest accesses in
general.
- Add patches to READ_ONCE() guest-writable GHCB fields, and to check the
validity of XCR0 "writes".
- Check the validity of XSS "writes" via common MSR emulation.
- Add {CP,HV,VC,SV}_VECTOR definitions so that tracing and selftests can
pretty print them.
- Add pretty printing for unexpected exceptions in selftests.
- Tweak the emulator rejection to be more precise (grab S_CET vs. U_CET based
CPL for near transfers), and to avoid unnecessary reads of CR4, S_CET, and
U_CET.
Intel (v14): https://lkml.kernel.org/r/20250909093953.202028-1-chao.gao%40intel.com
AMD (v4): https://lore.kernel.org/all/20250908201750.98824-1-john.allen@amd.com
grsec (v3): https://lkml.kernel.org/r/20250813205957.14135-1-minipli%40grsecurity.net
Chao Gao (4):
KVM: x86: Check XSS validity against guest CPUIDs
KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
KVM: nVMX: Add consistency checks for CET states
KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
John Allen (4):
KVM: SVM: Emulate reads and writes to shadow stack MSRs
KVM: SVM: Update dump_vmcb with shadow stack save area additions
KVM: SVM: Pass through shadow stack MSRs as appropriate
KVM: SVM: Enable shadow stack virtualization for SVM
Mathias Krause (1):
KVM: VMX: Make CR4.CET a guest owned bit
Sean Christopherson (26):
KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to
kvm_get_cached_sw_exit_code()
KVM: SEV: Read save fields from GHCB exactly once
KVM: SEV: Validate XCR0 provided by guest in GHCB
KVM: x86: Report XSS as to-be-saved if there are supported features
KVM: x86: Load guest FPU state when accessing XSAVE-managed MSRs
KVM: x86: Don't emulate instructions affected by CET features
KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode
KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack
#PF
KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints
KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1
KVM: x86: Disable support for Shadow Stacks if TDP is disabled
KVM: x86: Disable support for IBT and SHSTK if
allow_smaller_maxphyaddr is true
KVM: VMX: Configure nested capabilities after CPU capabilities
KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
KVM: x86: Add human friendly formatting for #XM, and #VE
KVM: x86: Define Control Protection Exception (#CP) vector
KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
KVM: selftests: Add ex_str() to print human friendly name of exception
vectors
KVM: selftests: Add an MSR test to exercise guest/host and read/write
KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
KVM: selftests: Extend MSRs test to validate vCPUs without supported
features
KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
KVM: selftests: Add coverage for KVM-defined registers in MSRs test
KVM: selftests: Verify MSRs are (not) in save/restore list when
(un)supported
Yang Weijiang (16):
KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
KVM: x86: Initialize kvm_caps.supported_xss
KVM: x86: Add fault checks for guest CR4.CET setting
KVM: x86: Report KVM supported CET MSRs as to-be-saved
KVM: VMX: Introduce CET VMCS fields and control bits
KVM: x86: Enable guest SSP read/write interface with new uAPIs
KVM: VMX: Emulate read and write to CET MSRs
KVM: x86: Save and reload SSP to/from SMRAM
KVM: VMX: Set up interception for CET MSRs
KVM: VMX: Set host constant supervisor states to VMCS fields
KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported
KVM: x86: Add XSS support for CET_KERNEL and CET_USER
KVM: x86: Enable CET virtualization for VMX and advertise to userspace
KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
KVM: nVMX: Prepare for enabling CET support for nested guest
Documentation/virt/kvm/api.rst | 14 +-
arch/x86/include/asm/kvm_host.h | 7 +-
arch/x86/include/asm/vmx.h | 9 +
arch/x86/include/uapi/asm/kvm.h | 34 ++
arch/x86/kvm/cpuid.c | 35 +-
arch/x86/kvm/emulate.c | 149 +++++-
arch/x86/kvm/kvm_cache_regs.h | 3 +-
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/mmu/mmutrace.h | 3 +
arch/x86/kvm/smm.c | 8 +
arch/x86/kvm/smm.h | 2 +-
arch/x86/kvm/svm/nested.c | 20 +
arch/x86/kvm/svm/sev.c | 37 +-
arch/x86/kvm/svm/svm.c | 50 +-
arch/x86/kvm/svm/svm.h | 29 +-
arch/x86/kvm/trace.h | 5 +-
arch/x86/kvm/vmx/capabilities.h | 9 +
arch/x86/kvm/vmx/nested.c | 185 ++++++-
arch/x86/kvm/vmx/nested.h | 5 +
arch/x86/kvm/vmx/vmcs12.c | 6 +
arch/x86/kvm/vmx/vmcs12.h | 14 +-
arch/x86/kvm/vmx/vmx.c | 93 +++-
arch/x86/kvm/vmx/vmx.h | 9 +-
arch/x86/kvm/x86.c | 413 ++++++++++++++-
arch/x86/kvm/x86.h | 37 ++
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 7 +
.../testing/selftests/kvm/lib/x86/processor.c | 33 ++
.../selftests/kvm/x86/hyperv_features.c | 16 +-
.../selftests/kvm/x86/monitor_mwait_test.c | 8 +-
tools/testing/selftests/kvm/x86/msrs_test.c | 485 ++++++++++++++++++
.../selftests/kvm/x86/pmu_counters_test.c | 4 +-
.../selftests/kvm/x86/vmx_pmu_caps_test.c | 4 +-
.../selftests/kvm/x86/xcr0_cpuid_test.c | 12 +-
34 files changed, 1624 insertions(+), 124 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/msrs_test.c
base-commit: fa8ba002a503ab724311c4cf9db58d50a33c4b5c
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 01/51] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code()
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 02/51] KVM: SEV: Read save fields from GHCB exactly once Sean Christopherson
` (51 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() to make
it clear that KVM is getting the cached value, not reading directly from
the guest-controlled GHCB. More importantly, vacating
kvm_ghcb_get_sw_exit_code() will allow adding a KVM-specific macro-built
kvm_ghcb_get_##field() helper to read values from the GHCB.
No functional change intended.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/sev.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index cce48fff2e6c..f046a587ecaf 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3264,7 +3264,7 @@ void sev_free_vcpu(struct kvm_vcpu *vcpu)
kvfree(svm->sev_es.ghcb_sa);
}
-static u64 kvm_ghcb_get_sw_exit_code(struct vmcb_control_area *control)
+static u64 kvm_get_cached_sw_exit_code(struct vmcb_control_area *control)
{
return (((u64)control->exit_code_hi) << 32) | control->exit_code;
}
@@ -3290,7 +3290,7 @@ static void dump_ghcb(struct vcpu_svm *svm)
*/
pr_err("GHCB (GPA=%016llx) snapshot:\n", svm->vmcb->control.ghcb_gpa);
pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_code",
- kvm_ghcb_get_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
+ kvm_get_cached_sw_exit_code(control), kvm_ghcb_sw_exit_code_is_valid(svm));
pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_1",
control->exit_info_1, kvm_ghcb_sw_exit_info_1_is_valid(svm));
pr_err("%-20s%016llx is_valid: %u\n", "sw_exit_info_2",
@@ -3379,7 +3379,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
* Retrieve the exit code now even though it may not be marked valid
* as it could help with debugging.
*/
- exit_code = kvm_ghcb_get_sw_exit_code(control);
+ exit_code = kvm_get_cached_sw_exit_code(control);
/* Only GHCB Usage code 0 is supported */
if (svm->sev_es.ghcb->ghcb_usage) {
@@ -4384,7 +4384,7 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
svm_vmgexit_success(svm, 0);
- exit_code = kvm_ghcb_get_sw_exit_code(control);
+ exit_code = kvm_get_cached_sw_exit_code(control);
switch (exit_code) {
case SVM_VMGEXIT_MMIO_READ:
ret = setup_vmgexit_scratch(svm, true, control->exit_info_2);
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 02/51] KVM: SEV: Read save fields from GHCB exactly once
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 01/51] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 21:39 ` Tom Lendacky
2025-09-19 22:32 ` [PATCH v16 03/51] KVM: SEV: Validate XCR0 provided by guest in GHCB Sean Christopherson
` (50 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific
GHCB get() utility to help guard against TOCTOU bugs. Using READ_ONCE()
doesn't completely prevent such bugs, e.g. doesn't prevent KVM from
redoing get() after checking the initial value, but at least addresses
all potential TOCTOU issues in the current KVM code base.
To prevent unintentional use of the generic helpers, take only @svm for
the kvm_ghcb_get_xxx() helpers and retrieve the ghcb instead of explicitly
passing it in.
Opportunistically reduce the indentation of the macro-defined helpers and
clean up the alignment.
Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/sev.c | 22 +++++++++++-----------
arch/x86/kvm/svm/svm.h | 25 +++++++++++++++----------
2 files changed, 26 insertions(+), 21 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f046a587ecaf..8d057dbd8a71 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3343,26 +3343,26 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
BUILD_BUG_ON(sizeof(svm->sev_es.valid_bitmap) != sizeof(ghcb->save.valid_bitmap));
memcpy(&svm->sev_es.valid_bitmap, &ghcb->save.valid_bitmap, sizeof(ghcb->save.valid_bitmap));
- vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm, ghcb);
- vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm, ghcb);
- vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm, ghcb);
- vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm, ghcb);
- vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm, ghcb);
+ vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm);
+ vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm);
+ vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm);
+ vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm);
+ vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm);
- svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
+ svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm);
if (kvm_ghcb_xcr0_is_valid(svm)) {
- vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
+ vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(svm);
vcpu->arch.cpuid_dynamic_bits_dirty = true;
}
/* Copy the GHCB exit information into the VMCB fields */
- exit_code = ghcb_get_sw_exit_code(ghcb);
+ exit_code = kvm_ghcb_get_sw_exit_code(svm);
control->exit_code = lower_32_bits(exit_code);
control->exit_code_hi = upper_32_bits(exit_code);
- control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
- control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
- svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb);
+ control->exit_info_1 = kvm_ghcb_get_sw_exit_info_1(svm);
+ control->exit_info_2 = kvm_ghcb_get_sw_exit_info_2(svm);
+ svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm);
/* Clear the valid entries fields */
memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d39c0b17988..5365984e82e5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -913,16 +913,21 @@ void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted,
void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
#define DEFINE_KVM_GHCB_ACCESSORS(field) \
- static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
- { \
- return test_bit(GHCB_BITMAP_IDX(field), \
- (unsigned long *)&svm->sev_es.valid_bitmap); \
- } \
- \
- static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm, struct ghcb *ghcb) \
- { \
- return kvm_ghcb_##field##_is_valid(svm) ? ghcb->save.field : 0; \
- } \
+static __always_inline u64 kvm_ghcb_get_##field(struct vcpu_svm *svm) \
+{ \
+ return READ_ONCE(svm->sev_es.ghcb->save.field); \
+} \
+ \
+static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
+{ \
+ return test_bit(GHCB_BITMAP_IDX(field), \
+ (unsigned long *)&svm->sev_es.valid_bitmap); \
+} \
+ \
+static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm) \
+{ \
+ return kvm_ghcb_##field##_is_valid(svm) ? kvm_ghcb_get_##field(svm) : 0; \
+}
DEFINE_KVM_GHCB_ACCESSORS(cpl)
DEFINE_KVM_GHCB_ACCESSORS(rax)
--
2.51.0.470.ga7dc726c21-goog
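The double-fetch hazard this patch guards against can be illustrated in
userspace. This is a sketch, not kernel code: READ_ONCE() here is a plain
volatile-load approximation of the kernel macro, and the struct is a stand-in
for the guest-writable GHCB save area:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace approximation of the kernel's READ_ONCE(): one volatile load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Stand-in for the guest-writable GHCB save area. */
struct ghcb_save {
	uint64_t sw_exit_code;
};

/*
 * Snapshot the guest-writable field exactly once, then validate and use
 * only the local copy.  Loading the field twice (once to check, once to
 * use) would let a malicious guest change it between the two loads.
 */
static uint64_t get_sw_exit_code(struct ghcb_save *save)
{
	return READ_ONCE(save->sw_exit_code);
}

static struct ghcb_save example = { .sw_exit_code = 0x7b };
```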
* [PATCH v16 03/51] KVM: SEV: Validate XCR0 provided by guest in GHCB
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 01/51] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 02/51] KVM: SEV: Read save fields from GHCB exactly once Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 04/51] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Sean Christopherson
` (49 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Use __kvm_set_xcr() to propagate XCR0 changes from the GHCB to KVM's
software model in order to validate the new XCR0 against KVM's view of
the supported XCR0. Allowing garbage is thankfully mostly benign, as
kvm_load_{guest,host}_xsave_state() bail early for vCPUs with protected
state, xstate_required_size() will simply provide garbage back to the
guest, and attempting to save/restore the bad value via KVM_{G,S}ET_XCRS
will only harm the guest (setting XCR0 will fail).
However, allowing the guest to put junk into a field that KVM assumes is
valid is a CVE waiting to happen. And as a bonus, using the proper API
eliminates the ugly open coding of setting arch.cpuid_dynamic_bits_dirty.
Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage. In either case, KVM can't fix the broken guest.
Note, using __kvm_set_xcr() also avoids recomputing dynamic CPUID bits
if XCR0 isn't actually changing (relative to KVM's previous snapshot).
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/sev.c | 6 ++----
arch/x86/kvm/x86.c | 3 ++-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 17772513b9cc..8695967b7a31 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2209,6 +2209,7 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);
unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu);
int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8d057dbd8a71..85e84bb1a368 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3351,10 +3351,8 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm);
- if (kvm_ghcb_xcr0_is_valid(svm)) {
- vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(svm);
- vcpu->arch.cpuid_dynamic_bits_dirty = true;
- }
+ if (kvm_ghcb_xcr0_is_valid(svm))
+ __kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
/* Copy the GHCB exit information into the VMCB fields */
exit_code = kvm_ghcb_get_sw_exit_code(svm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e07936efacd4..55044d6680c8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1235,7 +1235,7 @@ static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
}
#endif
-static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
+int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
{
u64 xcr0 = xcr;
u64 old_xcr0 = vcpu->arch.xcr0;
@@ -1279,6 +1279,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
vcpu->arch.cpuid_dynamic_bits_dirty = true;
return 0;
}
+EXPORT_SYMBOL_GPL(__kvm_set_xcr);
int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
{
--
2.51.0.470.ga7dc726c21-goog
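The kind of validation __kvm_set_xcr() performs can be sketched with a few of
the architectural XCR0 rules. This is a simplified illustration, not KVM's
actual code, which enforces additional constraints (e.g. BNDREGS/BNDCSR
pairing) and checks against KVM's supported mask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define XFEATURE_MASK_FP  (1ULL << 0)
#define XFEATURE_MASK_SSE (1ULL << 1)
#define XFEATURE_MASK_YMM (1ULL << 2)

static bool xcr0_is_valid(uint64_t xcr0, uint64_t supported)
{
	if (!(xcr0 & XFEATURE_MASK_FP))	/* x87 state must always be enabled */
		return false;
	if (xcr0 & ~supported)		/* no bits beyond the supported set */
		return false;
	if ((xcr0 & XFEATURE_MASK_YMM) && !(xcr0 & XFEATURE_MASK_SSE))
		return false;		/* AVX state requires SSE state */
	return true;
}
```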
* [PATCH v16 04/51] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (2 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 03/51] KVM: SEV: Validate XCR0 provided by guest in GHCB Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 05/51] KVM: x86: Report XSS as to-be-saved if there are supported features Sean Christopherson
` (48 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Enable KVM_{G,S}ET_ONE_REG uAPIs so that userspace can access MSRs and
other non-MSR registers through them, along with support for
KVM_GET_REG_LIST to enumerate support for KVM-defined registers.
This is in preparation for allowing userspace to read/write the guest SSP
register, which is needed for the upcoming CET virtualization support.
Currently, two types of registers are supported: KVM_X86_REG_TYPE_MSR and
KVM_X86_REG_TYPE_KVM. All MSRs are in the former type; the latter type is
added for registers that lack existing KVM uAPIs to access them. The "KVM"
in the name is intended to be vague to give KVM flexibility to include
other potential registers. More precise names like "SYNTHETIC" and
"SYNTHETIC_MSR" were considered, but were deemed too confusing (e.g. can
be conflated with synthetic guest-visible MSRs) and may put KVM into a
corner (e.g. if KVM wants to change how a KVM-defined register is modeled
internally).
Enumerate only KVM-defined registers in KVM_GET_REG_LIST to avoid
duplicating KVM_GET_MSR_INDEX_LIST, and so that KVM can return _only_
registers that are fully supported (KVM_GET_REG_LIST is vCPU-scoped, i.e.
can be precise, whereas KVM_GET_MSR_INDEX_LIST is system-scoped).
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Link: https://lore.kernel.org/all/20240219074733.122080-18-weijiang.yang@intel.com [1]
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/api.rst | 6 +-
arch/x86/include/uapi/asm/kvm.h | 26 +++++++++
arch/x86/kvm/x86.c | 100 ++++++++++++++++++++++++++++++++
3 files changed, 131 insertions(+), 1 deletion(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index ffc350b649ad..abd02675a24d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2908,6 +2908,8 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
0x9030 0000 0002 <reg:16>
+x86 MSR registers have the following id bit patterns::
+ 0x2030 0002 <msr number:32>
4.69 KVM_GET_ONE_REG
--------------------
@@ -3588,7 +3590,7 @@ VCPU matching underlying host.
---------------------
:Capability: basic
-:Architectures: arm64, mips, riscv
+:Architectures: arm64, mips, riscv, x86 (if KVM_CAP_ONE_REG)
:Type: vcpu ioctl
:Parameters: struct kvm_reg_list (in/out)
:Returns: 0 on success; -1 on error
@@ -3631,6 +3633,8 @@ Note that s390 does not support KVM_GET_REG_LIST for historical reasons
- KVM_REG_S390_GBEA
+Note, for x86, all MSRs enumerated by KVM_GET_MSR_INDEX_LIST are supported as
+type KVM_X86_REG_TYPE_MSR, but are NOT enumerated via KVM_GET_REG_LIST.
4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
-----------------------------------------
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0f15d683817d..aae1033c8afa 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -411,6 +411,32 @@ struct kvm_xcrs {
__u64 padding[16];
};
+#define KVM_X86_REG_TYPE_MSR 2
+#define KVM_X86_REG_TYPE_KVM 3
+
+#define KVM_X86_KVM_REG_SIZE(reg) \
+({ \
+ reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0; \
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg) \
+({ \
+ __u64 type_size = (__u64)type << 32; \
+ \
+ type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 : \
+ type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) : \
+ 0; \
+ type_size; \
+})
+
+#define KVM_X86_REG_ID(type, index) \
+ (KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index) \
+ KVM_X86_REG_ID(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index) \
+ KVM_X86_REG_ID(KVM_X86_REG_TYPE_KVM, index)
+
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 55044d6680c8..4ed25d33aaee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4735,6 +4735,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_MEMORY_FAULT_INFO:
case KVM_CAP_X86_GUEST_MODE:
+ case KVM_CAP_ONE_REG:
r = 1;
break;
case KVM_CAP_PRE_FAULT_MEMORY:
@@ -5913,6 +5914,98 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
}
+struct kvm_x86_reg_id {
+ __u32 index;
+ __u8 type;
+ __u8 rsvd1;
+ __u8 rsvd2:4;
+ __u8 size:4;
+ __u8 x86;
+};
+
+static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
+{
+ return -EINVAL;
+}
+
+static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+ u64 val;
+
+ if (do_get_msr(vcpu, msr, &val))
+ return -EINVAL;
+
+ if (put_user(val, user_val))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int kvm_set_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
+{
+ u64 val;
+
+ if (get_user(val, user_val))
+ return -EFAULT;
+
+ if (do_set_msr(vcpu, msr, &val))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
+ void __user *argp)
+{
+ struct kvm_one_reg one_reg;
+ struct kvm_x86_reg_id *reg;
+ u64 __user *user_val;
+ int r;
+
+ if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
+ return -EFAULT;
+
+ if ((one_reg.id & KVM_REG_ARCH_MASK) != KVM_REG_X86)
+ return -EINVAL;
+
+ reg = (struct kvm_x86_reg_id *)&one_reg.id;
+ if (reg->rsvd1 || reg->rsvd2)
+ return -EINVAL;
+
+ if (reg->type == KVM_X86_REG_TYPE_KVM) {
+ r = kvm_translate_kvm_reg(reg);
+ if (r)
+ return r;
+ }
+
+ if (reg->type != KVM_X86_REG_TYPE_MSR)
+ return -EINVAL;
+
+ if ((one_reg.id & KVM_REG_SIZE_MASK) != KVM_REG_SIZE_U64)
+ return -EINVAL;
+
+ guard(srcu)(&vcpu->kvm->srcu);
+
+ user_val = u64_to_user_ptr(one_reg.addr);
+ if (ioctl == KVM_GET_ONE_REG)
+ r = kvm_get_one_msr(vcpu, reg->index, user_val);
+ else
+ r = kvm_set_one_msr(vcpu, reg->index, user_val);
+
+ return r;
+}
+
+static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
+ struct kvm_reg_list __user *user_list)
+{
+ u64 nr_regs = 0;
+
+ if (put_user(nr_regs, &user_list->n))
+ return -EFAULT;
+
+ return 0;
+}
+
long kvm_arch_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -6029,6 +6122,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
srcu_read_unlock(&vcpu->kvm->srcu, idx);
break;
}
+ case KVM_GET_ONE_REG:
+ case KVM_SET_ONE_REG:
+ r = kvm_get_set_one_reg(vcpu, ioctl, argp);
+ break;
+ case KVM_GET_REG_LIST:
+ r = kvm_get_reg_list(vcpu, argp);
+ break;
case KVM_TPR_ACCESS_REPORTING: {
struct kvm_tpr_access_ctl tac;
--
2.51.0.470.ga7dc726c21-goog
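The register id encoding added above (documented as the "0x2030 0002
<msr number:32>" pattern) can be reproduced in a small userspace sketch.
The constant values below mirror the uapi headers as described in this patch;
a real consumer would include <linux/kvm.h> and pass the id to the
KVM_GET_ONE_REG / KVM_SET_ONE_REG ioctls via struct kvm_one_reg:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors of the uapi constants (per the id pattern documented above). */
#define KVM_REG_X86           0x2000000000000000ULL
#define KVM_REG_SIZE_U64      0x0030000000000000ULL
#define KVM_X86_REG_TYPE_MSR  2ULL

#define MSR_IA32_XSS 0xda0

/* Build the 64-bit register id for an MSR, as KVM_X86_REG_MSR() does. */
static uint64_t kvm_x86_reg_msr(uint32_t msr)
{
	return KVM_REG_X86 | KVM_REG_SIZE_U64 |
	       (KVM_X86_REG_TYPE_MSR << 32) | msr;
}
```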
* [PATCH v16 05/51] KVM: x86: Report XSS as to-be-saved if there are supported features
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (3 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 04/51] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 06/51] KVM: x86: Check XSS validity against guest CPUIDs Sean Christopherson
` (47 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add MSR_IA32_XSS to the list of MSRs reported to userspace if supported_xss
is non-zero, i.e. if KVM supports at least one XSS-based feature.
Before the CET virtualization series, guest MSR_IA32_XSS is guaranteed to
be 0, i.e. XSAVES/XRSTORS executes in non-root mode with XSS == 0, which
is equivalent to XSAVE/XRSTOR.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ed25d33aaee..d202d9532eb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -332,7 +332,7 @@ static const u32 msrs_to_save_base[] = {
MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
MSR_IA32_UMWAIT_CONTROL,
- MSR_IA32_XFD, MSR_IA32_XFD_ERR,
+ MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
};
static const u32 msrs_to_save_pmu[] = {
@@ -7503,6 +7503,10 @@ static void kvm_probe_msr_to_save(u32 msr_index)
if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
return;
break;
+ case MSR_IA32_XSS:
+ if (!kvm_caps.supported_xss)
+ return;
+ break;
default:
break;
}
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 06/51] KVM: x86: Check XSS validity against guest CPUIDs
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (4 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 05/51] KVM: x86: Report XSS as to-be-saved if there are supported features Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 07/51] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Sean Christopherson
` (46 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Chao Gao <chao.gao@intel.com>
Maintain per-guest valid XSS bits and check XSS validity against them
rather than against KVM capabilities. This prevents setting bits that are
supported by KVM but not enumerated for a given guest.
Opportunistically return KVM_MSR_RET_UNSUPPORTED on IA32_XSS MSR accesses
if guest CPUID doesn't enumerate X86_FEATURE_XSAVES. Since
KVM_MSR_RET_UNSUPPORTED takes care of host_initiated cases, drop the
host_initiated check.
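The checks described above can be sketched in user space as follows; the
names guest_xss_mask() and xss_write_ok() are hypothetical stand-ins for the
logic in cpuid_get_supported_xss() and kvm_set_msr_common(), not KVM symbols:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=0DH,ECX=1).{ECX,EDX} enumerate the supervisor xstates the
 * guest may set in IA32_XSS; intersect with what KVM itself supports. */
static uint64_t guest_xss_mask(uint32_t cpuid_0d_1_ecx, uint32_t cpuid_0d_1_edx,
			       uint64_t kvm_supported_xss)
{
	return (cpuid_0d_1_ecx | ((uint64_t)cpuid_0d_1_edx << 32)) &
	       kvm_supported_xss;
}

/* Reject bits that KVM supports but that aren't enumerated to this guest. */
static bool xss_write_ok(uint64_t data, uint64_t guest_supported_xss)
{
	return !(data & ~guest_supported_xss);
}
```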
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/cpuid.c | 12 ++++++++++++
arch/x86/kvm/x86.c | 7 +++----
3 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8695967b7a31..7a7e6356a8dd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -815,7 +815,6 @@ struct kvm_vcpu_arch {
bool at_instruction_boundary;
bool tpr_access_reporting;
bool xfd_no_write_intercept;
- u64 ia32_xss;
u64 microcode_version;
u64 arch_capabilities;
u64 perf_capabilities;
@@ -876,6 +875,8 @@ struct kvm_vcpu_arch {
u64 xcr0;
u64 guest_supported_xcr0;
+ u64 ia32_xss;
+ u64 guest_supported_xss;
struct kvm_pio_request pio;
void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index efee08fad72e..6b8b5d8b13cc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -263,6 +263,17 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
}
+static u64 cpuid_get_supported_xss(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry_index(vcpu, 0xd, 1);
+ if (!best)
+ return 0;
+
+ return (best->ecx | ((u64)best->edx << 32)) & kvm_caps.supported_xss;
+}
+
static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
struct kvm_cpuid_entry2 *entry,
unsigned int x86_feature,
@@ -424,6 +435,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
}
vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
+ vcpu->arch.guest_supported_xss = cpuid_get_supported_xss(vcpu);
vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d202d9532eb2..d4c192f4c06f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3984,15 +3984,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
}
break;
case MSR_IA32_XSS:
- if (!msr_info->host_initiated &&
- !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
- return 1;
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
+ return KVM_MSR_RET_UNSUPPORTED;
/*
* KVM supports exposing PT to the guest, but does not support
* IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
* XSAVES/XRSTORS to save/restore PT MSRs.
*/
- if (data & ~kvm_caps.supported_xss)
+ if (data & ~vcpu->arch.guest_supported_xss)
return 1;
vcpu->arch.ia32_xss = data;
vcpu->arch.cpuid_dynamic_bits_dirty = true;
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 07/51] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Update CPUID.(EAX=0DH,ECX=1).EBX to reflect the current required xstate size
whenever the guest modifies its XSS MSR.
CPUID.(EAX=0DH,ECX=1).EBX reports the required storage size of all enabled
xstate features in (XCR0 | IA32_XSS). The guest can consult this value to
allocate a sufficiently sized XSAVE buffer.
Note, KVM does not yet support any XSS based features, i.e. supported_xss
is guaranteed to be zero at this time.
Opportunistically skip CPUID updates if XSS value doesn't change.
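The write-path behavior above can be modeled with a tiny sketch; the struct
and function names are hypothetical, standing in for vcpu->arch fields and
the MSR_IA32_XSS case in kvm_set_msr_common():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The CPUID refresh is only flagged when the XSS value actually changes. */
struct vcpu_model {
	uint64_t ia32_xss;
	bool cpuid_dynamic_bits_dirty;
};

static void model_set_xss(struct vcpu_model *v, uint64_t data)
{
	if (v->ia32_xss == data)
		return;
	v->ia32_xss = data;
	v->cpuid_dynamic_bits_dirty = true;
}
```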
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 3 ++-
arch/x86/kvm/x86.c | 2 ++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 6b8b5d8b13cc..32fde9e80c28 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -316,7 +316,8 @@ static void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
- best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
+ best->ebx = xstate_required_size(vcpu->arch.xcr0 |
+ vcpu->arch.ia32_xss, true);
}
static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d4c192f4c06f..c87ed216f72a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3993,6 +3993,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
*/
if (data & ~vcpu->arch.guest_supported_xss)
return 1;
+ if (vcpu->arch.ia32_xss == data)
+ break;
vcpu->arch.ia32_xss = data;
vcpu->arch.cpuid_dynamic_bits_dirty = true;
break;
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 08/51] KVM: x86: Initialize kvm_caps.supported_xss
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Set the initial kvm_caps.supported_xss to (host_xss & KVM_SUPPORTED_XSS) if
XSAVES is supported. host_xss contains the host-supported xstate feature
bits used for thread FPU context switching, while KVM_SUPPORTED_XSS holds
all XSS feature bits enabled by KVM. The resulting value represents the
supervisor xstates that are available to the guest and are backed by the
host FPU framework for swapping {guest,host} XSAVE-managed registers/MSRs.
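As a minimal sketch (compute_supported_xss() is a hypothetical stand-in for
the snippet added to kvm_x86_vendor_init()), the capability is simply the
intersection of the two masks, and zero without XSAVES:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* supported_xss = host FPU-managed bits AND KVM-enabled bits. */
static uint64_t compute_supported_xss(bool has_xsaves, uint64_t host_xss,
				      uint64_t kvm_supported_xss)
{
	return has_xsaves ? (host_xss & kvm_supported_xss) : 0;
}
```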
[sean: relocate and enhance comment about PT / XSS[8] ]
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c87ed216f72a..3e66d8c5000a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -217,6 +217,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
+/*
+ * Note, KVM supports exposing PT to the guest, but does not support context
+ * switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping
+ * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
+ * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
+ */
+#define KVM_SUPPORTED_XSS 0
+
bool __read_mostly allow_smaller_maxphyaddr = 0;
EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
@@ -3986,11 +3994,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_XSS:
if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
return KVM_MSR_RET_UNSUPPORTED;
- /*
- * KVM supports exposing PT to the guest, but does not support
- * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
- * XSAVES/XRSTORS to save/restore PT MSRs.
- */
+
if (data & ~vcpu->arch.guest_supported_xss)
return 1;
if (vcpu->arch.ia32_xss == data)
@@ -9822,14 +9826,17 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
}
+
+ if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+ rdmsrq(MSR_IA32_XSS, kvm_host.xss);
+ kvm_caps.supported_xss = kvm_host.xss & KVM_SUPPORTED_XSS;
+ }
+
kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
kvm_caps.inapplicable_quirks = KVM_X86_CONDITIONAL_QUIRKS;
rdmsrq_safe(MSR_EFER, &kvm_host.efer);
- if (boot_cpu_has(X86_FEATURE_XSAVES))
- rdmsrq(MSR_IA32_XSS, kvm_host.xss);
-
kvm_init_pmu_capability(ops->pmu_ops);
if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 09/51] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Load the guest's FPU state if userspace is accessing MSRs whose values
are managed by XSAVES. Introduce two helpers, kvm_{get,set}_xstate_msr(),
to facilitate accessing such MSRs.
If MSRs in kvm_caps.supported_xss are passed through to the guest, the
guest's MSR values are swapped with the host's before the vCPU exits to
userspace, and swapped back after it reenters the kernel, before the next
VM-Entry.
Because the modified code is also used for the KVM_GET_MSRS device ioctl(),
explicitly check that @vcpu is non-NULL before attempting to load guest
state. The XSAVE-managed MSRs cannot be retrieved via the device ioctl()
without loading guest FPU state (which doesn't exist).
Note that guest_cpuid_has() is not queried, as host userspace is allowed to
access MSRs that have not been exposed to the guest, e.g. it might do
KVM_SET_MSRS prior to KVM_SET_CPUID2.
Place the two helpers in x86.c to make it clear that accessing
XSAVE-managed MSRs requires special checks and handling to guarantee the
correctness of reads and writes to those MSRs.
Co-developed-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: drop S_CET, add big comment, move accessors to x86.c]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 87 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 86 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3e66d8c5000a..ae402463f991 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static DEFINE_MUTEX(vendor_module_lock);
+static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
+static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
+
struct kvm_x86_ops kvm_x86_ops __read_mostly;
#define KVM_X86_OP(func) \
@@ -3801,6 +3804,67 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
}
+/*
+ * Returns true if the MSR in question is managed via XSTATE, i.e. is context
+ * switched with the rest of guest FPU state. Note! S_CET is _not_ context
+ * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
+ * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
+ * the value saved/restored via XSTATE is always the host's value. That detail
+ * is _extremely_ important, as the guest's S_CET must _never_ be resident in
+ * hardware while executing in the host. Loading guest values for U_CET and
+ * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
+ * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
+ * privilege levels, i.e. are effectively only consumed by userspace as well.
+ */
+static bool is_xstate_managed_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ if (!vcpu)
+ return false;
+
+ switch (msr) {
+ case MSR_IA32_U_CET:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_IBT);
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ return guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+ default:
+ return false;
+ }
+}
+
+/*
+ * Lock (and if necessary, re-load) the guest FPU, i.e. XSTATE, and access an
+ * MSR that is managed via XSTATE. Note, the caller is responsible for doing
+ * the initial FPU load, this helper only ensures that guest state is resident
+ * in hardware (the kernel can load its FPU state in IRQ context).
+ */
+static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
+ struct msr_data *msr_info,
+ int access)
+{
+ BUILD_BUG_ON(access != MSR_TYPE_R && access != MSR_TYPE_W);
+
+ KVM_BUG_ON(!is_xstate_managed_msr(vcpu, msr_info->index), vcpu->kvm);
+ KVM_BUG_ON(!vcpu->arch.guest_fpu.fpstate->in_use, vcpu->kvm);
+
+ kvm_fpu_get();
+ if (access == MSR_TYPE_R)
+ rdmsrq(msr_info->index, msr_info->data);
+ else
+ wrmsrq(msr_info->index, msr_info->data);
+ kvm_fpu_put();
+}
+
+static __maybe_unused void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
+}
+
+static __maybe_unused void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+ kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
+}
+
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u32 msr = msr_info->index;
@@ -4551,11 +4615,25 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs,
int (*do_msr)(struct kvm_vcpu *vcpu,
unsigned index, u64 *data))
{
+ bool fpu_loaded = false;
int i;
- for (i = 0; i < msrs->nmsrs; ++i)
+ for (i = 0; i < msrs->nmsrs; ++i) {
+ /*
+ * If userspace is accessing one or more XSTATE-managed MSRs,
+ * temporarily load the guest's FPU state so that the guest's
+ * MSR value(s) is resident in hardware and thus can be accessed
+ * via RDMSR/WRMSR.
+ */
+ if (!fpu_loaded && is_xstate_managed_msr(vcpu, entries[i].index)) {
+ kvm_load_guest_fpu(vcpu);
+ fpu_loaded = true;
+ }
if (do_msr(vcpu, entries[i].index, &entries[i].data))
break;
+ }
+ if (fpu_loaded)
+ kvm_put_guest_fpu(vcpu);
return i;
}
@@ -5965,6 +6043,7 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
struct kvm_one_reg one_reg;
struct kvm_x86_reg_id *reg;
u64 __user *user_val;
+ bool load_fpu;
int r;
if (copy_from_user(&one_reg, argp, sizeof(one_reg)))
@@ -5991,12 +6070,18 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
guard(srcu)(&vcpu->kvm->srcu);
+ load_fpu = is_xstate_managed_msr(vcpu, reg->index);
+ if (load_fpu)
+ kvm_load_guest_fpu(vcpu);
+
user_val = u64_to_user_ptr(one_reg.addr);
if (ioctl == KVM_GET_ONE_REG)
r = kvm_get_one_msr(vcpu, reg->index, user_val);
else
r = kvm_set_one_msr(vcpu, reg->index, user_val);
+ if (load_fpu)
+ kvm_put_guest_fpu(vcpu);
return r;
}
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 10/51] KVM: x86: Add fault checks for guest CR4.CET setting
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Check for potential faults on guest CR4.CET writes per Intel SDM
requirements. CET can be enabled if and only if CR0.WP == 1, i.e. setting
CR4.CET == 1 faults if CR0.WP == 0, and clearing CR0.WP faults if
CR4.CET == 1.
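The two checks can be sketched as pure predicates (hypothetical helper
names; the bit positions are the architectural CR0.WP and CR4.CET bits):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_WP	(1ull << 16)
#define X86_CR4_CET	(1ull << 23)

/* Mirrors the check added to kvm_set_cr0(): WP can't be cleared with CET on. */
static bool cr0_write_ok(uint64_t new_cr0, uint64_t cur_cr4)
{
	return (new_cr0 & X86_CR0_WP) || !(cur_cr4 & X86_CR4_CET);
}

/* Mirrors the check added to kvm_set_cr4(): CET can't be set with WP clear. */
static bool cr4_write_ok(uint64_t new_cr4, uint64_t cur_cr0)
{
	return !(new_cr4 & X86_CR4_CET) || (cur_cr0 & X86_CR0_WP);
}
```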
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae402463f991..d748b1ce1e81 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1176,6 +1176,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
(is_64_bit_mode(vcpu) || kvm_is_cr4_bit_set(vcpu, X86_CR4_PCIDE)))
return 1;
+ if (!(cr0 & X86_CR0_WP) && kvm_is_cr4_bit_set(vcpu, X86_CR4_CET))
+ return 1;
+
kvm_x86_call(set_cr0)(vcpu, cr0);
kvm_post_set_cr0(vcpu, old_cr0, cr0);
@@ -1376,6 +1379,9 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
return 1;
}
+ if ((cr4 & X86_CR4_CET) && !kvm_is_cr0_bit_set(vcpu, X86_CR0_WP))
+ return 1;
+
kvm_x86_call(set_cr4)(vcpu, cr4);
kvm_post_set_cr4(vcpu, old_cr4, cr4);
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 11/51] KVM: x86: Report KVM supported CET MSRs as to-be-saved
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Add CET MSRs to the list of MSRs reported to userspace if the feature,
i.e. IBT or SHSTK, associated with the MSRs is supported by KVM.
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d748b1ce1e81..5245b21168cb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -344,6 +344,10 @@ static const u32 msrs_to_save_base[] = {
MSR_IA32_UMWAIT_CONTROL,
MSR_IA32_XFD, MSR_IA32_XFD_ERR, MSR_IA32_XSS,
+
+ MSR_IA32_U_CET, MSR_IA32_S_CET,
+ MSR_IA32_PL0_SSP, MSR_IA32_PL1_SSP, MSR_IA32_PL2_SSP,
+ MSR_IA32_PL3_SSP, MSR_IA32_INT_SSP_TAB,
};
static const u32 msrs_to_save_pmu[] = {
@@ -7603,6 +7607,20 @@ static void kvm_probe_msr_to_save(u32 msr_index)
if (!kvm_caps.supported_xss)
return;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ return;
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+ return;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ return;
+ break;
default:
break;
}
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 12/51] KVM: VMX: Introduce CET VMCS fields and control bits
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Control-flow Enforcement Technology (CET) is a CPU feature that defends
against Return/Call/Jump-Oriented Programming (ROP/COP/JOP) style
control-flow subversion attacks. It provides two sub-features: Shadow
Stack (SHSTK) and Indirect Branch Tracking (IBT).
Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stack. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.
Indirect Branch Tracking (IBT):
IBT introduces an instruction (ENDBRANCH) to mark valid target addresses
of indirect branches (CALL, JMP, etc.). If an indirect branch is executed
and the next instruction is _not_ an ENDBRANCH, the processor generates a
#CP. The ENDBRANCH instructions behave as NOPs on platforms without CET.
Several new CET MSRs are defined to support CET:
MSR_IA32_{U,S}_CET: CET settings for {user,supervisor} CET respectively.
MSR_IA32_PL{0,1,2,3}_SSP: SHSTK pointer linear address for CPL{0,1,2,3}.
MSR_IA32_INT_SSP_TAB: Linear address of the SHSTK pointer table, whose
entries are indexed by the IST field of an interrupt gate descriptor.
Two XSAVES state bits are introduced for CET:
IA32_XSS:[bit 11]: Control saving/restoring user mode CET states
IA32_XSS:[bit 12]: Control saving/restoring supervisor mode CET states.
Six VMCS fields are introduced for CET:
{HOST,GUEST}_S_CET: Stores CET settings for kernel mode.
{HOST,GUEST}_SSP: Stores current active SSP.
{HOST,GUEST}_INTR_SSP_TABLE: Stores current active MSR_IA32_INT_SSP_TAB.
On Intel platforms, two additional bits are defined in VM_EXIT and VM_ENTRY
control fields:
If VM_EXIT_LOAD_CET_STATE = 1, host CET states are loaded from following
VMCS fields at VM-Exit:
HOST_S_CET
HOST_SSP
HOST_INTR_SSP_TABLE
If VM_ENTRY_LOAD_CET_STATE = 1, guest CET states are loaded from following
VMCS fields at VM-Entry:
GUEST_S_CET
GUEST_SSP
GUEST_INTR_SSP_TABLE
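As a hedged sketch of how support for these controls could be detected
(cpu_has_cet_vmcs_controls() is a hypothetical helper; the constants match
the bits defined in this patch, and the allowed-1 settings of the
(TRUE_)ENTRY/EXIT_CTLS capability MSRs live in their high 32 bits):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ENTRY_LOAD_CET_STATE	0x00100000u
#define VM_EXIT_LOAD_CET_STATE	0x10000000u

/* Both the entry and exit CET controls must be supported to virtualize CET. */
static bool cpu_has_cet_vmcs_controls(uint64_t entry_ctls_msr,
				      uint64_t exit_ctls_msr)
{
	return ((entry_ctls_msr >> 32) & VM_ENTRY_LOAD_CET_STATE) &&
	       ((exit_ctls_msr >> 32) & VM_EXIT_LOAD_CET_STATE);
}
```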
Co-developed-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Zhang Yi Z <yi.z.zhang@linux.intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/vmx.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index cca7d6641287..ce10a7e2d3d9 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -106,6 +106,7 @@
#define VM_EXIT_CLEAR_BNDCFGS 0x00800000
#define VM_EXIT_PT_CONCEAL_PIP 0x01000000
#define VM_EXIT_CLEAR_IA32_RTIT_CTL 0x02000000
+#define VM_EXIT_LOAD_CET_STATE 0x10000000
#define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR 0x00036dff
@@ -119,6 +120,7 @@
#define VM_ENTRY_LOAD_BNDCFGS 0x00010000
#define VM_ENTRY_PT_CONCEAL_PIP 0x00020000
#define VM_ENTRY_LOAD_IA32_RTIT_CTL 0x00040000
+#define VM_ENTRY_LOAD_CET_STATE 0x00100000
#define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x000011ff
@@ -369,6 +371,9 @@ enum vmcs_field {
GUEST_PENDING_DBG_EXCEPTIONS = 0x00006822,
GUEST_SYSENTER_ESP = 0x00006824,
GUEST_SYSENTER_EIP = 0x00006826,
+ GUEST_S_CET = 0x00006828,
+ GUEST_SSP = 0x0000682a,
+ GUEST_INTR_SSP_TABLE = 0x0000682c,
HOST_CR0 = 0x00006c00,
HOST_CR3 = 0x00006c02,
HOST_CR4 = 0x00006c04,
@@ -381,6 +386,9 @@ enum vmcs_field {
HOST_IA32_SYSENTER_EIP = 0x00006c12,
HOST_RSP = 0x00006c14,
HOST_RIP = 0x00006c16,
+ HOST_S_CET = 0x00006c18,
+ HOST_SSP = 0x00006c1a,
+ HOST_INTR_SSP_TABLE = 0x00006c1c
};
/*
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace
save and restore the guest's Shadow Stack Pointer (SSP). On both Intel
and AMD, SSP is a hardware register that can only be accessed by software
via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware
to context switch SSP at entry/exit). As a result, SSP doesn't fit in
any of KVM's existing interfaces for saving/restoring state.
Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes
to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP
MSRs. Use a translation layer to hide the KVM-internal MSR index so that
the arbitrary index doesn't become ABI, e.g. so that KVM can rework its
implementation as needed, so long as the ONE_REG ABI is maintained.
Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack
support to avoid running afoul of ignore_msrs, which unfortunately applies
to host-initiated accesses (which is a discussion for another day). I.e.
ensure consistent behavior for KVM-defined registers irrespective of
ignore_msrs.
Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/api.rst | 8 +++++++
arch/x86/include/uapi/asm/kvm.h | 3 +++
arch/x86/kvm/x86.c | 37 +++++++++++++++++++++++++++++----
arch/x86/kvm/x86.h | 10 +++++++++
4 files changed, 54 insertions(+), 4 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index abd02675a24d..6ae24c5ca559 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2911,6 +2911,14 @@ such as set vcpu counter or reset vcpu, and they have the following id bit patte
x86 MSR registers have the following id bit patterns::
0x2030 0002 <msr number:32>
+Following are the KVM-defined registers for x86:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x2030 0003 0000 0000 SSP Shadow Stack Pointer
+======================= ========= =============================================
+
4.69 KVM_GET_ONE_REG
--------------------
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index aae1033c8afa..467116186e71 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -437,6 +437,9 @@ struct kvm_xcrs {
#define KVM_X86_REG_KVM(index) \
KVM_X86_REG_ID(KVM_X86_REG_TYPE_KVM, index)
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP 0
+
#define KVM_SYNC_X86_REGS (1UL << 0)
#define KVM_SYNC_X86_SREGS (1UL << 1)
#define KVM_SYNC_X86_EVENTS (1UL << 2)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5245b21168cb..720540f102e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6016,9 +6016,27 @@ struct kvm_x86_reg_id {
__u8 x86;
};
-static int kvm_translate_kvm_reg(struct kvm_x86_reg_id *reg)
+static int kvm_translate_kvm_reg(struct kvm_vcpu *vcpu,
+ struct kvm_x86_reg_id *reg)
{
- return -EINVAL;
+ switch (reg->index) {
+ case KVM_REG_GUEST_SSP:
+ /*
+ * FIXME: If host-initiated accesses are ever exempted from
+ * ignore_msrs (in kvm_do_msr_access()), drop this manual check
+ * and rely on KVM's standard checks to reject accesses to regs
+ * that don't exist.
+ */
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return -EINVAL;
+
+ reg->type = KVM_X86_REG_TYPE_MSR;
+ reg->index = MSR_KVM_INTERNAL_GUEST_SSP;
+ break;
+ default:
+ return -EINVAL;
+ }
+ return 0;
}
static int kvm_get_one_msr(struct kvm_vcpu *vcpu, u32 msr, u64 __user *user_val)
@@ -6067,7 +6085,7 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
return -EINVAL;
if (reg->type == KVM_X86_REG_TYPE_KVM) {
- r = kvm_translate_kvm_reg(reg);
+ r = kvm_translate_kvm_reg(vcpu, reg);
if (r)
return r;
}
@@ -6098,11 +6116,22 @@ static int kvm_get_set_one_reg(struct kvm_vcpu *vcpu, unsigned int ioctl,
static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
struct kvm_reg_list __user *user_list)
{
- u64 nr_regs = 0;
+ u64 nr_regs = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) ? 1 : 0;
+ u64 user_nr_regs;
+
+ if (get_user(user_nr_regs, &user_list->n))
+ return -EFAULT;
if (put_user(nr_regs, &user_list->n))
return -EFAULT;
+ if (user_nr_regs < nr_regs)
+ return -E2BIG;
+
+ if (nr_regs &&
+ put_user(KVM_X86_REG_KVM(KVM_REG_GUEST_SSP), &user_list->reg[0]))
+ return -EFAULT;
+
return 0;
}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 786e36fcd0fb..a7c9c72fca93 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -101,6 +101,16 @@ do { \
#define KVM_SVM_DEFAULT_PLE_WINDOW_MAX USHRT_MAX
#define KVM_SVM_DEFAULT_PLE_WINDOW 3000
+/*
+ * KVM's internal, non-ABI indices for synthetic MSRs. The values themselves
+ * are arbitrary and have no meaning, the only requirement is that they don't
+ * conflict with "real" MSRs that KVM supports. Use values at the upper end
+ * of KVM's reserved paravirtual MSR range to minimize churn, i.e. these values
+ * will be usable until KVM exhausts its supply of paravirtual MSR indices.
+ */
+
+#define MSR_KVM_INTERNAL_GUEST_SSP 0x4b564dff
+
static inline unsigned int __grow_ple_window(unsigned int val,
unsigned int base, unsigned int modifier, unsigned int max)
{
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 14/51] KVM: VMX: Emulate read and write to CET MSRs
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (12 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 15/51] KVM: x86: Save and reload SSP to/from SMRAM Sean Christopherson
` (38 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Add an emulation interface for CET MSR accesses. The emulation code is
split into a common part and a vendor-specific part. The former performs
the common checks for MSRs, e.g. accessibility and data validity, then
routes the operation either to the XSAVE-managed MSRs via the helpers or
to the CET VMCS fields.
SSP can only be read via RDSSP, and writing it requires destructive and
potentially faulting operations such as SAVEPREVSSP/RSTORSSP or
SETSSBSY/CLRSSBSY. Instead, let the host use a pseudo-MSR that is just a
wrapper for the GUEST_SSP field of the VMCS.
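The common validity checks this patch adds for MSR_IA32_{U,S}_CET can be exercised in isolation. The sketch below follows the kvm_is_valid_u_s_cet() helper from the diff and the SDM's bit layout for these MSRs (bit 0 SH_STK_EN, bit 1 WR_SHSTK_EN, bit 2 ENDBR_EN, bits 9:6 reserved, bit 10 SUPPRESS, bit 11 TRACKER); the legacy-bitmap alignment check is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CET_SHSTK_EN	(1ULL << 0)
#define CET_WRSS_EN	(1ULL << 1)
#define CET_ENDBR_EN	(1ULL << 2)
#define CET_SUPPRESS	(1ULL << 10)
#define CET_WAIT_ENDBR	(1ULL << 11)
#define CET_RESERVED	(0xFULL << 6)	/* bits 9:6 */

/* Sketch of the U_CET/S_CET validity rules: reserved bits must be clear,
 * SHSTK/IBT bits may only be set if the vCPU has the feature, and IBT may
 * be suppressed only when the tracker isn't WAIT_ENDBR. */
static bool is_valid_u_s_cet(bool has_shstk, bool has_ibt, uint64_t data)
{
	if (data & CET_RESERVED)
		return false;
	if (!has_shstk && (data & (CET_SHSTK_EN | CET_WRSS_EN)))
		return false;
	if (!has_ibt && (data & CET_ENDBR_EN))
		return false;
	if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
		return false;
	return true;
}
```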
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: drop call to kvm_set_xstate_msr() for S_CET, consolidate code]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 18 ++++++++++++
arch/x86/kvm/x86.c | 64 ++++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/x86.h | 23 +++++++++++++++
3 files changed, 103 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 35037fc326e5..e271e3785561 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2106,6 +2106,15 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
msr_info->data = vmx->pt_desc.guest.addr_a[index / 2];
break;
+ case MSR_IA32_S_CET:
+ msr_info->data = vmcs_readl(GUEST_S_CET);
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ msr_info->data = vmcs_readl(GUEST_SSP);
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ msr_info->data = vmcs_readl(GUEST_INTR_SSP_TABLE);
+ break;
case MSR_IA32_DEBUGCTLMSR:
msr_info->data = vmx_guest_debugctl_read();
break;
@@ -2424,6 +2433,15 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
else
vmx->pt_desc.guest.addr_a[index / 2] = data;
break;
+ case MSR_IA32_S_CET:
+ vmcs_writel(GUEST_S_CET, data);
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ vmcs_writel(GUEST_SSP, data);
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ vmcs_writel(GUEST_INTR_SSP_TABLE, data);
+ break;
case MSR_IA32_PERF_CAPABILITIES:
if (data & PERF_CAP_LBR_FMT) {
if ((data & PERF_CAP_LBR_FMT) !=
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 720540f102e1..fee90388a861 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1890,6 +1890,44 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
data = (u32)data;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (!kvm_is_valid_u_s_cet(vcpu, data))
+ return 1;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ /*
+ * Note that the MSR emulation here is flawed when a vCPU
+ * doesn't support the Intel 64 architecture. The expected
+ * architectural behavior in this case is that the upper 32
+ * bits do not exist and should always read '0'. However,
+ * because the actual hardware on which the virtual CPU is
+ * running does support Intel 64, XRSTORS/XSAVES in the
+ * guest could observe behavior that violates the
+ * architecture. Intercepting XRSTORS/XSAVES for this
+ * special case isn't deemed worthwhile.
+ */
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ /*
+ * MSR_IA32_INT_SSP_TAB is not present on processors that do
+ * not support Intel 64 architecture.
+ */
+ if (index == MSR_IA32_INT_SSP_TAB && !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
+ return KVM_MSR_RET_UNSUPPORTED;
+ if (is_noncanonical_msr_address(data, vcpu))
+ return 1;
+ /* All SSP MSRs except MSR_IA32_INT_SSP_TAB must be 4-byte aligned */
+ if (index != MSR_IA32_INT_SSP_TAB && !IS_ALIGNED(data, 4))
+ return 1;
+ break;
}
msr.data = data;
@@ -1934,6 +1972,20 @@ static int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
!guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
return 1;
break;
+ case MSR_IA32_U_CET:
+ case MSR_IA32_S_CET:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ if (!host_initiated)
+ return 1;
+ fallthrough;
+ case MSR_IA32_PL0_SSP ... MSR_IA32_INT_SSP_TAB:
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ return KVM_MSR_RET_UNSUPPORTED;
+ break;
}
msr.index = index;
@@ -3865,12 +3917,12 @@ static __always_inline void kvm_access_xstate_msr(struct kvm_vcpu *vcpu,
kvm_fpu_put();
}
-static __maybe_unused void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+static void kvm_set_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_W);
}
-static __maybe_unused void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+static void kvm_get_xstate_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
kvm_access_xstate_msr(vcpu, msr_info, MSR_TYPE_R);
}
@@ -4256,6 +4308,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
vcpu->arch.guest_fpu.xfd_err = data;
break;
#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_set_xstate_msr(vcpu, msr_info);
+ break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
@@ -4605,6 +4661,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
msr_info->data = vcpu->arch.guest_fpu.xfd_err;
break;
#endif
+ case MSR_IA32_U_CET:
+ case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
+ kvm_get_xstate_msr(vcpu, msr_info);
+ break;
default:
if (kvm_pmu_is_valid_msr(vcpu, msr_info->index))
return kvm_pmu_get_msr(vcpu, msr_info);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a7c9c72fca93..076eccba0f7e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -710,4 +710,27 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, int cpl,
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
+#define CET_US_RESERVED_BITS GENMASK(9, 6)
+#define CET_US_SHSTK_MASK_BITS GENMASK(1, 0)
+#define CET_US_IBT_MASK_BITS (GENMASK_ULL(5, 2) | GENMASK_ULL(63, 10))
+#define CET_US_LEGACY_BITMAP_BASE(data) ((data) >> 12)
+
+static inline bool kvm_is_valid_u_s_cet(struct kvm_vcpu *vcpu, u64 data)
+{
+ if (data & CET_US_RESERVED_BITS)
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ (data & CET_US_SHSTK_MASK_BITS))
+ return false;
+ if (!guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+ (data & CET_US_IBT_MASK_BITS))
+ return false;
+ if (!IS_ALIGNED(CET_US_LEGACY_BITMAP_BASE(data), 4))
+ return false;
+ /* IBT can be suppressed iff the TRACKER isn't WAIT_ENDBR. */
+ if ((data & CET_SUPPRESS) && (data & CET_WAIT_ENDBR))
+ return false;
+
+ return true;
+}
#endif
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 15/51] KVM: x86: Save and reload SSP to/from SMRAM
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (13 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 14/51] KVM: VMX: Emulate read and write to CET MSRs Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 16/51] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
` (37 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Save CET SSP to SMRAM on SMI and reload it on RSM. KVM emulates
architectural hardware behavior when the guest enters/leaves SMM mode,
i.e. saves registers to SMRAM on SMM entry and reloads them on SMM exit.
Per the SDM, SSP is one such register on 64-bit architectures, so add
support for SSP.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/smm.c | 8 ++++++++
arch/x86/kvm/smm.h | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index 5dd8a1646800..b0b14ba37f9a 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -269,6 +269,10 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu,
enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
smram->int_shadow = kvm_x86_call(get_interrupt_shadow)(vcpu);
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ kvm_msr_read(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, &smram->ssp))
+ kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
}
#endif
@@ -558,6 +562,10 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
kvm_x86_call(set_interrupt_shadow)(vcpu, 0);
ctxt->interruptibility = (u8)smstate->int_shadow;
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ kvm_msr_write(vcpu, MSR_KVM_INTERNAL_GUEST_SSP, smstate->ssp))
+ return X86EMUL_UNHANDLEABLE;
+
return X86EMUL_CONTINUE;
}
#endif
diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index 551703fbe200..db3c88f16138 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -116,8 +116,8 @@ struct kvm_smram_state_64 {
u32 smbase;
u32 reserved4[5];
- /* ssp and svm_* fields below are not implemented by KVM */
u64 ssp;
+ /* svm_* fields below are not implemented by KVM */
u64 svm_guest_pat;
u64 svm_host_efer;
u64 svm_host_cr4;
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 16/51] KVM: VMX: Set up interception for CET MSRs
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (14 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 15/51] KVM: x86: Save and reload SSP to/from SMRAM Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 17/51] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
` (36 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Disable interception for CET MSRs that can be accessed via XSAVES/XRSTORS,
and that exist according to CPUID, as accesses through XSTATE aren't
subject to MSR interception checks, i.e. they can't be intercepted without
intercepting and emulating XSAVES/XRSTORS, and KVM doesn't support
emulating the XSAVES/XRSTORS instructions.
Don't condition interception on the guest actually having XSAVES, as there
is no benefit to intercepting the accesses (when the MSRs exist). The
MSRs in question are either context switched by the CPU on
VM-Enter/VM-Exit or by KVM via XSAVES/XRSTORS (KVM requires XSAVES to
virtualize SHSTK), i.e. KVM is going to load guest values into hardware
irrespective of guest XSAVES support.
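The intercept policy the diff below configures in vmx_recalc_msr_intercepts() reduces to two predicates over the vCPU's capabilities. This small C model restates that policy (the struct and function names are illustrative, not KVM's):

```c
#include <assert.h>
#include <stdbool.h>

struct vcpu_caps {
	bool shstk;
	bool ibt;
};

/* The PLx_SSP MSRs are passed through iff the vCPU has SHSTK. */
static bool intercept_plx_ssp(const struct vcpu_caps *c)
{
	return !c->shstk;
}

/* U_CET/S_CET are passed through if the vCPU has SHSTK *or* IBT, since
 * either feature makes the MSRs exist. */
static bool intercept_u_s_cet(const struct vcpu_caps *c)
{
	return !c->shstk && !c->ibt;
}
```

Note that interception is recalculated per-vCPU, so a guest without SHSTK still has the SSP MSRs intercepted (and rejected) even on SHSTK-capable hardware.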
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e271e3785561..5fe4a4b8efb1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4101,6 +4101,8 @@ void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu)
static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
{
+ bool intercept;
+
if (!cpu_has_vmx_msr_bitmap())
return;
@@ -4146,6 +4148,23 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
!guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+ intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept);
+ }
+
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) {
+ intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
+ !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, intercept);
+ vmx_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, intercept);
+ }
+
/*
* x2APIC and LBR MSR intercepts are modified on-demand and cannot be
* filtered by userspace.
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 17/51] KVM: VMX: Set host constant supervisor states to VMCS fields
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (15 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 16/51] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 3:03 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
` (35 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Save constant values to the HOST_{S_CET,SSP,INTR_SSP_TABLE} fields
explicitly. Kernel IBT is supported, and the setting in MSR_IA32_S_CET is
static post-boot (the exception is the BIOS call case, but a vCPU thread
never crosses it), so KVM doesn't need to refresh the HOST_S_CET field
before every VM-Enter/VM-Exit sequence.
Host supervisor shadow stack is not currently enabled, and SSP is not
accessible in kernel mode, thus it's safe to set the host
IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stack is enabled for
CPL3, SSP is reloaded from PL3_SSP before the kernel returns to userspace.
See SDM Vol. 2A/B Chapters 3 and 4 for SYSCALL/SYSRET/SYSENTER/SYSEXIT/
RDSSP/CALL, etc.
Prevent KVM module loading if host supervisor shadow stack (SHSTK_EN) is
set in MSR_IA32_S_CET, as KVM cannot correctly coexist with it.
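The load-time guard this patch adds can be sketched as a pure function: snapshot the host's S_CET value once, and refuse to proceed if supervisor shadow stacks are already enabled, since KVM zeroes HOST_SSP/HOST_INTR_SSP_TABLE on VM-Exit and would clobber that state. The rdmsr is replaced by a parameter here purely for testability.

```c
#include <assert.h>
#include <stdint.h>

#define CET_SHSTK_EN (1ULL << 0)

/* Sketch of the kvm_x86_vendor_init() check: cache the host S_CET value
 * and fail loading (-EIO) if host supervisor shadow stack is enabled. */
static int snapshot_host_s_cet(uint64_t host_s_cet_msr, uint64_t *cached)
{
	*cached = host_s_cet_msr;
	if (*cached & CET_SHSTK_EN)
		return -5;	/* -EIO: host SSS unexpectedly enabled */
	return 0;
}
```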
Suggested-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: snapshot host S_CET if SHSTK *or* IBT is supported]
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/capabilities.h | 4 ++++
arch/x86/kvm/vmx/vmx.c | 15 +++++++++++++++
arch/x86/kvm/x86.c | 12 ++++++++++++
arch/x86/kvm/x86.h | 1 +
4 files changed, 32 insertions(+)
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index f614428dbeda..59c83888bdc0 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -100,6 +100,10 @@ static inline bool cpu_has_load_perf_global_ctrl(void)
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
}
+static inline bool cpu_has_load_cet_ctrl(void)
+{
+ return (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_CET_STATE);
+}
static inline bool cpu_has_vmx_mpx(void)
{
return vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5fe4a4b8efb1..a7d9e60b2771 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4325,6 +4325,21 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
if (cpu_has_load_ia32_efer())
vmcs_write64(HOST_IA32_EFER, kvm_host.efer);
+
+ /*
+	 * Supervisor shadow stack is not enabled on the host side, i.e.
+	 * the host IA32_S_CET.SHSTK_EN bit is guaranteed to be 0 now. Per
+	 * the SDM description (RDSSP instruction), SSP is not readable in CPL0,
+ * so resetting the two registers to 0s at VM-Exit does no harm
+ * to kernel execution. When execution flow exits to userspace,
+ * SSP is reloaded from IA32_PL3_SSP. Check SDM Vol.2A/B Chapter
+ * 3 and 4 for details.
+ */
+ if (cpu_has_load_cet_ctrl()) {
+ vmcs_writel(HOST_S_CET, kvm_host.s_cet);
+ vmcs_writel(HOST_SSP, 0);
+ vmcs_writel(HOST_INTR_SSP_TABLE, 0);
+ }
}
void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fee90388a861..d2cccc7594d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9997,6 +9997,18 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
return -EIO;
}
+ if (boot_cpu_has(X86_FEATURE_SHSTK) || boot_cpu_has(X86_FEATURE_IBT)) {
+ rdmsrq(MSR_IA32_S_CET, kvm_host.s_cet);
+ /*
+ * Linux doesn't yet support supervisor shadow stacks (SSS), so
+ * KVM doesn't save/restore the associated MSRs, i.e. KVM may
+ * clobber the host values. Yell and refuse to load if SSS is
+ * unexpectedly enabled, e.g. to avoid crashing the host.
+ */
+ if (WARN_ON_ONCE(kvm_host.s_cet & CET_SHSTK_EN))
+ return -EIO;
+ }
+
memset(&kvm_caps, 0, sizeof(kvm_caps));
x86_emulator_cache = kvm_alloc_emulator_cache();
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 076eccba0f7e..65cbd454c4f1 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -50,6 +50,7 @@ struct kvm_host_values {
u64 efer;
u64 xcr0;
u64 xss;
+ u64 s_cet;
u64 arch_capabilities;
};
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (16 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 17/51] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 5:39 ` Binbin Wu
2025-09-22 10:27 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled Sean Christopherson
` (34 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
affected by Shadow Stacks and/or Indirect Branch Tracking when said
features are enabled in the guest, as fully emulating CET would require
significant complexity for no practical benefit (KVM shouldn't need to
emulate branch instructions on modern hosts). Simply doing nothing isn't
an option as that would allow a malicious entity to subvert CET
protections via the emulator.
To detect instructions that are subject to IBT or affect IBT state, use
the existing IsBranch flag along with the source operand type to detect
indirect branches, and the absence of the existing NearBranch flag to
detect far branches (which can affect IBT state even if the branch itself
is direct).
For Shadow Stacks, explicitly track instructions that directly affect the
current SSP, as KVM's emulator doesn't have existing flags that can be
used to precisely detect such instructions. Alternatively, the em_xxx()
helpers could directly check for ShadowStack interactions, but using a
dedicated flag is arguably easier to audit, and allows for handling both
IBT and SHSTK in one fell swoop.
Note! On far transfers, do NOT consult the current privilege level and
instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
Supervisor mode. On inter-privilege level far transfers, SHSTK and IBT
can be in play for the target privilege level, i.e. checking the current
privilege could get a false negative, and KVM doesn't know the target
privilege level until emulation gets under way.
Note #2, FAR JMP from 64-bit mode to compatibility mode interacts with
the current SSP, but only to ensure SSP[63:32] == 0. Don't tag FAR JMP
as SHSTK, which would be rather confusing and would result in FAR JMP
being rejected unnecessarily the vast majority of the time (ignoring that
it's unlikely to ever be emulated). A future commit will add the #GP(0)
check for the specific FAR JMP scenario.
Note #3, task switches also modify SSP and so need to be rejected. That
too will be addressed in a future commit.
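The decode-time rejection described above boils down to two classifiers over the opcode flags plus the effective CET enables. The C model below restates that decision; the flag bit values are illustrative, not the emulator's real encoding, and F_INDIRECT stands in for the SrcReg/SrcMem source-operand check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F_IS_BRANCH	(1u << 0)
#define F_NEAR_BRANCH	(1u << 1)
#define F_INDIRECT	(1u << 2)	/* SrcReg/SrcMem/SrcMem16/SrcMem32 */
#define F_SHSTK		(1u << 3)	/* the new ShadowStack flag */

#define CET_SHSTK_EN	(1ULL << 0)
#define CET_ENDBR_EN	(1ULL << 2)

static bool is_ibt_insn(unsigned int flags)
{
	if (!(flags & F_IS_BRANCH))
		return false;
	/* All far branches can affect IBT state, even direct ones. */
	if (!(flags & F_NEAR_BRANCH))
		return true;
	/* Near branches matter only if they're indirect. */
	return flags & F_INDIRECT;
}

/* @eff_cet is U_CET, S_CET, or their union, per the "check both on far
 * transfers" rule from the patch. */
static bool reject_emulation(unsigned int flags, bool cr4_cet,
			     uint64_t eff_cet)
{
	if (!cr4_cet)
		return false;
	if ((flags & F_SHSTK) && (eff_cet & CET_SHSTK_EN))
		return true;
	return is_ibt_insn(flags) && (eff_cet & CET_ENDBR_EN);
}
```

As in the patch, a direct near JMP/Jcc is never rejected, while any far branch is rejected whenever ENDBR_EN is set in the applicable CET MSR.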
Suggested-by: Chao Gao <chao.gao@intel.com>
Originally-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Mathias Krause <minipli@grsecurity.net>
Cc: John Allen <john.allen@amd.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/emulate.c | 114 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 100 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 23929151a5b8..dc0249929cbf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,7 @@
#define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
#define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
#define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
+#define ShadowStack ((u64)1 << 57) /* Instruction affects Shadow Stacks. */
#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
@@ -660,6 +661,57 @@ static inline bool emul_is_noncanonical_address(u64 la,
return !ctxt->ops->is_canonical_addr(ctxt, la, flags);
}
+static bool is_shstk_instruction(u64 flags)
+{
+ return flags & ShadowStack;
+}
+
+static bool is_ibt_instruction(u64 flags)
+{
+ if (!(flags & IsBranch))
+ return false;
+
+ /*
+ * Far transfers can affect IBT state even if the branch itself is
+ * direct, e.g. when changing privilege levels and loading a conforming
+ * code segment. For simplicity, treat all far branches as affecting
+ * IBT. False positives are acceptable (emulating far branches on an
+ * IBT-capable CPU won't happen in practice), while false negatives
+ * could impact guest security.
+ *
+	 * Note, this also handles SYSCALL and SYSENTER.
+ */
+ if (!(flags & NearBranch))
+ return true;
+
+ switch (flags & (OpMask << SrcShift)) {
+ case SrcReg:
+ case SrcMem:
+ case SrcMem16:
+ case SrcMem32:
+ return true;
+ case SrcMemFAddr:
+ case SrcImmFAddr:
+ /* Far branches should be handled above. */
+ WARN_ON_ONCE(1);
+ return true;
+ case SrcNone:
+ case SrcImm:
+ case SrcImmByte:
+ /*
+ * Note, ImmU16 is used only for the stack adjustment operand on ENTER
+ * and RET instructions. ENTER isn't a branch and RET FAR is handled
+ * by the NearBranch check above. RET itself isn't an indirect branch.
+ */
+ case SrcImmU16:
+ return false;
+ default:
+ WARN_ONCE(1, "Unexpected Src operand '%llx' on branch",
+ (flags & (OpMask << SrcShift)));
+ return false;
+ }
+}
+
/*
* x86 defines three classes of vector instructions: explicitly
* aligned, explicitly unaligned, and the rest, which change behaviour
@@ -4068,9 +4120,9 @@ static const struct opcode group4[] = {
static const struct opcode group5[] = {
F(DstMem | SrcNone | Lock, em_inc),
F(DstMem | SrcNone | Lock, em_dec),
- I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
- I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
- I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
+ I(SrcMem | NearBranch | IsBranch | ShadowStack, em_call_near_abs),
+ I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack, em_call_far),
+ I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),
};
@@ -4304,7 +4356,7 @@ static const struct opcode opcode_table[256] = {
DI(SrcAcc | DstReg, pause), X7(D(SrcAcc | DstReg)),
/* 0x98 - 0x9F */
D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
- I(SrcImmFAddr | No64 | IsBranch, em_call_far), N,
+ I(SrcImmFAddr | No64 | IsBranch | ShadowStack, em_call_far), N,
II(ImplicitOps | Stack, em_pushf, pushf),
II(ImplicitOps | Stack, em_popf, popf),
I(ImplicitOps, em_sahf), I(ImplicitOps, em_lahf),
@@ -4324,19 +4376,19 @@ static const struct opcode opcode_table[256] = {
X8(I(DstReg | SrcImm64 | Mov, em_mov)),
/* 0xC0 - 0xC7 */
G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
- I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
- I(ImplicitOps | NearBranch | IsBranch, em_ret),
+ I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm),
+ I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret),
I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
G(ByteOp, group11), G(0, group11),
/* 0xC8 - 0xCF */
I(Stack | SrcImmU16 | Src2ImmByte, em_enter),
I(Stack, em_leave),
- I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
- I(ImplicitOps | IsBranch, em_ret_far),
- D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+ I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
+ I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
+ D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
D(ImplicitOps | No64 | IsBranch),
- II(ImplicitOps | IsBranch, em_iret, iret),
+ II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
/* 0xD0 - 0xD7 */
G(Src2One | ByteOp, group2), G(Src2One, group2),
G(Src2CL | ByteOp, group2), G(Src2CL, group2),
@@ -4352,7 +4404,7 @@ static const struct opcode opcode_table[256] = {
I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in),
I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
/* 0xE8 - 0xEF */
- I(SrcImm | NearBranch | IsBranch, em_call),
+ I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
D(SrcImm | ImplicitOps | NearBranch | IsBranch),
I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
@@ -4371,7 +4423,7 @@ static const struct opcode opcode_table[256] = {
static const struct opcode twobyte_table[256] = {
/* 0x00 - 0x0F */
G(0, group6), GD(0, &group7), N, N,
- N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
+ N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_syscall),
II(ImplicitOps | Priv, em_clts, clts), N,
DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
@@ -4402,8 +4454,8 @@ static const struct opcode twobyte_table[256] = {
IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
II(ImplicitOps | Priv, em_rdmsr, rdmsr),
IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
- I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
- I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
+ I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_sysenter),
+ I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
N, N,
N, N, N, N, N, N, N, N,
/* 0x40 - 0x4F */
@@ -4941,6 +4993,40 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
if (ctxt->d == 0)
return EMULATION_FAILED;
+ /*
+ * Reject emulation if KVM might need to emulate shadow stack updates
+ * and/or indirect branch tracking enforcement, which the emulator
+ * doesn't support.
+ */
+ if ((is_ibt_instruction(ctxt->d) || is_shstk_instruction(ctxt->d)) &&
+ ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
+ u64 u_cet = 0, s_cet = 0;
+
+ /*
+ * Check both User and Supervisor on far transfers as inter-
+ * privilege level transfers are impacted by CET at the target
+ privilege level, and that is not known at this time. The
+ expectation is that the guest will not require emulation
+ * of any CET-affected instructions at any privilege level.
+ */
+ if (!(ctxt->d & NearBranch))
+ u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+ else if (ctxt->ops->cpl(ctxt) == 3)
+ u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+ else
+ s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+
+ if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
+ (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
+ return EMULATION_FAILED;
+
+ if ((u_cet | s_cet) & CET_SHSTK_EN && is_shstk_instruction(ctxt->d))
+ return EMULATION_FAILED;
+
+ if ((u_cet | s_cet) & CET_ENDBR_EN && is_ibt_instruction(ctxt->d))
+ return EMULATION_FAILED;
+ }
+
ctxt->execute = opcode.u.execute;
if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (17 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 6:41 ` Binbin Wu
2025-09-22 11:27 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode Sean Christopherson
` (33 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if the guest triggers
task switch emulation with Indirect Branch Tracking or Shadow Stacks
enabled, as attempting to do the right thing would require non-trivial
effort and complexity, KVM doesn't support emulating CET generally, and
it's extremely unlikely that any guest will do task switches while also
utilizing CET. Defer taking on the complexity until someone cares enough
to put in the time and effort to add support.
Per the SDM:
If shadow stack is enabled, then the SSP of the task is located at the
4 bytes at offset 104 in the 32-bit TSS and is used by the processor to
establish the SSP when a task switch occurs from a task associated with
this TSS. Note that the processor does not write the SSP of the task
initiating the task switch to the TSS of that task, and instead the SSP
of the previous task is pushed onto the shadow stack of the new task.
Note, per the SDM's pseudocode on TASK SWITCHING, IBT state for the new
privilege level is updated. To keep things simple, check both S_CET and
U_CET (again, anyone that wants more precise checking can have the honor
of implementing support).
Reported-by: Binbin Wu <binbin.wu@linux.intel.com>
Closes: https://lore.kernel.org/all/819bd98b-2a60-4107-8e13-41f1e4c706b1@linux.intel.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 35 ++++++++++++++++++++++++++++-------
1 file changed, 28 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d2cccc7594d4..0c060e506f9d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12178,6 +12178,25 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
int ret;
+ if (kvm_is_cr4_bit_set(vcpu, X86_CR4_CET)) {
+ u64 u_cet, s_cet;
+
+ /*
+ * Check both User and Supervisor on task switches as inter-
+ * privilege level task switches are impacted by CET at both
+ * the current privilege level and the new privilege level, and
+ * that information is not known at this time. The expectation
+ * is that the guest won't require emulation of task switches
+ * while using IBT or Shadow Stacks.
+ */
+ if (__kvm_emulate_msr_read(vcpu, MSR_IA32_U_CET, &u_cet) ||
+ __kvm_emulate_msr_read(vcpu, MSR_IA32_S_CET, &s_cet))
+ goto unhandled_task_switch;
+
+ if ((u_cet | s_cet) & (CET_SHSTK_EN | CET_ENDBR_EN))
+ goto unhandled_task_switch;
+ }
+
init_emulate_ctxt(vcpu);
ret = emulator_task_switch(ctxt, tss_selector, idt_index, reason,
@@ -12187,17 +12206,19 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
* Report an error to userspace if MMIO is needed, as KVM doesn't support
* MMIO during a task switch (or any other complex operation).
*/
- if (ret || vcpu->mmio_needed) {
- vcpu->mmio_needed = false;
- vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
- vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
- vcpu->run->internal.ndata = 0;
- return 0;
- }
+ if (ret || vcpu->mmio_needed)
+ goto unhandled_task_switch;
kvm_rip_write(vcpu, ctxt->eip);
kvm_set_rflags(vcpu, ctxt->eflags);
return 1;
+
+unhandled_task_switch:
+ vcpu->mmio_needed = false;
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+ vcpu->run->internal.ndata = 0;
+ return 0;
}
EXPORT_SYMBOL_GPL(kvm_task_switch);
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (18 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 7:15 ` Binbin Wu
2025-09-23 14:29 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF Sean Christopherson
` (32 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Emulate the Shadow Stack restriction that the current SSP must be a 32-bit
value on a FAR JMP from 64-bit mode to compatibility mode. From the SDM's
pseudocode for FAR JMP:
IF ShadowStackEnabled(CPL)
IF (IA32_EFER.LMA and DEST(segment selector).L) = 0
(* If target is legacy or compatibility mode then the SSP must be in low 4GB *)
IF (SSP & 0xFFFFFFFF00000000 != 0); THEN
#GP(0);
FI;
FI;
FI;
Note, only the current CPL needs to be considered, as FAR JMP can't be
used for inter-privilege level transfers, and KVM rejects emulation of all
other far branch instructions when Shadow Stacks are enabled.
To give the emulator access to GUEST_SSP, special case handling
MSR_KVM_INTERNAL_GUEST_SSP in emulator_get_msr() to treat the access as a
host access (KVM doesn't allow guest accesses to internal "MSRs"). The
->get_msr() API is only used for implicit accesses from the emulator, i.e.
is only used with hardcoded MSR indices, and so any access to
MSR_KVM_INTERNAL_GUEST_SSP is guaranteed to be from KVM, i.e. not from the
guest via RDMSR.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/emulate.c | 35 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 9 +++++++++
2 files changed, 44 insertions(+)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dc0249929cbf..5c5fb6a6f7f9 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1605,6 +1605,37 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
return linear_write_system(ctxt, addr, desc, sizeof(*desc));
}
+static bool emulator_is_ssp_invalid(struct x86_emulate_ctxt *ctxt, u8 cpl)
+{
+ const u32 MSR_IA32_X_CET = cpl == 3 ? MSR_IA32_U_CET : MSR_IA32_S_CET;
+ u64 efer = 0, cet = 0, ssp = 0;
+
+ if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET))
+ return false;
+
+ if (ctxt->ops->get_msr(ctxt, MSR_EFER, &efer))
+ return true;
+
+ /* SSP is guaranteed to be valid if the vCPU was already in 32-bit mode. */
+ if (!(efer & EFER_LMA))
+ return false;
+
+ if (ctxt->ops->get_msr(ctxt, MSR_IA32_X_CET, &cet))
+ return true;
+
+ if (!(cet & CET_SHSTK_EN))
+ return false;
+
+ if (ctxt->ops->get_msr(ctxt, MSR_KVM_INTERNAL_GUEST_SSP, &ssp))
+ return true;
+
+ /*
+ * On transfer from 64-bit mode to compatibility mode, SSP[63:32] must
+ * be 0, i.e. SSP must be a 32-bit value outside of 64-bit mode.
+ */
+ return ssp >> 32;
+}
+
static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
u16 selector, int seg, u8 cpl,
enum x86_transfer_type transfer,
@@ -1745,6 +1776,10 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
if (efer & EFER_LMA)
goto exception;
}
+ if (!seg_desc.l && emulator_is_ssp_invalid(ctxt, cpl)) {
+ err_code = 0;
+ goto exception;
+ }
/* CS(RPL) <- CPL */
selector = (selector & 0xfffc) | cpl;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c060e506f9d..40596fc5142e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8741,6 +8741,15 @@ static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
+ /*
+ * Treat emulator accesses to the current shadow stack pointer as host-
+ * initiated, as they aren't true MSR accesses (SSP is a "just a reg"),
+ * and this API is used only for implicit accesses, i.e. not RDMSR, and
+ * so the index is fully KVM-controlled.
+ */
+ if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP))
+ return kvm_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
+
return __kvm_emulate_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
}
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (19 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 7:17 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints Sean Christopherson
` (31 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add PFERR_SS_MASK, a.k.a. Shadow Stack access, and WARN if KVM attempts to
check permissions for a Shadow Stack access as KVM hasn't been taught to
understand the magic Writable=0,Dirty=1 combination that is required for
Shadow Stack accesses, and likely will never learn. There are no plans to
support Shadow Stacks with the Shadow MMU, and the emulator rejects all
instructions that affect Shadow Stacks, i.e. it should be impossible for
KVM to observe a #PF due to a shadow stack access.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.h | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7a7e6356a8dd..554d83ff6135 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -267,6 +267,7 @@ enum x86_intercept_stage;
#define PFERR_RSVD_MASK BIT(3)
#define PFERR_FETCH_MASK BIT(4)
#define PFERR_PK_MASK BIT(5)
+#define PFERR_SS_MASK BIT(6)
#define PFERR_SGX_MASK BIT(15)
#define PFERR_GUEST_RMP_MASK BIT_ULL(31)
#define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b4b6860ab971..f63074048ec6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -212,7 +212,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
fault = (mmu->permissions[index] >> pte_access) & 1;
- WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
+ WARN_ON_ONCE(pfec & (PFERR_PK_MASK | PFERR_SS_MASK | PFERR_RSVD_MASK));
if (unlikely(mmu->pkru_mask)) {
u32 pkru_bits, offset;
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (20 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 7:18 ` Binbin Wu
2025-09-23 14:46 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported Sean Christopherson
` (30 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add PK (Protection Keys), SS (Shadow Stacks), and SGX (Software Guard
Extensions) to the set of #PF error flags handled via
kvm_mmu_trace_pferr_flags. While KVM doesn't expect PK or SS #PFs in
particular, pretty print their names instead of the raw hex value saves
the user from having to go spelunking in the SDM to figure out what's
going on.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/mmu/mmutrace.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index f35a830ce469..764e3015d021 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -51,6 +51,9 @@
{ PFERR_PRESENT_MASK, "P" }, \
{ PFERR_WRITE_MASK, "W" }, \
{ PFERR_USER_MASK, "U" }, \
+ { PFERR_PK_MASK, "PK" }, \
+ { PFERR_SS_MASK, "SS" }, \
+ { PFERR_SGX_MASK, "SGX" }, \
{ PFERR_RSVD_MASK, "RSVD" }, \
{ PFERR_FETCH_MASK, "F" }
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (21 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 7:25 ` Binbin Wu
2025-09-23 14:46 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1 Sean Christopherson
` (29 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Drop X86_CR4_CET from CR4_RESERVED_BITS and instead mark CET as reserved
if and only if IBT *and* SHSTK are unsupported, i.e. allow CR4.CET to be
set if IBT or SHSTK is supported. This creates a virtualization hole if
the CPU supports both IBT and SHSTK, but the kernel or vCPU model only
supports one of the features. However, it's entirely legal for a CPU to
have only one of IBT or SHSTK, i.e. the hole is a flaw in the architecture,
not in KVM.
More importantly, so long as KVM is careful to initialize and context
switch both IBT and SHSTK state (when supported in hardware) if either
feature is exposed to the guest, a misbehaving guest can only harm itself.
E.g. VMX initializes host CET VMCS fields based solely on hardware
capabilities.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/x86.h | 3 +++
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 554d83ff6135..39231da3a3ff 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -142,7 +142,7 @@
| X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
| X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
- | X86_CR4_LAM_SUP))
+ | X86_CR4_LAM_SUP | X86_CR4_CET))
#define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 65cbd454c4f1..f3dc77f006f9 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -680,6 +680,9 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
__reserved_bits |= X86_CR4_PCIDE; \
if (!__cpu_has(__c, X86_FEATURE_LAM)) \
__reserved_bits |= X86_CR4_LAM_SUP; \
+ if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
+ !__cpu_has(__c, X86_FEATURE_IBT)) \
+ __reserved_bits |= X86_CR4_CET; \
__reserved_bits; \
})
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (22 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 8:15 ` Chao Gao
2025-09-23 14:49 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER Sean Christopherson
` (28 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Unconditionally forward XSAVES/XRSTORS VM-Exits from L2 to L1, as KVM
doesn't utilize the XSS-bitmap (KVM relies on controlling the XSS value
in hardware to prevent unauthorized access to XSAVES state). KVM always
loads vmcs02 with vmcs12's bitmap, and so any exit _must_ be due to
vmcs12's XSS-bitmap.
Drop the comment about XSS never being non-zero in anticipation of
enabling CET_KERNEL and CET_USER support.
Opportunistically WARN if XSAVES is not enabled for L2, as the CPU is
supposed to generate #UD before checking the XSS-bitmap.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2156c9a854f4..846c07380eac 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6570,14 +6570,17 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
return nested_cpu_has2(vmcs12, SECONDARY_EXEC_WBINVD_EXITING);
case EXIT_REASON_XSETBV:
return true;
- case EXIT_REASON_XSAVES: case EXIT_REASON_XRSTORS:
+ case EXIT_REASON_XSAVES:
+ case EXIT_REASON_XRSTORS:
/*
- * This should never happen, since it is not possible to
- * set XSS to a non-zero value---neither in L1 nor in L2.
- * If if it were, XSS would have to be checked against
- * the XSS exit bitmap in vmcs12.
+ * Always forward XSAVES/XRSTORS to L1 as KVM doesn't utilize
+ * XSS-bitmap, and always loads vmcs02 with vmcs12's XSS-bitmap
+ * verbatim, i.e. any exit is due to L1's bitmap. WARN if
+ * XSAVES isn't enabled, as the CPU is supposed to inject #UD
+ * in that case, before consulting the XSS-bitmap.
*/
- return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES);
+ WARN_ON_ONCE(!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES));
+ return true;
case EXIT_REASON_UMWAIT:
case EXIT_REASON_TPAUSE:
return nested_cpu_has2(vmcs12,
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (23 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1 Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 7:31 ` Binbin Wu
2025-09-23 14:55 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled Sean Christopherson
` (27 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Add CET_KERNEL and CET_USER to KVM's set of supported XSS bits when IBT
*or* SHSTK is supported. Like CR4.CET, XFEATURE support for IBT and SHSTK
is bundled together under the CET umbrella, and thus prone to
virtualization holes if KVM or the guest supports only one of IBT or SHSTK,
but hardware supports both. However, again like CR4.CET, such
virtualization holes are benign from the host's perspective so long as KVM
takes care to always honor the "or" logic.
Require CET_KERNEL and CET_USER to come as a pair, and refuse to support
IBT or SHSTK if one (or both) of the features is missing, as the (host) kernel
expects them to come as a pair, i.e. may get confused and corrupt state if
only one of CET_KERNEL or CET_USER is supported.
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: split to separate patch, write changelog, add XFEATURE_MASK_CET_ALL]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/x86.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 40596fc5142e..4a0ff0403bb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -220,13 +220,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
+#define XFEATURE_MASK_CET_ALL (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
/*
* Note, KVM supports exposing PT to the guest, but does not support context
* switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping
* PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
* IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
*/
-#define KVM_SUPPORTED_XSS 0
+#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_ALL)
bool __read_mostly allow_smaller_maxphyaddr = 0;
EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
@@ -10104,6 +10105,16 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
kvm_caps.supported_xss = 0;
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
+
+ if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
+ }
+
if (kvm_caps.has_tsc_control) {
/*
* Make sure the user can only configure tsc_khz values that
@@ -12775,10 +12786,11 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
/*
* On INIT, only select XSTATE components are zeroed, most components
* are unchanged. Currently, the only components that are zeroed and
- * supported by KVM are MPX related.
+ * supported by KVM are MPX and CET related.
*/
xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
- (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
+ (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
+ XFEATURE_MASK_CET_ALL);
if (!xfeatures_mask)
return;
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (24 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 7:45 ` Binbin Wu
2025-09-23 14:56 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true Sean Christopherson
` (26 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Make TDP a hard requirement for Shadow Stacks, as there are no plans to
add Shadow Stack support to the Shadow MMU. E.g. KVM hasn't been taught
to understand the magic Writable=0,Dirty=1 combination that is required
for Shadow Stack accesses, and so enabling Shadow Stacks when using
shadow paging will put the guest into an infinite #PF loop (KVM thinks the
shadow page tables have a valid mapping, hardware says otherwise).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 32fde9e80c28..499c86bd457e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -955,6 +955,14 @@ void kvm_set_cpu_caps(void)
if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
kvm_cpu_cap_clear(X86_FEATURE_PKU);
+ /*
+ * Shadow Stacks aren't implemented in the Shadow MMU. Shadow Stack
+ * accesses require "magic" Writable=0,Dirty=1 protection, which KVM
+ * doesn't know how to emulate or map.
+ */
+ if (!tdp_enabled)
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+
kvm_cpu_cap_init(CPUID_7_EDX,
F(AVX512_4VNNIW),
F(AVX512_4FMAPS),
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (25 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 8:00 ` Binbin Wu
` (2 more replies)
2025-09-19 22:32 ` [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
` (25 subsequent siblings)
52 siblings, 3 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Make IBT and SHSTK virtualization mutually exclusive with "officially"
supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the
allow_smaller_maxphyaddr module param is set. Running a guest with a
smaller MAXPHYADDR requires intercepting #PF, and can also trigger
emulation of arbitrary instructions. Intercepting and reacting to #PFs
doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle
Shadow Stack accesses, and emulating arbitrary instructions doesn't play
nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side
effects, e.g. doesn't enforce end-branch markers or model Shadow Stack
updates.
Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is
overkill, as allow_smaller_maxphyaddr is only problematic if the guest is
actually configured to have a smaller MAXPHYADDR. However, KVM's ABI
doesn't provide a way to express that IBT and SHSTK may break if enabled
in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR. I.e. the
alternative is to do nothing in KVM and instead update documentation and
hope KVM users are thorough readers. Go with the conservative-but-correct
approach; worst case scenario, this restriction can be dropped if there's
a strong use case for enabling CET on hosts with allow_smaller_maxphyaddr.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 499c86bd457e..b5c4cb13630c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -963,6 +963,16 @@ void kvm_set_cpu_caps(void)
if (!tdp_enabled)
kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ /*
+ * Disable support for IBT and SHSTK if KVM is configured to emulate
+ * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
+ * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
+ */
+ if (allow_smaller_maxphyaddr) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ }
+
kvm_cpu_cap_init(CPUID_7_EDX,
F(AVX512_4VNNIW),
F(AVX512_4FMAPS),
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (26 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 8:06 ` Binbin Wu
2025-09-23 14:57 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities Sean Christopherson
` (24 subsequent siblings)
52 siblings, 2 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Add support for the LOAD_CET_STATE VM-Enter and VM-Exit controls, the
CET XFEATURE bits in XSS, and advertise support for IBT and SHSTK to
userspace. Explicitly clear IBT and SHSTK on SVM, as additional work is
needed to enable CET on SVM, e.g. to context switch S_CET and other state.
Disable the KVM CET feature if unrestricted_guest is unsupported/disabled,
as KVM does not support emulating CET and running without Unrestricted
Guest can result in KVM emulating large swaths of guest code. While it's highly
unlikely any guest will trigger emulation while also utilizing IBT or
SHSTK, there's zero reason to allow CET without Unrestricted Guest as that
combination should only be possible when explicitly disabling
unrestricted_guest for testing purposes.
Disable CET if VMX_BASIC[bit56] == 0, i.e. if hardware strictly enforces
the presence of an Error Code based on exception vector, as attempting to
inject a #CP with an Error Code (#CP architecturally has an Error Code)
will fail due to the #CP vector historically not having an Error Code.
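The "#CP architecturally has an Error Code" point above can be illustrated
with a small sketch of the per-vector error-code table. This is a
standalone illustration, not KVM's exact helper; the function name is
hypothetical, and the vector list follows the SDM (#CP is vector 21):

```c
#include <stdbool.h>

/* Sketch: which exception vectors architecturally push an error code.
 * #CP (vector 21) is the CET newcomer that legacy hardware, which
 * enforces the historical table, refuses to inject with an error code. */
static bool exception_has_error_code(unsigned int vector)
{
	/* #DF(8), #TS(10), #NP(11), #SS(12), #GP(13), #PF(14), #AC(17), #CP(21) */
	const unsigned int mask = (1u << 8)  | (1u << 10) | (1u << 11) |
				  (1u << 12) | (1u << 13) | (1u << 14) |
				  (1u << 17) | (1u << 21);

	return vector < 32 && (mask & (1u << vector));
}
```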
Clear S_CET and SSP-related VMCS fields on "reset" to emulate the
architectural behavior of CET MSRs and SSP being reset to 0 after RESET,
power-up and INIT. Note,
KVM already clears guest CET state that is managed via XSTATE in
kvm_xstate_reset().
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
[sean: move some bits to separate patches, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kvm/cpuid.c | 2 ++
arch/x86/kvm/svm/svm.c | 4 ++++
arch/x86/kvm/vmx/capabilities.h | 5 +++++
arch/x86/kvm/vmx/vmx.c | 30 +++++++++++++++++++++++++++++-
arch/x86/kvm/vmx/vmx.h | 6 ++++--
6 files changed, 45 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index ce10a7e2d3d9..c85c50019523 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -134,6 +134,7 @@
#define VMX_BASIC_DUAL_MONITOR_TREATMENT BIT_ULL(49)
#define VMX_BASIC_INOUT BIT_ULL(54)
#define VMX_BASIC_TRUE_CTLS BIT_ULL(55)
+#define VMX_BASIC_NO_HW_ERROR_CODE_CC BIT_ULL(56)
static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
{
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b5c4cb13630c..b861a88083e1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -946,6 +946,7 @@ void kvm_set_cpu_caps(void)
VENDOR_F(WAITPKG),
F(SGX_LC),
F(BUS_LOCK_DETECT),
+ X86_64_F(SHSTK),
);
/*
@@ -990,6 +991,7 @@ void kvm_set_cpu_caps(void)
F(AMX_INT8),
F(AMX_BF16),
F(FLUSH_L1D),
+ F(IBT),
);
if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 67f4eed01526..73dde1645e46 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5221,6 +5221,10 @@ static __init void svm_set_cpu_caps(void)
kvm_caps.supported_perf_cap = 0;
kvm_caps.supported_xss = 0;
+ /* KVM doesn't yet support CET virtualization for SVM. */
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+
/* CPUID 0x80000001 and 0x8000000A (SVM features) */
if (nested) {
kvm_cpu_cap_set(X86_FEATURE_SVM);
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 59c83888bdc0..02aadb9d730e 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -73,6 +73,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
return vmcs_config.basic & VMX_BASIC_INOUT;
}
+static inline bool cpu_has_vmx_basic_no_hw_errcode_cc(void)
+{
+ return vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
static inline bool cpu_has_virtual_nmis(void)
{
return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a7d9e60b2771..69e35440cee7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2615,6 +2615,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
{ VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER },
{ VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS },
{ VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL },
+ { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE },
};
memset(vmcs_conf, 0, sizeof(*vmcs_conf));
@@ -4881,6 +4882,14 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+ vmcs_writel(GUEST_SSP, 0);
+ vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
+ }
+ if (kvm_cpu_cap_has(X86_FEATURE_IBT) ||
+ kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+ vmcs_writel(GUEST_S_CET, 0);
+
kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
vpid_sync_context(vmx->vpid);
@@ -6348,6 +6357,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
+ if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE)
+ pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+ vmcs_readl(GUEST_S_CET), vmcs_readl(GUEST_SSP),
+ vmcs_readl(GUEST_INTR_SSP_TABLE));
pr_err("*** Host State ***\n");
pr_err("RIP = 0x%016lx RSP = 0x%016lx\n",
vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
@@ -6378,6 +6391,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
+ if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE)
+ pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
+ vmcs_readl(HOST_S_CET), vmcs_readl(HOST_SSP),
+ vmcs_readl(HOST_INTR_SSP_TABLE));
pr_err("*** Control State ***\n");
pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
@@ -7959,7 +7976,6 @@ static __init void vmx_set_cpu_caps(void)
kvm_cpu_cap_set(X86_FEATURE_UMIP);
/* CPUID 0xD.1 */
- kvm_caps.supported_xss = 0;
if (!cpu_has_vmx_xsaves())
kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
@@ -7971,6 +7987,18 @@ static __init void vmx_set_cpu_caps(void)
if (cpu_has_vmx_waitpkg())
kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+
+ /*
+ * Disable CET if unrestricted_guest is unsupported, as KVM doesn't
+ * enforce CET hardware behaviors in the emulator. On platforms where
+ * VMX_BASIC[bit56] == 0, injecting a #CP with an error code at
+ * VM-Entry fails, so disable CET in that case too.
+ */
+ if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
+ !cpu_has_vmx_basic_no_hw_errcode_cc()) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ }
}
static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 23d6e89b96f2..af8224e074ee 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -484,7 +484,8 @@ static inline u8 vmx_get_rvi(void)
VM_ENTRY_LOAD_IA32_EFER | \
VM_ENTRY_LOAD_BNDCFGS | \
VM_ENTRY_PT_CONCEAL_PIP | \
- VM_ENTRY_LOAD_IA32_RTIT_CTL)
+ VM_ENTRY_LOAD_IA32_RTIT_CTL | \
+ VM_ENTRY_LOAD_CET_STATE)
#define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
(VM_EXIT_SAVE_DEBUG_CONTROLS | \
@@ -506,7 +507,8 @@ static inline u8 vmx_get_rvi(void)
VM_EXIT_LOAD_IA32_EFER | \
VM_EXIT_CLEAR_BNDCFGS | \
VM_EXIT_PT_CONCEAL_PIP | \
- VM_EXIT_CLEAR_IA32_RTIT_CTL)
+ VM_EXIT_CLEAR_IA32_RTIT_CTL | \
+ VM_EXIT_LOAD_CET_STATE)
#define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \
(PIN_BASED_EXT_INTR_MASK | \
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (27 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 2:37 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 30/51] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Sean Christopherson
` (23 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Swap the order between configuring nested VMX capabilities and base CPU
capabilities, so that nested VMX support can be conditioned on core KVM
support, e.g. to allow conditioning support for LOAD_CET_STATE on the
presence of IBT or SHSTK. Because the sanity checks on nested VMX config
performed by vmx_check_processor_compat() run _after_ vmx_hardware_setup(),
any use of kvm_cpu_cap_has() when configuring nested VMX support will lead
to failures in vmx_check_processor_compat().
While swapping the order of two (or more) configuration flows can lead to
a game of whack-a-mole, in this case nested support inarguably should be
done after base support. KVM should never condition base support on nested
support, because nested support is fully optional, while obviously it's
desirable to condition nested support on base support. And there's zero
evidence the current ordering was intentional, e.g. commit 66a6950f9995
("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
likely placed the call to kvm_set_cpu_caps() after nested setup because it
looked pretty.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 69e35440cee7..29e1bc118479 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8602,6 +8602,13 @@ __init int vmx_hardware_setup(void)
setup_default_sgx_lepubkeyhash();
+ vmx_set_cpu_caps();
+
+ /*
+ * Configure nested capabilities after core CPU capabilities so that
+ * nested support can be conditional on base support, e.g. so that KVM
+ * can hide/show features based on kvm_cpu_cap_has().
+ */
if (nested) {
nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);
@@ -8610,8 +8617,6 @@ __init int vmx_hardware_setup(void)
return r;
}
- vmx_set_cpu_caps();
-
r = alloc_kvm_area();
if (r && nested)
nested_vmx_hardware_unsetup();
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 30/51] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (28 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 8:37 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 31/51] KVM: nVMX: Prepare for enabling CET support for nested guest Sean Christopherson
` (22 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Per the SDM description (Vol.3D, Appendix A.1):
"If bit 56 is read as 1, software can use VM entry to deliver a hardware
exception with or without an error code, regardless of vector"
Modify the has_error_code check performed before injecting events into the
nested guest. An error code is never allowed when the guest is in real mode
or the interrupt type is not hardware exception; in all other cases, enforce
the consistency check only if the vCPU doesn't enumerate bit 56 in
VMX_BASIC, making the logic consistent with the SDM.
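The resulting decision logic can be sketched as a standalone predicate
(parameter and function names here are illustrative, not KVM's exact
helpers; the flow mirrors the check in nested_check_vm_entry_controls()):

```c
#include <stdbool.h>

/* Sketch of the VM-Entry "deliver error code" consistency rules:
 * returns true if the interruption-info field is architecturally valid. */
static bool entry_error_code_ok(bool prot_mode, bool hard_exception,
				bool has_error_code,
				bool vector_has_error_code,
				bool no_hw_errcode_cc)
{
	/* Real mode or non-hardware-exception events never carry an error code. */
	if (!prot_mode || !hard_exception)
		return !has_error_code;

	/* With VMX_BASIC[56] enumerated, any combination is legal. */
	if (no_hw_errcode_cc)
		return true;

	/* Otherwise the bit must match the vector's architectural behavior. */
	return has_error_code == vector_has_error_code;
}
```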
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 27 ++++++++++++++++++---------
arch/x86/kvm/vmx/nested.h | 5 +++++
2 files changed, 23 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 846c07380eac..b644f4599f70 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1272,9 +1272,10 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
{
const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
VMX_BASIC_INOUT |
- VMX_BASIC_TRUE_CTLS;
+ VMX_BASIC_TRUE_CTLS |
+ VMX_BASIC_NO_HW_ERROR_CODE_CC;
- const u64 reserved_bits = GENMASK_ULL(63, 56) |
+ const u64 reserved_bits = GENMASK_ULL(63, 57) |
GENMASK_ULL(47, 45) |
BIT_ULL(31);
@@ -2949,7 +2950,6 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
u8 vector = intr_info & INTR_INFO_VECTOR_MASK;
u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
- bool should_have_error_code;
bool urg = nested_cpu_has2(vmcs12,
SECONDARY_EXEC_UNRESTRICTED_GUEST);
bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
@@ -2966,12 +2966,19 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
return -EINVAL;
- /* VM-entry interruption-info field: deliver error code */
- should_have_error_code =
- intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
- x86_exception_has_error_code(vector);
- if (CC(has_error_code != should_have_error_code))
- return -EINVAL;
+ /*
+ * Cannot deliver error code in real mode or if the interrupt
+ * type is not hardware exception. For other cases, do the
+ * consistency check only if the vCPU doesn't enumerate
+ * VMX_BASIC_NO_HW_ERROR_CODE_CC.
+ */
+ if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION) {
+ if (CC(has_error_code))
+ return -EINVAL;
+ } else if (!nested_cpu_has_no_hw_errcode_cc(vcpu)) {
+ if (CC(has_error_code != x86_exception_has_error_code(vector)))
+ return -EINVAL;
+ }
/* VM-entry exception error code */
if (CC(has_error_code &&
@@ -7217,6 +7224,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
msrs->basic |= VMX_BASIC_TRUE_CTLS;
if (cpu_has_vmx_basic_inout())
msrs->basic |= VMX_BASIC_INOUT;
+ if (cpu_has_vmx_basic_no_hw_errcode_cc())
+ msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE_CC;
}
static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index 6eedcfc91070..983484d42ebf 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -309,6 +309,11 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
__kvm_is_valid_cr4(vcpu, val);
}
+static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
+{
+ return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
+}
+
/* No difference in the restrictions on guest and host CR4 in VMX operation. */
#define nested_guest_cr4_valid nested_cr4_valid
#define nested_host_cr4_valid nested_cr4_valid
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 31/51] KVM: nVMX: Prepare for enabling CET support for nested guest
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (29 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 30/51] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 32/51] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Sean Christopherson
` (21 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Yang Weijiang <weijiang.yang@intel.com>
Set up CET MSRs, related VM_ENTRY/EXIT control bits and fixed CR4 setting
to enable CET for nested VM.
vmcs12 and vmcs02 need to be synced when L2 exits to L1 or when L1 wants
to resume L2, so that the correct CET state can be observed by one another.
Please note that consistency checks on CET state during VM-Entry will be
added later, to prevent this patch from becoming too large. Advertising the
new CET VM_ENTRY/EXIT control bits is also deferred until after the
consistency checks are added.
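The "pre_vmenter" snapshot rule the patch applies to CET state (and that
KVM already uses for BNDCFGS) can be sketched in isolation: L1's value is
captured before entering L2 unless this is a pending VM-Enter that will
load the field from vmcs12 anyway. Names here are illustrative, not the
kernel's:

```c
#include <stdbool.h>
#include <stdint.h>

struct cet_snapshot {
	uint64_t pre_vmenter_s_cet;
	bool saved;
};

/* Snapshot L1's S_CET unless the imminent VM-Enter will overwrite the
 * field from vmcs12 (i.e. a real run with LOAD_CET_STATE set). */
static void maybe_save_l1_s_cet(struct cet_snapshot *snap,
				bool nested_run_pending,
				bool entry_loads_cet,
				uint64_t current_s_cet)
{
	if (!nested_run_pending || !entry_loads_cet) {
		snap->pre_vmenter_s_cet = current_s_cet;
		snap->saved = true;
	}
}
```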
Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Xin Li (Intel) <xin@zytor.com>
Tested-by: Xin Li (Intel) <xin@zytor.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 77 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/vmcs12.c | 6 +++
arch/x86/kvm/vmx/vmcs12.h | 14 ++++++-
arch/x86/kvm/vmx/vmx.c | 2 +
arch/x86/kvm/vmx/vmx.h | 3 ++
5 files changed, 101 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b644f4599f70..11e5d3569933 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -721,6 +721,24 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
MSR_IA32_MPERF, MSR_TYPE_R);
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_U_CET, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_S_CET, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL0_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL1_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL2_SSP, MSR_TYPE_RW);
+
+ nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+ MSR_IA32_PL3_SSP, MSR_TYPE_RW);
+
kvm_vcpu_unmap(vcpu, &map);
vmx->nested.force_msr_bitmap_recalc = false;
@@ -2521,6 +2539,32 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
}
}
+static void vmcs_read_cet_state(struct kvm_vcpu *vcpu, u64 *s_cet,
+ u64 *ssp, u64 *ssp_tbl)
+{
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ *s_cet = vmcs_readl(GUEST_S_CET);
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+ *ssp = vmcs_readl(GUEST_SSP);
+ *ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
+ }
+}
+
+static void vmcs_write_cet_state(struct kvm_vcpu *vcpu, u64 s_cet,
+ u64 ssp, u64 ssp_tbl)
+{
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) ||
+ guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK))
+ vmcs_writel(GUEST_S_CET, s_cet);
+
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+ vmcs_writel(GUEST_SSP, ssp);
+ vmcs_writel(GUEST_INTR_SSP_TABLE, ssp_tbl);
+ }
+}
+
static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
{
struct hv_enlightened_vmcs *hv_evmcs = nested_vmx_evmcs(vmx);
@@ -2637,6 +2681,10 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
+ if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)
+ vmcs_write_cet_state(&vmx->vcpu, vmcs12->guest_s_cet,
+ vmcs12->guest_ssp, vmcs12->guest_ssp_tbl);
+
set_cr4_guest_host_mask(vmx);
}
@@ -2676,6 +2724,13 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl);
}
+
+ if (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+ vmcs_write_cet_state(vcpu, vmx->nested.pre_vmenter_s_cet,
+ vmx->nested.pre_vmenter_ssp,
+ vmx->nested.pre_vmenter_ssp_tbl);
+
if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
vmcs_write64(GUEST_BNDCFGS, vmx->nested.pre_vmenter_bndcfgs);
@@ -3551,6 +3606,12 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
vmx->nested.pre_vmenter_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
+ if (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE))
+ vmcs_read_cet_state(vcpu, &vmx->nested.pre_vmenter_s_cet,
+ &vmx->nested.pre_vmenter_ssp,
+ &vmx->nested.pre_vmenter_ssp_tbl);
+
/*
* Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
* nested early checks are disabled. In the event of a "late" VM-Fail,
@@ -4634,6 +4695,10 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
vmcs12->guest_ia32_efer = vcpu->arch.efer;
+
+ vmcs_read_cet_state(&vmx->vcpu, &vmcs12->guest_s_cet,
+ &vmcs12->guest_ssp,
+ &vmcs12->guest_ssp_tbl);
}
/*
@@ -4759,6 +4824,18 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
vmcs_write64(GUEST_BNDCFGS, 0);
+ /*
+ * Load CET state from host state if VM_EXIT_LOAD_CET_STATE is set;
+ * otherwise CET state is retained across VM-Exit, i.e., the guest
+ * values are propagated from vmcs12 to vmcs01.
+ */
+ if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE)
+ vmcs_write_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+ vmcs12->host_ssp_tbl);
+ else
+ vmcs_write_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+ vmcs12->guest_ssp_tbl);
+
if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
vcpu->arch.pat = vmcs12->host_ia32_pat;
diff --git a/arch/x86/kvm/vmx/vmcs12.c b/arch/x86/kvm/vmx/vmcs12.c
index 106a72c923ca..4233b5ca9461 100644
--- a/arch/x86/kvm/vmx/vmcs12.c
+++ b/arch/x86/kvm/vmx/vmcs12.c
@@ -139,6 +139,9 @@ const unsigned short vmcs12_field_offsets[] = {
FIELD(GUEST_PENDING_DBG_EXCEPTIONS, guest_pending_dbg_exceptions),
FIELD(GUEST_SYSENTER_ESP, guest_sysenter_esp),
FIELD(GUEST_SYSENTER_EIP, guest_sysenter_eip),
+ FIELD(GUEST_S_CET, guest_s_cet),
+ FIELD(GUEST_SSP, guest_ssp),
+ FIELD(GUEST_INTR_SSP_TABLE, guest_ssp_tbl),
FIELD(HOST_CR0, host_cr0),
FIELD(HOST_CR3, host_cr3),
FIELD(HOST_CR4, host_cr4),
@@ -151,5 +154,8 @@ const unsigned short vmcs12_field_offsets[] = {
FIELD(HOST_IA32_SYSENTER_EIP, host_ia32_sysenter_eip),
FIELD(HOST_RSP, host_rsp),
FIELD(HOST_RIP, host_rip),
+ FIELD(HOST_S_CET, host_s_cet),
+ FIELD(HOST_SSP, host_ssp),
+ FIELD(HOST_INTR_SSP_TABLE, host_ssp_tbl),
};
const unsigned int nr_vmcs12_fields = ARRAY_SIZE(vmcs12_field_offsets);
diff --git a/arch/x86/kvm/vmx/vmcs12.h b/arch/x86/kvm/vmx/vmcs12.h
index 56fd150a6f24..4ad6b16525b9 100644
--- a/arch/x86/kvm/vmx/vmcs12.h
+++ b/arch/x86/kvm/vmx/vmcs12.h
@@ -117,7 +117,13 @@ struct __packed vmcs12 {
natural_width host_ia32_sysenter_eip;
natural_width host_rsp;
natural_width host_rip;
- natural_width paddingl[8]; /* room for future expansion */
+ natural_width host_s_cet;
+ natural_width host_ssp;
+ natural_width host_ssp_tbl;
+ natural_width guest_s_cet;
+ natural_width guest_ssp;
+ natural_width guest_ssp_tbl;
+ natural_width paddingl[2]; /* room for future expansion */
u32 pin_based_vm_exec_control;
u32 cpu_based_vm_exec_control;
u32 exception_bitmap;
@@ -294,6 +300,12 @@ static inline void vmx_check_vmcs12_offsets(void)
CHECK_OFFSET(host_ia32_sysenter_eip, 656);
CHECK_OFFSET(host_rsp, 664);
CHECK_OFFSET(host_rip, 672);
+ CHECK_OFFSET(host_s_cet, 680);
+ CHECK_OFFSET(host_ssp, 688);
+ CHECK_OFFSET(host_ssp_tbl, 696);
+ CHECK_OFFSET(guest_s_cet, 704);
+ CHECK_OFFSET(guest_ssp, 712);
+ CHECK_OFFSET(guest_ssp_tbl, 720);
CHECK_OFFSET(pin_based_vm_exec_control, 744);
CHECK_OFFSET(cpu_based_vm_exec_control, 748);
CHECK_OFFSET(exception_bitmap, 752);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 29e1bc118479..509487a1f04a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7748,6 +7748,8 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
cr4_fixed1_update(X86_CR4_PKE, ecx, feature_bit(PKU));
cr4_fixed1_update(X86_CR4_UMIP, ecx, feature_bit(UMIP));
cr4_fixed1_update(X86_CR4_LA57, ecx, feature_bit(LA57));
+ cr4_fixed1_update(X86_CR4_CET, ecx, feature_bit(SHSTK));
+ cr4_fixed1_update(X86_CR4_CET, edx, feature_bit(IBT));
entry = kvm_find_cpuid_entry_index(vcpu, 0x7, 1);
cr4_fixed1_update(X86_CR4_LAM_SUP, eax, feature_bit(LAM));
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index af8224e074ee..ea93121029f9 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -181,6 +181,9 @@ struct nested_vmx {
*/
u64 pre_vmenter_debugctl;
u64 pre_vmenter_bndcfgs;
+ u64 pre_vmenter_s_cet;
+ u64 pre_vmenter_ssp;
+ u64 pre_vmenter_ssp_tbl;
/* to migrate it to L1 if L2 writes to L1's CR8 directly */
int l1_tpr_threshold;
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 32/51] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (30 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 31/51] KVM: nVMX: Prepare for enabling CET support for nested guest Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 8:47 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 33/51] KVM: nVMX: Add consistency checks for CET states Sean Christopherson
` (20 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Chao Gao <chao.gao@intel.com>
Add consistency checks for CR4.CET and CR0.WP in guest-state or host-state
area in the VMCS12. This ensures that configurations with CR4.CET set and
CR0.WP not set result in VM-entry failure, aligning with architectural
behavior.
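The check being added boils down to a single implication on the control
registers. A minimal sketch, with bit positions taken from the SDM
(CR0.WP is bit 16, CR4.CET is bit 23) and an illustrative function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4.CET set requires CR0.WP set; any other combination is fine. */
static bool cr_cet_wp_consistent(uint64_t cr0, uint64_t cr4)
{
	const uint64_t cr0_wp  = 1ull << 16;
	const uint64_t cr4_cet = 1ull << 23;

	return !(cr4 & cr4_cet) || (cr0 & cr0_wp);
}
```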
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 11e5d3569933..51c50ce9e011 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3110,6 +3110,9 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
return -EINVAL;
+ if (CC(vmcs12->host_cr4 & X86_CR4_CET && !(vmcs12->host_cr0 & X86_CR0_WP)))
+ return -EINVAL;
+
if (CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
return -EINVAL;
@@ -3224,6 +3227,9 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
return -EINVAL;
+ if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
+ return -EINVAL;
+
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
(CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 33/51] KVM: nVMX: Add consistency checks for CET states
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (31 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 32/51] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 9:23 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Sean Christopherson
` (19 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Chao Gao <chao.gao@intel.com>
Introduce consistency checks for CET states during nested VM-entry.
A VMCS contains both guest and host CET states, each comprising the
IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various
checks are applied to CET states during VM-entry as documented in SDM
Vol3 Chapter "VM ENTRIES". Implement all these checks during nested
VM-entry to emulate the architectural behavior.
In summary, there are three kinds of checks on guest/host CET states
during VM-entry:
A. Checks applied to both guest states and host states:
* The IA32_S_CET field must not set any reserved bits; bits 10 (SUPPRESS)
and 11 (TRACKER) cannot both be set.
* SSP should not have bits 1:0 set.
* The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical.
B. Checks applied to host states only
* IA32_S_CET MSR and SSP must be canonical if the CPU enters 64-bit mode
after VM-exit. Otherwise, IA32_S_CET and SSP must have their higher 32
bits cleared.
C. Checks applied to guest states only:
* IA32_S_CET MSR and SSP are not required to be canonical (i.e., bits
63:N-1 need not be identical, where N is the CPU's maximum linear-address
width), but bits 63:N of SSP must be identical.
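The category-A checks above (applied to both guest and host state) can be
sketched as a standalone predicate. Reserved-bit handling is simplified
for illustration; SUPPRESS is bit 10 and TRACKER is bit 11 of IA32_S_CET
per the SDM, and the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

static bool cet_common_state_ok(uint64_t s_cet, uint64_t ssp)
{
	const uint64_t suppress = 1ull << 10;
	const uint64_t tracker  = 1ull << 11;

	/* SUPPRESS and TRACKER cannot both be set. */
	if ((s_cet & suppress) && (s_cet & tracker))
		return false;

	/* SSP must not have bits 1:0 set (4-byte aligned). */
	return (ssp & 0x3) == 0;
}
```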
Tested-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: John Allen <john.allen@amd.com>
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 47 +++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 51c50ce9e011..024bfb4d3a72 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3100,6 +3100,17 @@ static bool is_l1_noncanonical_address_on_vmexit(u64 la, struct vmcs12 *vmcs12)
return !__is_canonical_address(la, l1_address_bits_on_exit);
}
+static bool is_valid_cet_state(struct kvm_vcpu *vcpu, u64 s_cet, u64 ssp, u64 ssp_tbl)
+{
+ if (!kvm_is_valid_u_s_cet(vcpu, s_cet) || !IS_ALIGNED(ssp, 4))
+ return false;
+
+ if (is_noncanonical_msr_address(ssp_tbl, vcpu))
+ return false;
+
+ return true;
+}
+
static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
@@ -3169,6 +3180,26 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
return -EINVAL;
}
+ if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_CET_STATE) {
+ if (CC(!is_valid_cet_state(vcpu, vmcs12->host_s_cet, vmcs12->host_ssp,
+ vmcs12->host_ssp_tbl)))
+ return -EINVAL;
+
+ /*
+ * IA32_S_CET and SSP must be canonical if the host will
+ * enter 64-bit mode after VM-exit; otherwise, higher
+ * 32-bits must be all 0s.
+ */
+ if (ia32e) {
+ if (CC(is_noncanonical_msr_address(vmcs12->host_s_cet, vcpu)) ||
+ CC(is_noncanonical_msr_address(vmcs12->host_ssp, vcpu)))
+ return -EINVAL;
+ } else {
+ if (CC(vmcs12->host_s_cet >> 32) || CC(vmcs12->host_ssp >> 32))
+ return -EINVAL;
+ }
+ }
+
return 0;
}
@@ -3279,6 +3310,22 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
CC((vmcs12->guest_bndcfgs & MSR_IA32_BNDCFGS_RSVD))))
return -EINVAL;
+ if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
+ if (CC(!is_valid_cet_state(vcpu, vmcs12->guest_s_cet, vmcs12->guest_ssp,
+ vmcs12->guest_ssp_tbl)))
+ return -EINVAL;
+
+ /*
+ * Guest SSP must have 63:N bits identical, rather than
+ * be canonical (i.e., 63:N-1 bits identical), where N is
+ * the CPU's maximum linear-address width. Similar to
+ * is_noncanonical_msr_address(), use the host's
+ * linear-address width.
+ */
+ if (CC(!__is_canonical_address(vmcs12->guest_ssp, max_host_virt_addr_bits() + 1)))
+ return -EINVAL;
+ }
+
if (nested_check_guest_non_reg_state(vmcs12))
return -EINVAL;
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (32 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 33/51] KVM: nVMX: Add consistency checks for CET states Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 2:43 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 35/51] KVM: SVM: Emulate reads and writes to shadow stack MSRs Sean Christopherson
` (18 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Chao Gao <chao.gao@intel.com>
Advertise the LOAD_CET_STATE VM-Entry/Exit control bits in the nested VMX
MSRs, as all nested support for CET virtualization, including consistency
checks, is in place.
Advertise support if and only if KVM supports at least one of IBT or SHSTK.
While it's userspace's responsibility to provide a consistent CPU model to
the guest, that doesn't mean KVM should set userspace up to fail.
Note, the existing {CLEAR,LOAD}_BNDCFGS behavior predates
KVM_X86_QUIRK_STUFF_FEATURE_MSRS, i.e. KVM "solved" the inconsistent CPU
model problem by overwriting the VMX MSRs provided by userspace.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/nested.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 024bfb4d3a72..a8a421a8e766 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -7178,13 +7178,17 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
VM_EXIT_HOST_ADDR_SPACE_SIZE |
#endif
VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
- VM_EXIT_CLEAR_BNDCFGS;
+ VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
msrs->exit_ctls_high |=
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ msrs->exit_ctls_high &= ~VM_EXIT_LOAD_CET_STATE;
+
/* We support free control of debug control saving. */
msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
}
@@ -7200,11 +7204,16 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
#endif
- VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
+ VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+ VM_ENTRY_LOAD_CET_STATE;
msrs->entry_ctls_high |=
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
+ msrs->entry_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE;
+
/* We support free control of debug control loading. */
msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
}
--
2.51.0.470.ga7dc726c21-goog
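The advertise-iff-supported pattern above can be sketched as a pure function (bit positions below are illustrative; see the SDM for the architectural encodings):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, not necessarily the architectural ones. */
#define VM_ENTRY_LOAD_CET_STATE (1u << 20)

/*
 * Sketch of the nested-control setup above: the control is added
 * unconditionally, then stripped when neither SHSTK nor IBT is
 * supported, so userspace is never offered a control that can't be
 * backed by a consistent CPU model.
 */
static uint32_t nested_entry_ctls(uint32_t entry_ctls_high,
				  bool has_shstk, bool has_ibt)
{
	entry_ctls_high |= VM_ENTRY_LOAD_CET_STATE;

	if (!has_shstk && !has_ibt)
		entry_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE;

	return entry_ctls_high;
}
```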
* [PATCH v16 35/51] KVM: SVM: Emulate reads and writes to shadow stack MSRs
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (33 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 Sean Christopherson
` (17 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: John Allen <john.allen@amd.com>
Emulate shadow stack MSR access by reading and writing to the
corresponding fields in the VMCB.
Signed-off-by: John Allen <john.allen@amd.com>
[sean: mark VMCB_CET dirty/clean as appropriate]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 21 +++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 3 ++-
2 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 73dde1645e46..52d2241d8188 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2767,6 +2767,15 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (guest_cpuid_is_intel_compatible(vcpu))
msr_info->data |= (u64)svm->sysenter_esp_hi << 32;
break;
+ case MSR_IA32_S_CET:
+ msr_info->data = svm->vmcb->save.s_cet;
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ msr_info->data = svm->vmcb->save.isst_addr;
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ msr_info->data = svm->vmcb->save.ssp;
+ break;
case MSR_TSC_AUX:
msr_info->data = svm->tsc_aux;
break;
@@ -2999,6 +3008,18 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
svm->vmcb01.ptr->save.sysenter_esp = (u32)data;
svm->sysenter_esp_hi = guest_cpuid_is_intel_compatible(vcpu) ? (data >> 32) : 0;
break;
+ case MSR_IA32_S_CET:
+ svm->vmcb->save.s_cet = data;
+ vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+ break;
+ case MSR_IA32_INT_SSP_TAB:
+ svm->vmcb->save.isst_addr = data;
+ vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+ break;
+ case MSR_KVM_INTERNAL_GUEST_SSP:
+ svm->vmcb->save.ssp = data;
+ vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_CET);
+ break;
case MSR_TSC_AUX:
/*
* TSC_AUX is always virtualized for SEV-ES guests when the
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5365984e82e5..e072f91045b5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -74,6 +74,7 @@ enum {
* AVIC PHYSICAL_TABLE pointer,
* AVIC LOGICAL_TABLE pointer
*/
+ VMCB_CET, /* S_CET, SSP, ISST_ADDR */
VMCB_SW = 31, /* Reserved for hypervisor/software use */
};
@@ -82,7 +83,7 @@ enum {
(1U << VMCB_ASID) | (1U << VMCB_INTR) | \
(1U << VMCB_NPT) | (1U << VMCB_CR) | (1U << VMCB_DR) | \
(1U << VMCB_DT) | (1U << VMCB_SEG) | (1U << VMCB_CR2) | \
- (1U << VMCB_LBR) | (1U << VMCB_AVIC) | \
+ (1U << VMCB_LBR) | (1U << VMCB_AVIC) | (1U << VMCB_CET) | \
(1U << VMCB_SW))
/* TPR and CR2 are always written before VMRUN */
--
2.51.0.470.ga7dc726c21-goog
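The `vmcb_mark_dirty()` calls above drive SVM's VMCB clean-bit state caching; a toy model of that bookkeeping (bit positions illustrative, not the real enum values):

```c
#include <stdint.h>

/* Toy model of VMCB clean bits; positions are illustrative only. */
enum { VMCB_CR = 3, VMCB_CET = 11 };

/*
 * A set clean bit tells hardware it may reuse its cached copy of that
 * state group on VMRUN; clearing it forces a reload, which is why every
 * MSR write above is paired with a mark-dirty.
 */
static uint32_t vmcb_clean_mark_dirty(uint32_t clean, int bit)
{
	return clean & ~(1u << bit);
}

static int vmcb_clean_is_dirty(uint32_t clean, int bit)
{
	return !(clean & (1u << bit));
}
```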
* [PATCH v16 36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (34 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 35/51] KVM: SVM: Emulate reads and writes to shadow stack MSRs Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-10-28 22:23 ` Yosry Ahmed
2025-09-19 22:32 ` [PATCH v16 37/51] KVM: SVM: Update dump_vmcb with shadow stack save area additions Sean Christopherson
` (16 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Transfer the three CET Shadow Stack VMCB fields (S_CET, ISST_ADDR, and
SSP) on VMRUN, #VMEXIT, and loading nested state (saving nested state
simply copies the entire save area). SVM doesn't provide a way to
disallow L1 from enabling Shadow Stacks for L2, i.e. KVM *must* provide
nested support before advertising SHSTK to userspace.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/nested.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 826473f2d7c7..a6443feab252 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -636,6 +636,14 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
vmcb_mark_dirty(vmcb02, VMCB_DT);
}
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
+ (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
+ vmcb02->save.s_cet = vmcb12->save.s_cet;
+ vmcb02->save.isst_addr = vmcb12->save.isst_addr;
+ vmcb02->save.ssp = vmcb12->save.ssp;
+ vmcb_mark_dirty(vmcb02, VMCB_CET);
+ }
+
kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
svm_set_efer(vcpu, svm->nested.save.efer);
@@ -1044,6 +1052,12 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
to_save->rsp = from_save->rsp;
to_save->rip = from_save->rip;
to_save->cpl = 0;
+
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+ to_save->s_cet = from_save->s_cet;
+ to_save->isst_addr = from_save->isst_addr;
+ to_save->ssp = from_save->ssp;
+ }
}
void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
@@ -1111,6 +1125,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
vmcb12->save.dr6 = svm->vcpu.arch.dr6;
vmcb12->save.cpl = vmcb02->save.cpl;
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
+ vmcb12->save.s_cet = vmcb02->save.s_cet;
+ vmcb12->save.isst_addr = vmcb02->save.isst_addr;
+ vmcb12->save.ssp = vmcb02->save.ssp;
+ }
+
vmcb12->control.int_state = vmcb02->control.int_state;
vmcb12->control.exit_code = vmcb02->control.exit_code;
vmcb12->control.exit_code_hi = vmcb02->control.exit_code_hi;
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 37/51] KVM: SVM: Update dump_vmcb with shadow stack save area additions
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (35 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 38/51] KVM: SVM: Pass through shadow stack MSRs as appropriate Sean Christopherson
` (15 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: John Allen <john.allen@amd.com>
Add shadow stack VMCB fields to dump_vmcb. PL0_SSP, PL1_SSP, PL2_SSP,
PL3_SSP, and U_CET are part of the SEV-ES save area and are encrypted,
but can be decrypted and dumped if the guest policy allows debugging.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 52d2241d8188..e50e6847fe72 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3410,6 +3410,10 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
"rip:", save->rip, "rflags:", save->rflags);
pr_err("%-15s %016llx %-13s %016llx\n",
"rsp:", save->rsp, "rax:", save->rax);
+ pr_err("%-15s %016llx %-13s %016llx\n",
+ "s_cet:", save->s_cet, "ssp:", save->ssp);
+ pr_err("%-15s %016llx\n",
+ "isst_addr:", save->isst_addr);
pr_err("%-15s %016llx %-13s %016llx\n",
"star:", save01->star, "lstar:", save01->lstar);
pr_err("%-15s %016llx %-13s %016llx\n",
@@ -3434,6 +3438,13 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-15s %016llx\n",
"sev_features", vmsa->sev_features);
+ pr_err("%-15s %016llx %-13s %016llx\n",
+ "pl0_ssp:", vmsa->pl0_ssp, "pl1_ssp:", vmsa->pl1_ssp);
+ pr_err("%-15s %016llx %-13s %016llx\n",
+ "pl2_ssp:", vmsa->pl2_ssp, "pl3_ssp:", vmsa->pl3_ssp);
+ pr_err("%-15s %016llx\n",
+ "u_cet:", vmsa->u_cet);
+
pr_err("%-15s %016llx %-13s %016llx\n",
"rax:", vmsa->rax, "rbx:", vmsa->rbx);
pr_err("%-15s %016llx %-13s %016llx\n",
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 38/51] KVM: SVM: Pass through shadow stack MSRs as appropriate
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (36 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 37/51] KVM: SVM: Update dump_vmcb with shadow stack save area additions Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 39/51] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid Sean Christopherson
` (14 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: John Allen <john.allen@amd.com>
Pass through XSAVE managed CET MSRs on SVM when KVM supports shadow
stack. These cannot be intercepted without also intercepting XSAVE, which
would likely cause unacceptable performance overhead.
MSR_IA32_INT_SSP_TAB is not managed by XSAVE, so it is intercepted.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e50e6847fe72..cabe1950b160 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -844,6 +844,17 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
svm_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
}
+ if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+ bool shstk_enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
+
+ svm_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, !shstk_enabled);
+ svm_set_intercept_for_msr(vcpu, MSR_IA32_S_CET, MSR_TYPE_RW, !shstk_enabled);
+ svm_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, !shstk_enabled);
+ svm_set_intercept_for_msr(vcpu, MSR_IA32_PL1_SSP, MSR_TYPE_RW, !shstk_enabled);
+ svm_set_intercept_for_msr(vcpu, MSR_IA32_PL2_SSP, MSR_TYPE_RW, !shstk_enabled);
+ svm_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, !shstk_enabled);
+ }
+
if (sev_es_guest(vcpu->kvm))
sev_es_recalc_msr_intercepts(vcpu);
--
2.51.0.470.ga7dc726c21-goog
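The recalc loop above toggles interception per MSR based on whether the guest actually has SHSTK. A minimal sketch of that pattern, using a flat bitmap instead of SVM's real MSRPM layout (slot assignments are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative intercept bitmap: one bit per MSR slot, where slots
 * 0..5 stand in for U_CET, S_CET and PL0_SSP..PL3_SSP. Set bit means
 * intercepted; clear bit means passed through to hardware.
 */
static uint64_t set_intercept(uint64_t bitmap, unsigned int slot, bool intercept)
{
	if (intercept)
		return bitmap | (1ull << slot);
	return bitmap & ~(1ull << slot);
}

static uint64_t recalc_cet_intercepts(uint64_t bitmap, bool guest_has_shstk)
{
	unsigned int slot;

	/* Pass through iff the guest has SHSTK, mirroring the patch above. */
	for (slot = 0; slot < 6; slot++)
		bitmap = set_intercept(bitmap, slot, !guest_has_shstk);

	return bitmap;
}
```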
* [PATCH v16 39/51] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (37 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 38/51] KVM: SVM: Pass through shadow stack MSRs as appropriate Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 40/51] KVM: SVM: Enable shadow stack virtualization for SVM Sean Christopherson
` (13 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Synchronize XSS from the GHCB to KVM's internal tracking if the guest
marks XSS as valid on a #VMGEXIT. Like XCR0, KVM needs an up-to-date copy
of XSS in order to compute the required XSTATE size when emulating
CPUID.0xD.0x1 for the guest.
Treat the incoming XSS change as an emulated write, i.e. validate the
guest-provided value, to avoid letting the guest load garbage into KVM's
tracking. Simply ignore bad values, as either the guest managed to get an
unsupported value into hardware, or the guest is misbehaving and providing
pure garbage. In either case, KVM can't fix the broken guest.
Explicitly allow access to XSS at all times, as KVM needs to ensure its
copy of XSS stays up-to-date. E.g. KVM supports migration of SEV-ES guests
and so needs to allow the host to save/restore XSS, otherwise a guest
that *knows* its XSS hasn't changed could get stale/bad CPUID emulation if
the guest doesn't provide XSS in the GHCB on every exit. This creates a
hypothetical problem where a guest could request emulation of RDMSR or
WRMSR on XSS, but arguably that's not even a problem, e.g. it would be
entirely reasonable for a guest to request "emulation" as a way to inform
the hypervisor that its XSS value has been modified.
Note, emulating the change as an MSR write also takes care of side effects,
e.g. marking dynamic CPUID bits as dirty.
Suggested-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/sev.c | 3 +++
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/svm/svm.h | 1 +
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 85e84bb1a368..94d9acc94c9a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3354,6 +3354,9 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
if (kvm_ghcb_xcr0_is_valid(svm))
__kvm_set_xcr(vcpu, 0, kvm_ghcb_get_xcr0(svm));
+ if (kvm_ghcb_xss_is_valid(svm))
+ __kvm_emulate_msr_write(vcpu, MSR_IA32_XSS, kvm_ghcb_get_xss(svm));
+
/* Copy the GHCB exit information into the VMCB fields */
exit_code = kvm_ghcb_get_sw_exit_code(svm);
control->exit_code = lower_32_bits(exit_code);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cabe1950b160..d48bf20c865b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2721,8 +2721,8 @@ static int svm_get_feature_msr(u32 msr, u64 *data)
static bool sev_es_prevent_msr_access(struct kvm_vcpu *vcpu,
struct msr_data *msr_info)
{
- return sev_es_guest(vcpu->kvm) &&
- vcpu->arch.guest_state_protected &&
+ return sev_es_guest(vcpu->kvm) && vcpu->arch.guest_state_protected &&
+ msr_info->index != MSR_IA32_XSS &&
!msr_write_intercepted(vcpu, msr_info->index);
}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e072f91045b5..a6a1daa3fc89 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -941,5 +941,6 @@ DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_1)
DEFINE_KVM_GHCB_ACCESSORS(sw_exit_info_2)
DEFINE_KVM_GHCB_ACCESSORS(sw_scratch)
DEFINE_KVM_GHCB_ACCESSORS(xcr0)
+DEFINE_KVM_GHCB_ACCESSORS(xss)
#endif
--
2.51.0.470.ga7dc726c21-goog
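The "treat the GHCB value as an emulated write, and simply ignore bad values" behavior above can be sketched as a pure function (names and masks are illustrative; the real code funnels through `__kvm_emulate_msr_write()` and its MSR_IA32_XSS handling):

```c
#include <stdint.h>

/*
 * Hedged sketch: given KVM's current tracked XSS value, a guest-provided
 * value from the GHCB, and the supported-XSS mask, return the new tracked
 * value. Unsupported bits mean the guest is broken or misbehaving, so the
 * write is silently dropped rather than corrupting KVM's tracking.
 */
static uint64_t xss_after_ghcb_write(uint64_t tracked_xss, uint64_t data,
				     uint64_t supported_xss)
{
	if (data & ~supported_xss)
		return tracked_xss;	/* ignore garbage */

	return data;
}
```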
* [PATCH v16 40/51] KVM: SVM: Enable shadow stack virtualization for SVM
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (38 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 39/51] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 41/51] KVM: x86: Add human friendly formatting for #XM, and #VE Sean Christopherson
` (12 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: John Allen <john.allen@amd.com>
Remove the explicit clearing of shadow stack CPU capabilities.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/svm.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d48bf20c865b..54ca0ec5ea57 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5262,10 +5262,7 @@ static __init void svm_set_cpu_caps(void)
kvm_set_cpu_caps();
kvm_caps.supported_perf_cap = 0;
- kvm_caps.supported_xss = 0;
- /* KVM doesn't yet support CET virtualization for SVM. */
- kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
kvm_cpu_cap_clear(X86_FEATURE_IBT);
/* CPUID 0x80000001 and 0x8000000A (SVM features) */
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 41/51] KVM: x86: Add human friendly formatting for #XM, and #VE
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (39 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 40/51] KVM: SVM: Enable shadow stack virtualization for SVM Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 8:29 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 42/51] KVM: x86: Define Control Protection Exception (#CP) vector Sean Christopherson
` (11 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add XM_VECTOR and VE_VECTOR pretty-printing for
trace_kvm_inj_exception().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/trace.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 57d79fd31df0..06da19b370c5 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -461,8 +461,8 @@ TRACE_EVENT(kvm_inj_virq,
#define kvm_trace_sym_exc \
EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM), \
- EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), \
- EXS(MF), EXS(AC), EXS(MC)
+ EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF), \
+ EXS(AC), EXS(MC), EXS(XM), EXS(VE)
/*
* Tracepoint for kvm interrupt injection:
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 42/51] KVM: x86: Define Control Protection Exception (#CP) vector
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (40 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 41/51] KVM: x86: Add human friendly formatting for #XM, and #VE Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 8:29 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 43/51] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors Sean Christopherson
` (10 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add a CP_VECTOR definition for CET's Control Protection Exception (#CP),
along with human friendly formatting for trace_kvm_inj_exception().
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/trace.h | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 467116186e71..73e0e88a0a54 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -35,6 +35,7 @@
#define MC_VECTOR 18
#define XM_VECTOR 19
#define VE_VECTOR 20
+#define CP_VECTOR 21
/* Select x86 specific features in <linux/kvm.h> */
#define __KVM_HAVE_PIT
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 06da19b370c5..322913dda626 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -462,7 +462,7 @@ TRACE_EVENT(kvm_inj_virq,
#define kvm_trace_sym_exc \
EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM), \
EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF), \
- EXS(AC), EXS(MC), EXS(XM), EXS(VE)
+ EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP)
/*
* Tracepoint for kvm interrupt injection:
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 43/51] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (41 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 42/51] KVM: x86: Define Control Protection Exception (#CP) vector Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 44/51] KVM: selftests: Add ex_str() to print human friendly name of " Sean Christopherson
` (9 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add {HV,CP,SX}_VECTOR definitions for AMD's Hypervisor Injection Exception,
VMM Communication Exception, and SVM Security Exception vectors, along with
human friendly formatting for trace_kvm_inj_exception().
Note, KVM is all but guaranteed to never observe or inject #SX, and #HV is
also likely to go unused. Add the architectural collateral mostly for
completeness, and on the off chance that hardware goes off the rails.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/uapi/asm/kvm.h | 4 ++++
arch/x86/kvm/trace.h | 3 ++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 73e0e88a0a54..d420c9c066d4 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -37,6 +37,10 @@
#define VE_VECTOR 20
#define CP_VECTOR 21
+#define HV_VECTOR 28
+#define VC_VECTOR 29
+#define SX_VECTOR 30
+
/* Select x86 specific features in <linux/kvm.h> */
#define __KVM_HAVE_PIT
#define __KVM_HAVE_IOAPIC
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 322913dda626..e79bc9cb7162 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -462,7 +462,8 @@ TRACE_EVENT(kvm_inj_virq,
#define kvm_trace_sym_exc \
EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM), \
EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF), \
- EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP)
+ EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP), \
+ EXS(HV), EXS(VC), EXS(SX)
/*
* Tracepoint for kvm interrupt injection:
--
2.51.0.470.ga7dc726c21-goog
* [PATCH v16 44/51] KVM: selftests: Add ex_str() to print human friendly name of exception vectors
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (42 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 43/51] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write Sean Christopherson
` (8 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Steal exception_mnemonic() from KVM-Unit-Tests as ex_str() (to keep line
lengths reasonable) and use it in assert messages that currently print the
raw vector number.
Co-developed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
.../selftests/kvm/include/x86/processor.h | 2 ++
.../testing/selftests/kvm/lib/x86/processor.c | 33 +++++++++++++++++++
.../selftests/kvm/x86/hyperv_features.c | 16 ++++-----
.../selftests/kvm/x86/monitor_mwait_test.c | 8 ++---
.../selftests/kvm/x86/pmu_counters_test.c | 4 +--
.../selftests/kvm/x86/vmx_pmu_caps_test.c | 4 +--
.../selftests/kvm/x86/xcr0_cpuid_test.c | 12 +++----
7 files changed, 57 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index efcc4b1de523..2ad84f3809e8 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -34,6 +34,8 @@ extern uint64_t guest_tsc_khz;
#define NMI_VECTOR 0x02
+const char *ex_str(int vector);
+
#define X86_EFLAGS_FIXED (1u << 1)
#define X86_CR4_VME (1ul << 0)
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index 3b63c99f7b96..f9182dbd07f2 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -23,6 +23,39 @@ bool host_cpu_is_intel;
bool is_forced_emulation_enabled;
uint64_t guest_tsc_khz;
+const char *ex_str(int vector)
+{
+ switch (vector) {
+#define VEC_STR(v) case v##_VECTOR: return "#" #v
+ case DE_VECTOR: return "no exception";
+ case KVM_MAGIC_DE_VECTOR: return "#DE";
+ VEC_STR(DB);
+ VEC_STR(NMI);
+ VEC_STR(BP);
+ VEC_STR(OF);
+ VEC_STR(BR);
+ VEC_STR(UD);
+ VEC_STR(NM);
+ VEC_STR(DF);
+ VEC_STR(TS);
+ VEC_STR(NP);
+ VEC_STR(SS);
+ VEC_STR(GP);
+ VEC_STR(PF);
+ VEC_STR(MF);
+ VEC_STR(AC);
+ VEC_STR(MC);
+ VEC_STR(XM);
+ VEC_STR(VE);
+ VEC_STR(CP);
+ VEC_STR(HV);
+ VEC_STR(VC);
+ VEC_STR(SX);
+ default: return "#??";
+#undef VEC_STR
+ }
+}
+
static void regs_dump(FILE *stream, struct kvm_regs *regs, uint8_t indent)
{
fprintf(stream, "%*srax: 0x%.16llx rbx: 0x%.16llx "
diff --git a/tools/testing/selftests/kvm/x86/hyperv_features.c b/tools/testing/selftests/kvm/x86/hyperv_features.c
index 068e9c69710d..99d327084172 100644
--- a/tools/testing/selftests/kvm/x86/hyperv_features.c
+++ b/tools/testing/selftests/kvm/x86/hyperv_features.c
@@ -54,12 +54,12 @@ static void guest_msr(struct msr_data *msr)
if (msr->fault_expected)
__GUEST_ASSERT(vector == GP_VECTOR,
- "Expected #GP on %sMSR(0x%x), got vector '0x%x'",
- msr->write ? "WR" : "RD", msr->idx, vector);
+ "Expected #GP on %sMSR(0x%x), got %s",
+ msr->write ? "WR" : "RD", msr->idx, ex_str(vector));
else
__GUEST_ASSERT(!vector,
- "Expected success on %sMSR(0x%x), got vector '0x%x'",
- msr->write ? "WR" : "RD", msr->idx, vector);
+ "Expected success on %sMSR(0x%x), got %s",
+ msr->write ? "WR" : "RD", msr->idx, ex_str(vector));
if (vector || is_write_only_msr(msr->idx))
goto done;
@@ -102,12 +102,12 @@ static void guest_hcall(vm_vaddr_t pgs_gpa, struct hcall_data *hcall)
vector = __hyperv_hypercall(hcall->control, input, output, &res);
if (hcall->ud_expected) {
__GUEST_ASSERT(vector == UD_VECTOR,
- "Expected #UD for control '%lu', got vector '0x%x'",
- hcall->control, vector);
+ "Expected #UD for control '%lu', got %s",
+ hcall->control, ex_str(vector));
} else {
__GUEST_ASSERT(!vector,
- "Expected no exception for control '%lu', got vector '0x%x'",
- hcall->control, vector);
+ "Expected no exception for control '%lu', got %s",
+ hcall->control, ex_str(vector));
GUEST_ASSERT_EQ(res, hcall->expect);
}
diff --git a/tools/testing/selftests/kvm/x86/monitor_mwait_test.c b/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
index 0eb371c62ab8..e45c028d2a7e 100644
--- a/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
+++ b/tools/testing/selftests/kvm/x86/monitor_mwait_test.c
@@ -30,12 +30,12 @@ do { \
\
if (fault_wanted) \
__GUEST_ASSERT((vector) == UD_VECTOR, \
- "Expected #UD on " insn " for testcase '0x%x', got '0x%x'", \
- testcase, vector); \
+ "Expected #UD on " insn " for testcase '0x%x', got %s", \
+ testcase, ex_str(vector)); \
else \
__GUEST_ASSERT(!(vector), \
- "Expected success on " insn " for testcase '0x%x', got '0x%x'", \
- testcase, vector); \
+ "Expected success on " insn " for testcase '0x%x', got %s", \
+ testcase, ex_str(vector)); \
} while (0)
static void guest_monitor_wait(void *arg)
diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
index 89c1e462cd1c..24288b460636 100644
--- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
+++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
@@ -346,8 +346,8 @@ static void test_arch_events(uint8_t pmu_version, uint64_t perf_capabilities,
#define GUEST_ASSERT_PMC_MSR_ACCESS(insn, msr, expect_gp, vector) \
__GUEST_ASSERT(expect_gp ? vector == GP_VECTOR : !vector, \
- "Expected %s on " #insn "(0x%x), got vector %u", \
- expect_gp ? "#GP" : "no fault", msr, vector) \
+ "Expected %s on " #insn "(0x%x), got %s", \
+ expect_gp ? "#GP" : "no fault", msr, ex_str(vector)) \
#define GUEST_ASSERT_PMC_VALUE(insn, msr, val, expected) \
__GUEST_ASSERT(val == expected, \
diff --git a/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c b/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c
index a1f5ff45d518..7d37f0cd4eb9 100644
--- a/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c
+++ b/tools/testing/selftests/kvm/x86/vmx_pmu_caps_test.c
@@ -56,8 +56,8 @@ static void guest_test_perf_capabilities_gp(uint64_t val)
uint8_t vector = wrmsr_safe(MSR_IA32_PERF_CAPABILITIES, val);
__GUEST_ASSERT(vector == GP_VECTOR,
- "Expected #GP for value '0x%lx', got vector '0x%x'",
- val, vector);
+ "Expected #GP for value '0x%lx', got %s",
+ val, ex_str(vector));
}
static void guest_code(uint64_t current_val)
diff --git a/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c b/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
index c8a5c5e51661..d038c1571729 100644
--- a/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
+++ b/tools/testing/selftests/kvm/x86/xcr0_cpuid_test.c
@@ -81,13 +81,13 @@ static void guest_code(void)
vector = xsetbv_safe(0, XFEATURE_MASK_FP);
__GUEST_ASSERT(!vector,
- "Expected success on XSETBV(FP), got vector '0x%x'",
- vector);
+ "Expected success on XSETBV(FP), got %s",
+ ex_str(vector));
vector = xsetbv_safe(0, supported_xcr0);
__GUEST_ASSERT(!vector,
- "Expected success on XSETBV(0x%lx), got vector '0x%x'",
- supported_xcr0, vector);
+ "Expected success on XSETBV(0x%lx), got %s",
+ supported_xcr0, ex_str(vector));
for (i = 0; i < 64; i++) {
if (supported_xcr0 & BIT_ULL(i))
@@ -95,8 +95,8 @@ static void guest_code(void)
vector = xsetbv_safe(0, supported_xcr0 | BIT_ULL(i));
__GUEST_ASSERT(vector == GP_VECTOR,
- "Expected #GP on XSETBV(0x%llx), supported XCR0 = %lx, got vector '0x%x'",
- BIT_ULL(i), supported_xcr0, vector);
+ "Expected #GP on XSETBV(0x%llx), supported XCR0 = %lx, got %s",
+ BIT_ULL(i), supported_xcr0, ex_str(vector));
}
GUEST_DONE();
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (43 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 44/51] KVM: selftests: Add ex_str() to print human friendly name of " Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 8:03 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 46/51] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test Sean Christopherson
` (7 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add a selftest to verify reads and writes to various MSRs, from both the
guest and host, expecting success or failure based on whether or not the
vCPU supports the MSR according to supported CPUID.
Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but
provides more coverage with respect to host accesses, and will be extended
to provide additional testing of CPUID-based features, save/restore lists,
and KVM_{G,S}ET_ONE_REG, all of which are extremely difficult to validate in
KUT.
If kvm.ignore_msrs=true, skip the unsupported and reserved testcases as
KVM's ABI is a mess; what exactly is supposed to be ignored, and when,
varies wildly.
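As an aside, the module parameter the test consults can also be inspected by
hand; the helper below is an illustrative sketch (not part of the selftest)
that mirrors what kvm_is_ignore_msrs() reads:

```shell
# Sketch only: mirror the kvm_is_ignore_msrs() check from userspace.
# Module parameters are exposed under /sys/module/<mod>/parameters/;
# treat an unreadable/missing file as "N" (param off or kvm not loaded).
read_ignore_msrs() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo N
    fi
}

read_ignore_msrs /sys/module/kvm/parameters/ignore_msrs
```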
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 5 +
tools/testing/selftests/kvm/x86/msrs_test.c | 315 ++++++++++++++++++
3 files changed, 321 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86/msrs_test.c
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 66c82f51837b..1d1b77dabb36 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -87,6 +87,7 @@ TEST_GEN_PROGS_x86 += x86/kvm_clock_test
TEST_GEN_PROGS_x86 += x86/kvm_pv_test
TEST_GEN_PROGS_x86 += x86/kvm_buslock_test
TEST_GEN_PROGS_x86 += x86/monitor_mwait_test
+TEST_GEN_PROGS_x86 += x86/msrs_test
TEST_GEN_PROGS_x86 += x86/nested_emulation_test
TEST_GEN_PROGS_x86 += x86/nested_exceptions_test
TEST_GEN_PROGS_x86 += x86/platform_info_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 2ad84f3809e8..fb3e6ab81a80 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1357,6 +1357,11 @@ static inline bool kvm_is_unrestricted_guest_enabled(void)
return get_kvm_intel_param_bool("unrestricted_guest");
}
+static inline bool kvm_is_ignore_msrs(void)
+{
+ return get_kvm_param_bool("ignore_msrs");
+}
+
uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
int *level);
uint64_t *vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr);
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
new file mode 100644
index 000000000000..9285cf51ef75
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -0,0 +1,315 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <asm/msr-index.h>
+
+#include <stdint.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+
+/* Use HYPERVISOR for MSRs that are emulated unconditionally (HYPERVISOR itself always is). */
+#define X86_FEATURE_NONE X86_FEATURE_HYPERVISOR
+
+struct kvm_msr {
+ const struct kvm_x86_cpu_feature feature;
+ const struct kvm_x86_cpu_feature feature2;
+ const char *name;
+ const u64 reset_val;
+ const u64 write_val;
+ const u64 rsvd_val;
+ const u32 index;
+};
+
+#define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2) \
+{ \
+ .index = msr, \
+ .name = str, \
+ .write_val = val, \
+ .rsvd_val = rsvd, \
+ .reset_val = reset, \
+ .feature = X86_FEATURE_ ##feat, \
+ .feature2 = X86_FEATURE_ ##f2, \
+}
+
+#define __MSR_TEST(msr, str, val, rsvd, reset, feat) \
+ ____MSR_TEST(msr, str, val, rsvd, reset, feat, feat)
+
+#define MSR_TEST_NON_ZERO(msr, val, rsvd, reset, feat) \
+ __MSR_TEST(msr, #msr, val, rsvd, reset, feat)
+
+#define MSR_TEST(msr, val, rsvd, feat) \
+ __MSR_TEST(msr, #msr, val, rsvd, 0, feat)
+
+#define MSR_TEST2(msr, val, rsvd, feat, f2) \
+ ____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2)
+
+/*
+ * Note, use a page aligned value for the canonical value so that the value
+ * is compatible with MSRs that use bits 11:0 for things other than addresses.
+ */
+static const u64 canonical_val = 0x123456789000ull;
+
+#define MSR_TEST_CANONICAL(msr, feat) \
+ __MSR_TEST(msr, #msr, canonical_val, NONCANONICAL, 0, feat)
+
+/*
+ * The main struct must be scoped to a function due to the use of structures to
+ * define features. For the global structure, allocate enough space for the
+ * foreseeable future without getting too ridiculous, to minimize maintenance
+ * costs (bumping the array size every time an MSR is added is really annoying).
+ */
+static struct kvm_msr msrs[128];
+static int idx;
+
+static bool ignore_unsupported_msrs;
+
+static u64 fixup_rdmsr_val(u32 msr, u64 want)
+{
+ /*
+ * AMD CPUs drop bits 63:32 on some MSRs that Intel CPUs support. KVM
+ * is supposed to emulate that behavior based on guest vendor model
+ * (which is the same as the host vendor model for this test).
+ */
+ if (!host_cpu_is_amd)
+ return want;
+
+ switch (msr) {
+ case MSR_IA32_SYSENTER_ESP:
+ case MSR_IA32_SYSENTER_EIP:
+ case MSR_TSC_AUX:
+ return want & GENMASK_ULL(31, 0);
+ default:
+ return want;
+ }
+}
+
+static void __rdmsr(u32 msr, u64 want)
+{
+ u64 val;
+ u8 vec;
+
+ vec = rdmsr_safe(msr, &val);
+ __GUEST_ASSERT(!vec, "Unexpected %s on RDMSR(0x%x)", ex_str(vec), msr);
+
+ __GUEST_ASSERT(val == want, "Wanted 0x%lx from RDMSR(0x%x), got 0x%lx",
+ want, msr, val);
+}
+
+static void __wrmsr(u32 msr, u64 val)
+{
+ u8 vec;
+
+ vec = wrmsr_safe(msr, val);
+ __GUEST_ASSERT(!vec, "Unexpected %s on WRMSR(0x%x, 0x%lx)",
+ ex_str(vec), msr, val);
+ __rdmsr(msr, fixup_rdmsr_val(msr, val));
+}
+
+static void guest_test_supported_msr(const struct kvm_msr *msr)
+{
+ __rdmsr(msr->index, msr->reset_val);
+ __wrmsr(msr->index, msr->write_val);
+ GUEST_SYNC(fixup_rdmsr_val(msr->index, msr->write_val));
+
+ __rdmsr(msr->index, msr->reset_val);
+}
+
+static void guest_test_unsupported_msr(const struct kvm_msr *msr)
+{
+ u64 val;
+ u8 vec;
+
+ /*
+ * KVM's ABI with respect to ignore_msrs is a mess and largely beyond
+ * repair, just skip the unsupported MSR tests.
+ */
+ if (ignore_unsupported_msrs)
+ goto skip_wrmsr_gp;
+
+ if (this_cpu_has(msr->feature2))
+ goto skip_wrmsr_gp;
+
+ vec = rdmsr_safe(msr->index, &val);
+ __GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on RDMSR(0x%x), got %s",
+ msr->index, ex_str(vec));
+
+ vec = wrmsr_safe(msr->index, msr->write_val);
+ __GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
+ msr->index, msr->write_val, ex_str(vec));
+
+skip_wrmsr_gp:
+ GUEST_SYNC(0);
+}
+
+void guest_test_reserved_val(const struct kvm_msr *msr)
+{
+	/* Skip reserved value checks as well, ignore_msrs is truly a mess. */
+ if (ignore_unsupported_msrs)
+ return;
+
+ /*
+ * If the CPU will truncate the written value (e.g. SYSENTER on AMD),
+ * expect success and a truncated value, not #GP.
+ */
+ if (!this_cpu_has(msr->feature) ||
+ msr->rsvd_val == fixup_rdmsr_val(msr->index, msr->rsvd_val)) {
+ u8 vec = wrmsr_safe(msr->index, msr->rsvd_val);
+
+ __GUEST_ASSERT(vec == GP_VECTOR,
+ "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
+ msr->index, msr->rsvd_val, ex_str(vec));
+ } else {
+ __wrmsr(msr->index, msr->rsvd_val);
+ __wrmsr(msr->index, msr->reset_val);
+ }
+}
+
+static void guest_main(void)
+{
+ for (;;) {
+ const struct kvm_msr *msr = &msrs[READ_ONCE(idx)];
+
+ if (this_cpu_has(msr->feature))
+ guest_test_supported_msr(msr);
+ else
+ guest_test_unsupported_msr(msr);
+
+ if (msr->rsvd_val)
+ guest_test_reserved_val(msr);
+
+ GUEST_SYNC(msr->reset_val);
+ }
+}
+
+static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
+{
+ u64 reset_val = msrs[idx].reset_val;
+ u32 msr = msrs[idx].index;
+ u64 val;
+
+ if (!kvm_cpu_has(msrs[idx].feature))
+ return;
+
+ val = vcpu_get_msr(vcpu, msr);
+ TEST_ASSERT(val == guest_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
+ guest_val, msr, val);
+
+ vcpu_set_msr(vcpu, msr, reset_val);
+
+ val = vcpu_get_msr(vcpu, msr);
+ TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
+ reset_val, msr, val);
+}
+
+static void do_vcpu_run(struct kvm_vcpu *vcpu)
+{
+ struct ucall uc;
+
+ for (;;) {
+ vcpu_run(vcpu);
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_SYNC:
+ host_test_msr(vcpu, uc.args[1]);
+ return;
+ case UCALL_PRINTF:
+ pr_info("%s", uc.buffer);
+ break;
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ case UCALL_DONE:
+ TEST_FAIL("Unexpected UCALL_DONE");
+ default:
+ TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
+ }
+ }
+}
+
+static void vcpus_run(struct kvm_vcpu **vcpus, const int NR_VCPUS)
+{
+ int i;
+
+ for (i = 0; i < NR_VCPUS; i++)
+ do_vcpu_run(vcpus[i]);
+}
+
+#define MISC_ENABLES_RESET_VAL (MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL | MSR_IA32_MISC_ENABLE_BTS_UNAVAIL)
+
+static void test_msrs(void)
+{
+ const struct kvm_msr __msrs[] = {
+ MSR_TEST_NON_ZERO(MSR_IA32_MISC_ENABLE,
+ MISC_ENABLES_RESET_VAL | MSR_IA32_MISC_ENABLE_FAST_STRING,
+ MSR_IA32_MISC_ENABLE_FAST_STRING, MISC_ENABLES_RESET_VAL, NONE),
+ MSR_TEST_NON_ZERO(MSR_IA32_CR_PAT, 0x07070707, 0, 0x7040600070406, NONE),
+
+ /*
+ * TSC_AUX is supported if RDTSCP *or* RDPID is supported. Add
+	 * entries for each feature so that TSC_AUX doesn't exist for
+ * the "unsupported" vCPU, and obviously to test both cases.
+ */
+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDTSCP, RDPID),
+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDPID, RDTSCP),
+
+ MSR_TEST(MSR_IA32_SYSENTER_CS, 0x1234, 0, NONE),
+ /*
+ * SYSENTER_{ESP,EIP} are technically non-canonical on Intel,
+ * but KVM doesn't emulate that behavior on emulated writes,
+ * i.e. this test will observe different behavior if the MSR
+	 * writes are handled by hardware vs. KVM. KVM's behavior is
+ * intended (though far from ideal), so don't bother testing
+ * non-canonical values.
+ */
+ MSR_TEST(MSR_IA32_SYSENTER_ESP, canonical_val, 0, NONE),
+ MSR_TEST(MSR_IA32_SYSENTER_EIP, canonical_val, 0, NONE),
+
+ MSR_TEST_CANONICAL(MSR_FS_BASE, LM),
+ MSR_TEST_CANONICAL(MSR_GS_BASE, LM),
+ MSR_TEST_CANONICAL(MSR_KERNEL_GS_BASE, LM),
+ MSR_TEST_CANONICAL(MSR_LSTAR, LM),
+ MSR_TEST_CANONICAL(MSR_CSTAR, LM),
+ MSR_TEST(MSR_SYSCALL_MASK, 0xffffffff, 0, LM),
+
+ MSR_TEST_CANONICAL(MSR_IA32_PL0_SSP, SHSTK),
+ MSR_TEST(MSR_IA32_PL0_SSP, canonical_val, canonical_val | 1, SHSTK),
+ MSR_TEST_CANONICAL(MSR_IA32_PL1_SSP, SHSTK),
+ MSR_TEST(MSR_IA32_PL1_SSP, canonical_val, canonical_val | 1, SHSTK),
+ MSR_TEST_CANONICAL(MSR_IA32_PL2_SSP, SHSTK),
+ MSR_TEST(MSR_IA32_PL2_SSP, canonical_val, canonical_val | 1, SHSTK),
+ MSR_TEST_CANONICAL(MSR_IA32_PL3_SSP, SHSTK),
+ MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
+ };
+
+ /*
+ * Create two vCPUs, but run them on the same task, to validate KVM's
+ * context switching of MSR state. Don't pin the task to a pCPU to
+ * also validate KVM's handling of cross-pCPU migration.
+ */
+ const int NR_VCPUS = 2;
+ struct kvm_vcpu *vcpus[NR_VCPUS];
+ struct kvm_vm *vm;
+
+ kvm_static_assert(sizeof(__msrs) <= sizeof(msrs));
+ kvm_static_assert(ARRAY_SIZE(__msrs) <= ARRAY_SIZE(msrs));
+ memcpy(msrs, __msrs, sizeof(__msrs));
+
+ ignore_unsupported_msrs = kvm_is_ignore_msrs();
+
+ vm = vm_create_with_vcpus(NR_VCPUS, guest_main, vcpus);
+
+ sync_global_to_guest(vm, msrs);
+ sync_global_to_guest(vm, ignore_unsupported_msrs);
+
+ for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+ sync_global_to_guest(vm, idx);
+
+ vcpus_run(vcpus, NR_VCPUS);
+ vcpus_run(vcpus, NR_VCPUS);
+ }
+
+ kvm_vm_free(vm);
+}
+
+int main(void)
+{
+ test_msrs();
+}
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 46/51] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (44 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 7:12 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 47/51] KVM: selftests: Extend MSRs test to validate vCPUs without supported features Sean Christopherson
` (6 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Extend the MSRs test to support {S,U}_CET, which are a bit of a pain to
handle due to the MSRs existing if IBT *or* SHSTK is supported. To deal
with Intel's wonderful decision to bundle IBT and SHSTK under CET, track
the second feature, but skip only RDMSR #GP tests to avoid false failures
when running on a CPU with only one of IBT or SHSTK (the WRMSR #GP tests
are still valid since the enable bits are per-feature).
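The resulting skip logic can be sketched compactly; the function and names
below are illustrative, not the selftest's actual API:

```python
# Illustrative sketch of the unsupported-MSR #GP skip logic described
# above; "has_feature2" means the secondary feature that also makes the
# MSR exist (e.g. IBT when testing the SHSTK bits of {S,U}_CET).
CET_CTRL_MSRS = {"MSR_IA32_S_CET", "MSR_IA32_U_CET"}

def gp_tests_to_run(msr_name, has_feature2):
    if not has_feature2:
        # The MSR truly doesn't exist: both RDMSR and WRMSR must #GP.
        return {"rdmsr_gp", "wrmsr_gp"}
    if msr_name in CET_CTRL_MSRS:
        # The MSR exists, but the written enable bit belongs to the
        # unsupported primary feature, so WRMSR should still #GP.
        return {"wrmsr_gp"}
    # Any other MSR exists via the secondary feature: nothing to test.
    return set()
```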
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/x86/msrs_test.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 9285cf51ef75..952439e0c754 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -125,13 +125,26 @@ static void guest_test_unsupported_msr(const struct kvm_msr *msr)
if (ignore_unsupported_msrs)
goto skip_wrmsr_gp;
- if (this_cpu_has(msr->feature2))
- goto skip_wrmsr_gp;
+ /*
+ * {S,U}_CET exist if IBT or SHSTK is supported, but with bits that are
+ * writable only if their associated feature is supported. Skip the
+ * RDMSR #GP test if the secondary feature is supported, but perform
+ * the WRMSR #GP test as the to-be-written value is tied to the primary
+ * feature. For all other MSRs, simply do nothing.
+ */
+ if (this_cpu_has(msr->feature2)) {
+ if (msr->index != MSR_IA32_U_CET &&
+ msr->index != MSR_IA32_S_CET)
+ goto skip_wrmsr_gp;
+
+ goto skip_rdmsr_gp;
+ }
vec = rdmsr_safe(msr->index, &val);
__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on RDMSR(0x%x), got %s",
msr->index, ex_str(vec));
+skip_rdmsr_gp:
vec = wrmsr_safe(msr->index, msr->write_val);
__GUEST_ASSERT(vec == GP_VECTOR, "Wanted #GP on WRMSR(0x%x, 0x%lx), got %s",
msr->index, msr->write_val, ex_str(vec));
@@ -269,6 +282,10 @@ static void test_msrs(void)
MSR_TEST_CANONICAL(MSR_CSTAR, LM),
MSR_TEST(MSR_SYSCALL_MASK, 0xffffffff, 0, LM),
+ MSR_TEST2(MSR_IA32_S_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT),
+ MSR_TEST2(MSR_IA32_S_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK),
+ MSR_TEST2(MSR_IA32_U_CET, CET_SHSTK_EN, CET_RESERVED, SHSTK, IBT),
+ MSR_TEST2(MSR_IA32_U_CET, CET_ENDBR_EN, CET_RESERVED, IBT, SHSTK),
MSR_TEST_CANONICAL(MSR_IA32_PL0_SSP, SHSTK),
MSR_TEST(MSR_IA32_PL0_SSP, canonical_val, canonical_val | 1, SHSTK),
MSR_TEST_CANONICAL(MSR_IA32_PL1_SSP, SHSTK),
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 47/51] KVM: selftests: Extend MSRs test to validate vCPUs without supported features
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (45 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 46/51] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 48/51] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test Sean Christopherson
` (5 subsequent siblings)
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add a third vCPU to the MSRs test that runs with all features disabled in
the vCPU's CPUID model, to verify that KVM does the right thing with
respect to emulating accesses to MSRs that shouldn't exist. Use the same
VM to verify that KVM is honoring the vCPU model, e.g. isn't looking at
per-VM state when emulating MSR accesses.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/x86/msrs_test.c | 28 ++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 952439e0c754..f69091ebd270 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -296,12 +296,17 @@ static void test_msrs(void)
MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
};
+ const struct kvm_x86_cpu_feature feat_none = X86_FEATURE_NONE;
+ const struct kvm_x86_cpu_feature feat_lm = X86_FEATURE_LM;
+
/*
- * Create two vCPUs, but run them on the same task, to validate KVM's
+ * Create three vCPUs, but run them on the same task, to validate KVM's
* context switching of MSR state. Don't pin the task to a pCPU to
- * also validate KVM's handling of cross-pCPU migration.
+ * also validate KVM's handling of cross-pCPU migration. Use the full
+ * set of features for the first two vCPUs, but clear all features in
+	 * the third vCPU in order to test both positive and negative paths.
*/
- const int NR_VCPUS = 2;
+ const int NR_VCPUS = 3;
struct kvm_vcpu *vcpus[NR_VCPUS];
struct kvm_vm *vm;
@@ -316,6 +321,23 @@ static void test_msrs(void)
sync_global_to_guest(vm, msrs);
sync_global_to_guest(vm, ignore_unsupported_msrs);
+ /*
+ * Clear features in the "unsupported features" vCPU. This needs to be
+ * done before the first vCPU run as KVM's ABI is that guest CPUID is
+ * immutable once the vCPU has been run.
+ */
+ for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+ /*
+ * Don't clear LM; selftests are 64-bit only, and KVM doesn't
+ * honor LM=0 for MSRs that are supposed to exist if and only
+ * if the vCPU is a 64-bit model. Ditto for NONE; clearing a
+ * fake feature flag will result in false failures.
+ */
+ if (memcmp(&msrs[idx].feature, &feat_lm, sizeof(feat_lm)) &&
+ memcmp(&msrs[idx].feature, &feat_none, sizeof(feat_none)))
+ vcpu_clear_cpuid_feature(vcpus[2], msrs[idx].feature);
+ }
+
for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
sync_global_to_guest(vm, idx);
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 48/51] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (46 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 47/51] KVM: selftests: Extend MSRs test to validate vCPUs without supported features Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 6:52 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 49/51] KVM: selftests: Add coverage for KVM-defined registers in " Sean Christopherson
` (4 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
When KVM_{G,S}ET_ONE_REG are supported, verify that MSRs can be accessed
via ONE_REG and through the dedicated MSR ioctls. For simplicity, run
the test twice, i.e. instead of trying to get MSR values into the exact
right state when switching write methods.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/x86/msrs_test.c | 22 ++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index f69091ebd270..2dc4017072c6 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -193,6 +193,9 @@ static void guest_main(void)
}
}
+static bool has_one_reg;
+static bool use_one_reg;
+
static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
{
u64 reset_val = msrs[idx].reset_val;
@@ -206,11 +209,21 @@ static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
TEST_ASSERT(val == guest_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
guest_val, msr, val);
- vcpu_set_msr(vcpu, msr, reset_val);
+ if (use_one_reg)
+ vcpu_set_reg(vcpu, KVM_X86_REG_MSR(msr), reset_val);
+ else
+ vcpu_set_msr(vcpu, msr, reset_val);
val = vcpu_get_msr(vcpu, msr);
TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_msr(0x%x), got 0x%lx",
reset_val, msr, val);
+
+ if (!has_one_reg)
+ return;
+
+ val = vcpu_get_reg(vcpu, KVM_X86_REG_MSR(msr));
+ TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+ reset_val, msr, val);
}
static void do_vcpu_run(struct kvm_vcpu *vcpu)
@@ -350,5 +363,12 @@ static void test_msrs(void)
int main(void)
{
+ has_one_reg = kvm_has_cap(KVM_CAP_ONE_REG);
+
test_msrs();
+
+ if (has_one_reg) {
+ use_one_reg = true;
+ test_msrs();
+ }
}
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 49/51] KVM: selftests: Add coverage for KVM-defined registers in MSRs test
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (47 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 48/51] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 6:31 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported Sean Christopherson
` (3 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add test coverage for the KVM-defined GUEST_SSP "register" in the MSRs
test. While _KVM's_ goal is to not tie the uAPI of KVM-defined registers
to any particular internal implementation, i.e. to not commit in uAPI to
handling GUEST_SSP as an MSR, treating GUEST_SSP as an MSR for testing
purposes is a-ok and is a natural fit given the semantics of SSP.
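Since GUEST_SSP's reserved-value case relies on a non-canonical address, a
brief refresher on canonicality may help; this helper is illustrative (48-bit
virtual addresses assumed), not part of the test:

```python
def is_canonical(addr, vaddr_bits=48):
    # An address is canonical when bits 63:vaddr_bits replicate bit
    # vaddr_bits-1, i.e. the upper bits are a sign extension. With
    # vaddr_bits=48, the top 17 bits must be all zeros or all ones.
    top = addr >> (vaddr_bits - 1)
    return top == 0 or top == (1 << (64 - vaddr_bits + 1)) - 1
```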
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/x86/msrs_test.c | 97 ++++++++++++++++++++-
1 file changed, 94 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 2dc4017072c6..7c6d846e42dd 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -17,9 +17,10 @@ struct kvm_msr {
const u64 write_val;
const u64 rsvd_val;
const u32 index;
+ const bool is_kvm_defined;
};
-#define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2) \
+#define ____MSR_TEST(msr, str, val, rsvd, reset, feat, f2, is_kvm) \
{ \
.index = msr, \
.name = str, \
@@ -28,10 +29,11 @@ struct kvm_msr {
.reset_val = reset, \
.feature = X86_FEATURE_ ##feat, \
.feature2 = X86_FEATURE_ ##f2, \
+ .is_kvm_defined = is_kvm, \
}
#define __MSR_TEST(msr, str, val, rsvd, reset, feat) \
- ____MSR_TEST(msr, str, val, rsvd, reset, feat, feat)
+ ____MSR_TEST(msr, str, val, rsvd, reset, feat, feat, false)
#define MSR_TEST_NON_ZERO(msr, val, rsvd, reset, feat) \
__MSR_TEST(msr, #msr, val, rsvd, reset, feat)
@@ -40,7 +42,7 @@ struct kvm_msr {
__MSR_TEST(msr, #msr, val, rsvd, 0, feat)
#define MSR_TEST2(msr, val, rsvd, feat, f2) \
- ____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2)
+ ____MSR_TEST(msr, #msr, val, rsvd, 0, feat, f2, false)
/*
* Note, use a page aligned value for the canonical value so that the value
@@ -51,6 +53,9 @@ static const u64 canonical_val = 0x123456789000ull;
#define MSR_TEST_CANONICAL(msr, feat) \
__MSR_TEST(msr, #msr, canonical_val, NONCANONICAL, 0, feat)
+#define MSR_TEST_KVM(msr, val, rsvd, feat) \
+ ____MSR_TEST(KVM_REG_ ##msr, #msr, val, rsvd, 0, feat, feat, true)
+
/*
* The main struct must be scoped to a function due to the use of structures to
* define features. For the global structure, allocate enough space for the
@@ -196,6 +201,83 @@ static void guest_main(void)
static bool has_one_reg;
static bool use_one_reg;
+#define KVM_X86_MAX_NR_REGS 1
+
+static bool vcpu_has_reg(struct kvm_vcpu *vcpu, u64 reg)
+{
+ struct {
+ struct kvm_reg_list list;
+ u64 regs[KVM_X86_MAX_NR_REGS];
+ } regs = {};
+ int r, i;
+
+ /*
+ * If KVM_GET_REG_LIST succeeds with n=0, i.e. there are no supported
+ * regs, then the vCPU obviously doesn't support the reg.
+ */
+ r = __vcpu_ioctl(vcpu, KVM_GET_REG_LIST, ®s.list.n);
+ if (!r)
+ return false;
+
+ TEST_ASSERT_EQ(errno, E2BIG);
+
+ /*
+	 * KVM x86 is expected to support enumerating a relatively small number
+ * of regs. The majority of registers supported by KVM_{G,S}ET_ONE_REG
+ * are enumerated via other ioctls, e.g. KVM_GET_MSR_INDEX_LIST. For
+ * simplicity, hardcode the maximum number of regs and manually update
+ * the test as necessary.
+ */
+ TEST_ASSERT(regs.list.n <= KVM_X86_MAX_NR_REGS,
+ "KVM reports %llu regs, test expects at most %u regs, stale test?",
+ regs.list.n, KVM_X86_MAX_NR_REGS);
+
+ vcpu_ioctl(vcpu, KVM_GET_REG_LIST, ®s.list.n);
+ for (i = 0; i < regs.list.n; i++) {
+ if (regs.regs[i] == reg)
+ return true;
+ }
+
+ return false;
+}
+
+static void host_test_kvm_reg(struct kvm_vcpu *vcpu)
+{
+ bool has_reg = vcpu_cpuid_has(vcpu, msrs[idx].feature);
+ u64 reset_val = msrs[idx].reset_val;
+ u64 write_val = msrs[idx].write_val;
+ u64 rsvd_val = msrs[idx].rsvd_val;
+ u32 reg = msrs[idx].index;
+ u64 val;
+ int r;
+
+ if (!use_one_reg)
+ return;
+
+ TEST_ASSERT_EQ(vcpu_has_reg(vcpu, KVM_X86_REG_KVM(reg)), has_reg);
+
+ if (!has_reg) {
+ r = __vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg), &val);
+ TEST_ASSERT(r && errno == EINVAL,
+ "Expected failure on get_reg(0x%x)", reg);
+ rsvd_val = 0;
+ goto out;
+ }
+
+ val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg));
+ TEST_ASSERT(val == reset_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+ reset_val, reg, val);
+
+ vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), write_val);
+ val = vcpu_get_reg(vcpu, KVM_X86_REG_KVM(reg));
+ TEST_ASSERT(val == write_val, "Wanted 0x%lx from get_reg(0x%x), got 0x%lx",
+ write_val, reg, val);
+
+out:
+ r = __vcpu_set_reg(vcpu, KVM_X86_REG_KVM(reg), rsvd_val);
+ TEST_ASSERT(r, "Expected failure on set_reg(0x%x, 0x%lx)", reg, rsvd_val);
+}
+
static void host_test_msr(struct kvm_vcpu *vcpu, u64 guest_val)
{
u64 reset_val = msrs[idx].reset_val;
@@ -307,6 +389,8 @@ static void test_msrs(void)
MSR_TEST(MSR_IA32_PL2_SSP, canonical_val, canonical_val | 1, SHSTK),
MSR_TEST_CANONICAL(MSR_IA32_PL3_SSP, SHSTK),
MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
+
+ MSR_TEST_KVM(GUEST_SSP, canonical_val, NONCANONICAL, SHSTK),
};
const struct kvm_x86_cpu_feature feat_none = X86_FEATURE_NONE;
@@ -322,6 +406,7 @@ static void test_msrs(void)
const int NR_VCPUS = 3;
struct kvm_vcpu *vcpus[NR_VCPUS];
struct kvm_vm *vm;
+ int i;
kvm_static_assert(sizeof(__msrs) <= sizeof(msrs));
kvm_static_assert(ARRAY_SIZE(__msrs) <= ARRAY_SIZE(msrs));
@@ -352,6 +437,12 @@ static void test_msrs(void)
}
for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
+ if (msrs[idx].is_kvm_defined) {
+ for (i = 0; i < NR_VCPUS; i++)
+ host_test_kvm_reg(vcpus[i]);
+ continue;
+ }
+
sync_global_to_guest(vm, idx);
vcpus_run(vcpus, NR_VCPUS);
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (48 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 49/51] KVM: selftests: Add coverage for KVM-defined registers in " Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-23 6:46 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 51/51] KVM: VMX: Make CR4.CET a guest owned bit Sean Christopherson
` (2 subsequent siblings)
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
Add a check in the MSRs test to verify that KVM's reported support for
MSRs with feature bits is consistent between KVM's MSR save/restore lists
and KVM's supported CPUID.
To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET,
track the "second" feature to avoid false failures when running on a CPU
with only one of IBT or SHSTK.
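The invariant being asserted can be stated as a one-line predicate; the sketch
below uses illustrative names, not the selftest's API:

```python
def save_restore_list_consistent(in_save_restore_list, has_feat, has_feat2):
    # KVM should report an MSR in its save/restore list if and only if
    # at least one of the CPUID features gating the MSR is supported.
    return in_save_restore_list == (has_feat or has_feat2)
```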
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
tools/testing/selftests/kvm/x86/msrs_test.c | 22 ++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 7c6d846e42dd..91dc66bfdac2 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -437,12 +437,32 @@ static void test_msrs(void)
}
for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
- if (msrs[idx].is_kvm_defined) {
+ struct kvm_msr *msr = &msrs[idx];
+
+ if (msr->is_kvm_defined) {
for (i = 0; i < NR_VCPUS; i++)
host_test_kvm_reg(vcpus[i]);
continue;
}
+ /*
+ * Verify KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST
+ * are consistent with respect to MSRs whose existence is
+ * enumerated via CPUID. Note, using LM as a dummy feature
+ * is a-ok here as well, as all MSRs that abuse LM should be
+ * unconditionally reported in the save/restore list (and
+ * selftests are 64-bit only). Note #2, skip the check for
+ * FS/GS.base MSRs, as they aren't reported in the save/restore
+ * list since their state is managed via SREGS.
+ */
+ TEST_ASSERT(msr->index == MSR_FS_BASE || msr->index == MSR_GS_BASE ||
+ kvm_msr_is_in_save_restore_list(msr->index) ==
+ (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)),
+ "%s %s save/restore list, but %s according to CPUID", msr->name,
+ kvm_msr_is_in_save_restore_list(msr->index) ? "is" : "isn't",
+ (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)) ?
+ "supported" : "unsupported");
+
sync_global_to_guest(vm, idx);
vcpus_run(vcpus, NR_VCPUS);
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* [PATCH v16 51/51] KVM: VMX: Make CR4.CET a guest owned bit
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (49 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported Sean Christopherson
@ 2025-09-19 22:32 ` Sean Christopherson
2025-09-22 8:34 ` Binbin Wu
2025-09-24 14:32 ` [PATCH v16 00/51] KVM: x86: Super Mega CET Chao Gao
2025-09-24 18:07 ` Sean Christopherson
52 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-19 22:32 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
From: Mathias Krause <minipli@grsecurity.net>
Make CR4.CET a guest-owned bit under VMX by extending
KVM_POSSIBLE_CR4_GUEST_BITS accordingly.
There's no need to intercept changes to CR4.CET, as it's neither
included in KVM's MMU role bits, nor does KVM specifically care about
the actual value of a (nested) guest's CR4.CET, aside from enforcing
architectural constraints, i.e. ensuring that CR0.WP=1 if CR4.CET=1.
Intercepting writes to CR4.CET is particularly bad for grsecurity
kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features
make heavy use of read-only kernel objects and use a CPU-local CR0.WP
toggle to override write protection when needed. Under a CET-enabled
kernel, this also requires toggling CR4.CET, hence the motivation to
make it guest-owned.
Using the old test from [1] gives the following runtime numbers (perf
stat -r 5 ssdd 10 50000):
* grsec guest on linux-6.16-rc5 + cet patches:
2.4647 +- 0.0706 seconds time elapsed ( +- 2.86% )
* grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
1.5648 +- 0.0240 seconds time elapsed ( +- 1.53% )
Not only does not intercepting CR4.CET make the test run ~35% faster,
it's also more stable with less fluctuation due to fewer VMEXITs.
Therefore, make CR4.CET a guest-owned bit where possible.
This change is VMX-specific, as SVM has no such fine-grained control
register intercept control.
If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET
value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS
and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early.
Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1]
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/kvm_cache_regs.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 36a8786db291..8ddb01191d6f 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -7,7 +7,8 @@
#define KVM_POSSIBLE_CR0_GUEST_BITS (X86_CR0_TS | X86_CR0_WP)
#define KVM_POSSIBLE_CR4_GUEST_BITS \
(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \
- | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
+ | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
+ | X86_CR4_CET)
#define X86_CR0_PDPTR_BITS (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
#define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
--
2.51.0.470.ga7dc726c21-goog
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v16 09/51] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-19 22:32 ` [PATCH v16 09/51] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Sean Christopherson
@ 2025-09-22 2:10 ` Binbin Wu
2025-09-22 16:41 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 2:10 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
[...]
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3e66d8c5000a..ae402463f991 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
>
> static DEFINE_MUTEX(vendor_module_lock);
> +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> +
> struct kvm_x86_ops kvm_x86_ops __read_mostly;
>
> #define KVM_X86_OP(func) \
> @@ -3801,6 +3804,67 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
> }
>
> +/*
> + * Returns true if the MSR in question is managed via XSTATE, i.e. is context
> + * switched with the rest of guest FPU state. Note! S_CET is _not_ context
> + * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
> + * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
> + * the value saved/restored via XSTATE is always the host's value. That detail
> + * is _extremely_ important, as the guest's S_CET must _never_ be resident in
> + * hardware while executing in the host. Loading guest values for U_CET and
> + * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
> + * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
> + * privilegel levels, i.e. are effectively only consumed by userspace as well.
s/privilegel/privilege[...]
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs
2025-09-19 22:32 ` [PATCH v16 13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs Sean Christopherson
@ 2025-09-22 2:58 ` Binbin Wu
2025-09-23 9:06 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 2:58 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace
> save and restore the guest's Shadow Stack Pointer (SSP). On both Intel
> and AMD, SSP is a hardware register that can only be accessed by software
> via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware
> to context switch SSP at entry/exit). As a result, SSP doesn't fit in
> any of KVM's existing interfaces for saving/restoring state.
>
> Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes
> to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP
> MSRs. Use a translation layer to hide the KVM-internal MSR index so that
> the arbitrary index doesn't become ABI, e.g. so that KVM can rework its
> implementation as needed, so long as the ONE_REG ABI is maintained.
>
> Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack
> support to avoid running afoul of ignore_msrs, which unfortunately applies
> to host-initiated accesses (which is a discussion for another day). I.e.
> ensure consistent behavior for KVM-defined registers irrespective of
> ignore_msrs.
>
> Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 17/51] KVM: VMX: Set host constant supervisor states to VMCS fields
2025-09-19 22:32 ` [PATCH v16 17/51] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
@ 2025-09-22 3:03 ` Binbin Wu
0 siblings, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 3:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Save constant values to the HOST_{S_CET,SSP,INTR_SSP_TABLE} fields
> explicitly. Kernel IBT is supported and the MSR_IA32_S_CET setting is
> static post-boot (the exception is the BIOS call case, which a vCPU
> thread never crosses), so KVM doesn't need to refresh the HOST_S_CET
> field before every VM-Enter/VM-Exit sequence.
>
> Host supervisor shadow stack is not currently enabled and SSP is not
> accessible to kernel mode, thus it's safe to set the host
> IA32_INT_SSP_TAB/SSP VMCS fields to 0. When shadow stack is enabled for
> CPL3, SSP is reloaded from PL3_SSP before the CPU returns to userspace.
> See SDM Vol 2A/B Chapters 3/4 for SYSCALL/SYSRET/SYSENTER/SYSEXIT/
> RDSSP/CALL etc.
>
> Prevent KVM module loading if host supervisor shadow stack SHSTK_EN is
> set in MSR_IA32_S_CET, as KVM cannot co-exist with it correctly.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: snapshot host S_CET if SHSTK *or* IBT is supported]
> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features
2025-09-19 22:32 ` [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
@ 2025-09-22 5:39 ` Binbin Wu
2025-09-22 16:47 ` Sean Christopherson
2025-09-22 10:27 ` Chao Gao
1 sibling, 1 reply; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 5:39 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
> affected by Shadow Stacks and/or Indirect Branch Tracking when said
> features are enabled in the guest, as fully emulating CET would require
> significant complexity for no practical benefit (KVM shouldn't need to
> emulate branch instructions on modern hosts). Simply doing nothing isn't
> an option as that would allow a malicious entity to subvert CET
> protections via the emulator.
>
> To detect instructions that are subject to IBT or affect IBT state, use
> the existing IsBranch flag along with the source operand type to detect
> indirect branches, and the existing NearBranch flag to detect far branches
> (which can affect IBT state even if the branch itself is direct).
>
> For Shadow Stacks, explicitly track instructions that directly affect the
> current SSP, as KVM's emulator doesn't have existing flags that can be
> used to precisely detect such instructions. Alternatively, the em_xxx()
> helpers could directly check for ShadowStack interactions, but using a
> dedicated flag is arguably easier to audit, and allows for handling both
> IBT and SHSTK in one fell swoop.
>
> Note! On far transfers, do NOT consult the current privilege level and
> instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
> Supervisor mode. On inter-privilege level far transfers, SHSTK and IBT
> can be in play for the target privilege level, i.e. checking the current
> privilege could get a false negative, and KVM doesn't know the target
> privilege level until emulation gets under way.
>
> Note #2, FAR JMP from 64-bit mode to compatibility mode interacts with
> the current SSP, but only to ensure SSP[63:32] == 0. Don't tag FAR JMP
> as SHSTK, which would be rather confusing and would result in FAR JMP
> being rejected unnecessarily the vast majority of the time (ignoring that
> it's unlikely to ever be emulated). A future commit will add the #GP(0)
> check for the specific FAR JMP scenario.
>
> Note #3, task switches also modify SSP and so need to be rejected. That
> too will be addressed in a future commit.
>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Originally-by: Yang Weijiang <weijiang.yang@intel.com>
> Cc: Mathias Krause <minipli@grsecurity.net>
> Cc: John Allen <john.allen@amd.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Two nits below.
> ---
> arch/x86/kvm/emulate.c | 114 ++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 100 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 23929151a5b8..dc0249929cbf 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -178,6 +178,7 @@
> #define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
> #define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
> #define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
> +#define ShadowStack ((u64)1 << 57) /* Instruction affects Shadow Stacks. */
>
> #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
>
> @@ -660,6 +661,57 @@ static inline bool emul_is_noncanonical_address(u64 la,
> return !ctxt->ops->is_canonical_addr(ctxt, la, flags);
> }
>
> +static bool is_shstk_instruction(u64 flags)
> +{
> + return flags & ShadowStack;
> +}
> +
> +static bool is_ibt_instruction(u64 flags)
> +{
> + if (!(flags & IsBranch))
> + return false;
> +
> + /*
> + * Far transfers can affect IBT state even if the branch itself is
> + * direct, e.g. when changing privilege levels and loading a conforming
> + * code segment. For simplicity, treat all far branches as affecting
> + * IBT. False positives are acceptable (emulating far branches on an
> + * IBT-capable CPU won't happen in practice), while false negatives
> + * could impact guest security.
> + *
> + * Note, this also handles SYCALL and SYSENTER.
SYCALL -> SYSCALL
> + */
> + if (!(flags & NearBranch))
> + return true;
> +
> + switch (flags & (OpMask << SrcShift)) {
> + case SrcReg:
> + case SrcMem:
> + case SrcMem16:
> + case SrcMem32:
> + return true;
> + case SrcMemFAddr:
> + case SrcImmFAddr:
> + /* Far branches should be handled above. */
> + WARN_ON_ONCE(1);
> + return true;
> + case SrcNone:
> + case SrcImm:
> + case SrcImmByte:
> + /*
> + * Note, ImmU16 is used only for the stack adjustment operand on ENTER
> + * and RET instructions. ENTER isn't a branch and RET FAR is handled
> + * by the NearBranch check above. RET itself isn't an indirect branch.
> + */
> + case SrcImmU16:
> + return false;
> + default:
> + WARN_ONCE(1, "Unexpected Src operand '%llx' on branch",
> + (flags & (OpMask << SrcShift)));
> + return false;
Is it safer to reject the emulation if it has an unexpected src operand?
> + }
> +}
> +
>
[...]
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
2025-09-19 22:32 ` [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled Sean Christopherson
@ 2025-09-22 6:41 ` Binbin Wu
2025-09-22 17:23 ` Sean Christopherson
2025-09-22 11:27 ` Chao Gao
1 sibling, 1 reply; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 6:41 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if the guest triggers
> task switch emulation with Indirect Branch Tracking or Shadow Stacks
> enabled,
The code only rejects the task switch when shadow stack is enabled, not IBT.
> as attempting to do the right thing would require non-trivial
> effort and complexity, KVM doesn't support emulating CET generally, and
> it's extremely unlikely that any guest will do task switches while also
> utilizing CET. Defer taking on the complexity until someone cares enough
> to put in the time and effort to add support.
>
> Per the SDM:
>
> If shadow stack is enabled, then the SSP of the task is located at the
> 4 bytes at offset 104 in the 32-bit TSS and is used by the processor to
> establish the SSP when a task switch occurs from a task associated with
> this TSS. Note that the processor does not write the SSP of the task
> initiating the task switch to the TSS of that task, and instead the SSP
> of the previous task is pushed onto the shadow stack of the new task.
>
> Note, per the SDM's pseudocode on TASK SWITCHING, IBT state for the new
> privilege level is updated. To keep things simple, check both S_CET and
> U_CET (again, anyone that wants more precise checking can have the honor
> of implementing support).
>
> Reported-by: Binbin Wu <binbin.wu@linux.intel.com>
> Closes: https://lore.kernel.org/all/819bd98b-2a60-4107-8e13-41f1e4c706b1@linux.intel.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/x86.c | 35 ++++++++++++++++++++++++++++-------
> 1 file changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d2cccc7594d4..0c060e506f9d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12178,6 +12178,25 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
> struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> int ret;
>
> + if (kvm_is_cr4_bit_set(vcpu, X86_CR4_CET)) {
> + u64 u_cet, s_cet;
> +
> + /*
> + * Check both User and Supervisor on task switches as inter-
> + * privilege level task switches are impacted by CET at both
> + * the current privilege level and the new privilege level, and
> + * that information is not known at this time. The expectation
> + * is that the guest won't require emulation of task switches
> + * while using IBT or Shadow Stacks.
> + */
> + if (__kvm_emulate_msr_read(vcpu, MSR_IA32_U_CET, &u_cet) ||
> + __kvm_emulate_msr_read(vcpu, MSR_IA32_S_CET, &s_cet))
> + return EMULATION_FAILED;
> +
> + if ((u_cet | s_cet) & CET_SHSTK_EN)
> + goto unhandled_task_switch;
> + }
> +
> init_emulate_ctxt(vcpu);
>
> ret = emulator_task_switch(ctxt, tss_selector, idt_index, reason,
> @@ -12187,17 +12206,19 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
> * Report an error userspace if MMIO is needed, as KVM doesn't support
> * MMIO during a task switch (or any other complex operation).
> */
> - if (ret || vcpu->mmio_needed) {
> - vcpu->mmio_needed = false;
> - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
> - vcpu->run->internal.ndata = 0;
> - return 0;
> - }
> + if (ret || vcpu->mmio_needed)
> + goto unhandled_task_switch;
>
> kvm_rip_write(vcpu, ctxt->eip);
> kvm_set_rflags(vcpu, ctxt->eflags);
> return 1;
> +
> +unhandled_task_switch:
> + vcpu->mmio_needed = false;
> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
> + vcpu->run->internal.ndata = 0;
> + return 0;
> }
> EXPORT_SYMBOL_GPL(kvm_task_switch);
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode
2025-09-19 22:32 ` [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode Sean Christopherson
@ 2025-09-22 7:15 ` Binbin Wu
2025-09-23 14:29 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 7:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Emulate the Shadow Stack restriction that the current SSP must be a 32-bit
> value on a FAR JMP from 64-bit mode to compatibility mode. From the SDM's
> pseudocode for FAR JMP:
>
> IF ShadowStackEnabled(CPL)
> IF (IA32_EFER.LMA and DEST(segment selector).L) = 0
> (* If target is legacy or compatibility mode then the SSP must be in low 4GB *)
> IF (SSP & 0xFFFFFFFF00000000 != 0); THEN
> #GP(0);
> FI;
> FI;
> FI;
>
> Note, only the current CPL needs to be considered, as FAR JMP can't be
> used for inter-privilege level transfers, and KVM rejects emulation of all
> other far branch instructions when Shadow Stacks are enabled.
>
> To give the emulator access to GUEST_SSP, special case handling
> MSR_KVM_INTERNAL_GUEST_SSP in emulator_get_msr() to treat the access as a
> host access (KVM doesn't allow guest accesses to internal "MSRs"). The
> ->get_msr() API is only used for implicit accesses from the emulator, i.e.
> is only used with hardcoded MSR indices, and so any access to
> MSR_KVM_INTERNAL_GUEST_SSP is guaranteed to be from KVM, i.e. not from the
> guest via RDMSR.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/emulate.c | 35 +++++++++++++++++++++++++++++++++++
> arch/x86/kvm/x86.c | 9 +++++++++
> 2 files changed, 44 insertions(+)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index dc0249929cbf..5c5fb6a6f7f9 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -1605,6 +1605,37 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,
> return linear_write_system(ctxt, addr, desc, sizeof(*desc));
> }
>
> +static bool emulator_is_ssp_invalid(struct x86_emulate_ctxt *ctxt, u8 cpl)
> +{
> + const u32 MSR_IA32_X_CET = cpl == 3 ? MSR_IA32_U_CET : MSR_IA32_S_CET;
> + u64 efer = 0, cet = 0, ssp = 0;
> +
> + if (!(ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET))
> + return false;
> +
> + if (ctxt->ops->get_msr(ctxt, MSR_EFER, &efer))
> + return true;
> +
> + /* SSP is guaranteed to be valid if the vCPU was already in 32-bit mode. */
> + if (!(efer & EFER_LMA))
> + return false;
> +
> + if (ctxt->ops->get_msr(ctxt, MSR_IA32_X_CET, &cet))
> + return true;
> +
> + if (!(cet & CET_SHSTK_EN))
> + return false;
> +
> + if (ctxt->ops->get_msr(ctxt, MSR_KVM_INTERNAL_GUEST_SSP, &ssp))
> + return true;
> +
> + /*
> + * On transfer from 64-bit mode to compatibility mode, SSP[63:32] must
> + * be 0, i.e. SSP must be a 32-bit value outside of 64-bit mode.
> + */
> + return ssp >> 32;
> +}
> +
> static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
> u16 selector, int seg, u8 cpl,
> enum x86_transfer_type transfer,
> @@ -1745,6 +1776,10 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
> if (efer & EFER_LMA)
> goto exception;
> }
> + if (!seg_desc.l && emulator_is_ssp_invalid(ctxt, cpl)) {
> + err_code = 0;
> + goto exception;
> + }
>
> /* CS(RPL) <- CPL */
> selector = (selector & 0xfffc) | cpl;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0c060e506f9d..40596fc5142e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8741,6 +8741,15 @@ static int emulator_set_msr_with_filter(struct x86_emulate_ctxt *ctxt,
> static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
> u32 msr_index, u64 *pdata)
> {
> + /*
> + * Treat emulator accesses to the current shadow stack pointer as host-
> + * initiated, as they aren't true MSR accesses (SSP is a "just a reg"),
> + * and this API is used only for implicit accesses, i.e. not RDMSR, and
> + * so the index is fully KVM-controlled.
> + */
> + if (unlikely(msr_index == MSR_KVM_INTERNAL_GUEST_SSP))
> + return kvm_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
> +
> return __kvm_emulate_msr_read(emul_to_vcpu(ctxt), msr_index, pdata);
> }
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF
2025-09-19 22:32 ` [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF Sean Christopherson
@ 2025-09-22 7:17 ` Binbin Wu
2025-09-22 7:46 ` Binbin Wu
0 siblings, 1 reply; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 7:17 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Add PFERR_SS_MASK, a.k.a. Shadow Stack access, and WARN if KVM attempts to
> check permissions for a Shadow Stack access as KVM hasn't been taught to
> understand the magic Writable=0,Dirty=0 combination that is required for
> Shadow Stack accesses, and likely will never learn. There are no plans to
> support Shadow Stacks with the Shadow MMU, and the emulator rejects all
> instructions that affect Shadow Stacks, i.e. it should be impossible for
> KVM to observe a #PF due to a shadow stack access.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/mmu.h | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7a7e6356a8dd..554d83ff6135 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -267,6 +267,7 @@ enum x86_intercept_stage;
> #define PFERR_RSVD_MASK BIT(3)
> #define PFERR_FETCH_MASK BIT(4)
> #define PFERR_PK_MASK BIT(5)
> +#define PFERR_SS_MASK BIT(6)
> #define PFERR_SGX_MASK BIT(15)
> #define PFERR_GUEST_RMP_MASK BIT_ULL(31)
> #define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index b4b6860ab971..f63074048ec6 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -212,7 +212,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>
> fault = (mmu->permissions[index] >> pte_access) & 1;
>
> - WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
> + WARN_ON_ONCE(pfec & (PFERR_PK_MASK | PFERR_SS_MASK | PFERR_RSVD_MASK));
> if (unlikely(mmu->pkru_mask)) {
> u32 pkru_bits, offset;
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints
2025-09-19 22:32 ` [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints Sean Christopherson
@ 2025-09-22 7:18 ` Binbin Wu
2025-09-22 16:18 ` Sean Christopherson
2025-09-23 14:46 ` Xiaoyao Li
1 sibling, 1 reply; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 7:18 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Add PK (Protection Keys), SS (Shadow Stacks), and SGX (Software Guard
> Extensions) to the set of #PF error flags handled via
> kvm_mmu_trace_pferr_flags. While KVM doesn't expect PK or SS #PFs
Also SGX.
> in
> particular, pretty printing their names instead of the raw hex value saves
> the user from having to go spelunking in the SDM to figure out what's
> going on.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/mmu/mmutrace.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index f35a830ce469..764e3015d021 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -51,6 +51,9 @@
> { PFERR_PRESENT_MASK, "P" }, \
> { PFERR_WRITE_MASK, "W" }, \
> { PFERR_USER_MASK, "U" }, \
> + { PFERR_PK_MASK, "PK" }, \
> + { PFERR_SS_MASK, "SS" }, \
> + { PFERR_SGX_MASK, "SGX" }, \
> { PFERR_RSVD_MASK, "RSVD" }, \
> { PFERR_FETCH_MASK, "F" }
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported
2025-09-19 22:32 ` [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported Sean Christopherson
@ 2025-09-22 7:25 ` Binbin Wu
2025-09-23 14:46 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 7:25 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Drop X86_CR4_CET from CR4_RESERVED_BITS and instead mark CET as reserved
> if and only if IBT *and* SHSTK are unsupported, i.e. allow CR4.CET to be
> set if IBT or SHSTK is supported. This creates a virtualization hole if
> the CPU supports both IBT and SHSTK, but the kernel or vCPU model only
> supports one of the features. However, it's entirely legal for a CPU to
> have only one of IBT or SHSTK, i.e. the hole is a flaw in the architecture,
> not in KVM.
>
> More importantly, so long as KVM is careful to initialize and context
> switch both IBT and SHSTK state (when supported in hardware) if either
> feature is exposed to the guest, a misbehaving guest can only harm itself.
> E.g. VMX initializes host CET VMCS fields based solely on hardware
> capabilities.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: split to separate patch, write changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/x86.h | 3 +++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 554d83ff6135..39231da3a3ff 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -142,7 +142,7 @@
> | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
> | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
> | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
> - | X86_CR4_LAM_SUP))
> + | X86_CR4_LAM_SUP | X86_CR4_CET))
>
> #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
>
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 65cbd454c4f1..f3dc77f006f9 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -680,6 +680,9 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> __reserved_bits |= X86_CR4_PCIDE; \
> if (!__cpu_has(__c, X86_FEATURE_LAM)) \
> __reserved_bits |= X86_CR4_LAM_SUP; \
> + if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
> + !__cpu_has(__c, X86_FEATURE_IBT)) \
> + __reserved_bits |= X86_CR4_CET; \
> __reserved_bits; \
> })
>
^ permalink raw reply [flat|nested] 114+ messages in thread
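The "or" logic from the hunk above reduces to a tiny standalone predicate. A minimal sketch (the helper name is made up for illustration; the bit value matches the architectural CR4.CET position):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_CET (1ull << 23)	/* architectural CR4.CET bit */

/*
 * Mirror of the __cr4_reserved_bits() change: CR4.CET is reserved if and
 * only if the vCPU model supports neither SHSTK nor IBT.  Hypothetical
 * helper, not KVM code.
 */
static uint64_t cet_reserved_bits(bool has_shstk, bool has_ibt)
{
	uint64_t reserved = 0;

	if (!has_shstk && !has_ibt)
		reserved |= X86_CR4_CET;

	return reserved;
}
```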
* Re: [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER
2025-09-19 22:32 ` [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER Sean Christopherson
@ 2025-09-22 7:31 ` Binbin Wu
2025-09-23 14:55 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 7:31 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add CET_KERNEL and CET_USER to KVM's set of supported XSS bits when IBT
> *or* SHSTK is supported. Like CR4.CET, XFEATURE support for IBT and SHSTK
> is bundled together under the CET umbrella, and thus prone to
> virtualization holes if KVM or the guest supports only one of IBT or SHSTK,
> but hardware supports both. However, again like CR4.CET, such
> virtualization holes are benign from the host's perspective so long as KVM
> takes care to always honor the "or" logic.
>
> Require CET_KERNEL and CET_USER to come as a pair, and refuse to support
> IBT or SHSTK if one (or both) of the features is missing, as the (host) kernel
> expects them to come as a pair, i.e. may get confused and corrupt state if
> only one of CET_KERNEL or CET_USER is supported.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: split to separate patch, write changelog, add XFEATURE_MASK_CET_ALL]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/x86.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 40596fc5142e..4a0ff0403bb2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -220,13 +220,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>
> +#define XFEATURE_MASK_CET_ALL (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
> /*
> * Note, KVM supports exposing PT to the guest, but does not support context
> * switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping
> * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
> * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
> */
> -#define KVM_SUPPORTED_XSS 0
> +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_ALL)
>
> bool __read_mostly allow_smaller_maxphyaddr = 0;
> EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
> @@ -10104,6 +10105,16 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> kvm_caps.supported_xss = 0;
>
> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> + !kvm_cpu_cap_has(X86_FEATURE_IBT))
> + kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
> +
> + if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> + kvm_cpu_cap_clear(X86_FEATURE_IBT);
> + kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
> + }
> +
> if (kvm_caps.has_tsc_control) {
> /*
> * Make sure the user can only configure tsc_khz values that
> @@ -12775,10 +12786,11 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
> /*
> * On INIT, only select XSTATE components are zeroed, most components
> * are unchanged. Currently, the only components that are zeroed and
> - * supported by KVM are MPX related.
> + * supported by KVM are MPX and CET related.
> */
> xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
> - (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
> + (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
> + XFEATURE_MASK_CET_ALL);
> if (!xfeatures_mask)
> return;
>
^ permalink raw reply [flat|nested] 114+ messages in thread
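The "come as a pair" requirement is a one-line mask check. A minimal sketch (bits 11 and 12 are the architectural XFEATURE positions for CET_USER and CET_KERNEL; the helper name is made up, and the real code also clears the SHSTK/IBT CPU caps):

```c
#include <assert.h>
#include <stdint.h>

#define XFEATURE_MASK_CET_USER   (1ull << 11)
#define XFEATURE_MASK_CET_KERNEL (1ull << 12)
#define XFEATURE_MASK_CET_ALL    (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)

/*
 * Sketch of the pairing rule: if either CET_USER or CET_KERNEL is missing
 * from the supported XSS mask, drop both, as the (host) kernel expects the
 * two features to come as a pair.
 */
static uint64_t sanitize_cet_xss(uint64_t supported_xss)
{
	if ((supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL)
		supported_xss &= ~XFEATURE_MASK_CET_ALL;

	return supported_xss;
}
```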
* Re: [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled
2025-09-19 22:32 ` [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled Sean Christopherson
@ 2025-09-22 7:45 ` Binbin Wu
2025-09-23 14:56 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 7:45 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Make TDP a hard requirement for Shadow Stacks, as there are no plans to
> add Shadow Stack support to the Shadow MMU. E.g. KVM hasn't been taught
> to understand the magic Writable=0,Dirty=0 combination that is required
Writable=0,Dirty=0 -> Writable=0,Dirty=1
Otherwise,
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> for Shadow Stack accesses, and so enabling Shadow Stacks when using
> shadow paging will put the guest into an infinite #PF loop (KVM thinks the
> shadow page tables have a valid mapping, hardware says otherwise).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/cpuid.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 32fde9e80c28..499c86bd457e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -955,6 +955,14 @@ void kvm_set_cpu_caps(void)
> if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
> kvm_cpu_cap_clear(X86_FEATURE_PKU);
>
> + /*
> + * Shadow Stacks aren't implemented in the Shadow MMU. Shadow Stack
> + * accesses require "magic" Writable=0,Dirty=1 protection, which KVM
> + * doesn't know how to emulate or map.
> + */
> + if (!tdp_enabled)
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> +
> kvm_cpu_cap_init(CPUID_7_EDX,
> F(AVX512_4VNNIW),
> F(AVX512_4FMAPS),
^ permalink raw reply [flat|nested] 114+ messages in thread
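For reference, the "magic" Writable=0,Dirty=1 combination described in the comment above can be expressed directly against the 64-bit paging PTE bits (R/W is bit 1, Dirty is bit 6). This is an illustrative predicate only, not KVM code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_WRITABLE (1ull << 1)	/* R/W bit in a 64-bit PTE */
#define PTE_DIRTY    (1ull << 6)	/* Dirty bit in a 64-bit PTE */

/*
 * A leaf PTE maps a shadow-stack page when it reads Writable=0,Dirty=1,
 * i.e. the combination the Shadow MMU has never been taught to emulate
 * or map.
 */
static bool is_shadow_stack_pte(uint64_t pte)
{
	return !(pte & PTE_WRITABLE) && (pte & PTE_DIRTY);
}
```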
* Re: [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF
2025-09-22 7:17 ` Binbin Wu
@ 2025-09-22 7:46 ` Binbin Wu
2025-09-23 14:33 ` Xiaoyao Li
0 siblings, 1 reply; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 7:46 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/22/2025 3:17 PM, Binbin Wu wrote:
>
>
> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
>> Add PFERR_SS_MASK, a.k.a. Shadow Stack access, and WARN if KVM attempts to
>> check permissions for a Shadow Stack access as KVM hasn't been taught to
>> understand the magic Writable=0,Dirty=0 combination that is required for
Typo:
Writable=0,Dirty=0 -> Writable=0,Dirty=1
>> Shadow Stack accesses, and likely will never learn. There are no plans to
>> support Shadow Stacks with the Shadow MMU, and the emulator rejects all
>> instructions that affect Shadow Stacks, i.e. it should be impossible for
>> KVM to observe a #PF due to a shadow stack access.
>>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/mmu.h | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 7a7e6356a8dd..554d83ff6135 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -267,6 +267,7 @@ enum x86_intercept_stage;
>> #define PFERR_RSVD_MASK BIT(3)
>> #define PFERR_FETCH_MASK BIT(4)
>> #define PFERR_PK_MASK BIT(5)
>> +#define PFERR_SS_MASK BIT(6)
>> #define PFERR_SGX_MASK BIT(15)
>> #define PFERR_GUEST_RMP_MASK BIT_ULL(31)
>> #define PFERR_GUEST_FINAL_MASK BIT_ULL(32)
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index b4b6860ab971..f63074048ec6 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -212,7 +212,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>> fault = (mmu->permissions[index] >> pte_access) & 1;
>> - WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
>> + WARN_ON_ONCE(pfec & (PFERR_PK_MASK | PFERR_SS_MASK | PFERR_RSVD_MASK));
>> if (unlikely(mmu->pkru_mask)) {
>> u32 pkru_bits, offset;
>
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true
2025-09-19 22:32 ` [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true Sean Christopherson
@ 2025-09-22 8:00 ` Binbin Wu
2025-09-22 18:40 ` Sean Christopherson
2025-09-23 14:44 ` Xiaoyao Li
2 siblings, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 8:00 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Make IBT and SHSTK virtualization mutually exclusive with "officially"
> supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the
> allow_smaller_maxphyaddr module param is set. Running a guest with a
> smaller MAXPHYADDR requires intercepting #PF, and can also trigger
> emulation of arbitrary instructions. Intercepting and reacting to #PFs
> doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle
> Shadow Stack accesses, and emulating arbitrary instructions doesn't play
> nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side
> effects, e.g. doesn't enforce end-branch markers or model Shadow Stack
> updates.
>
> Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is
> overkill, as allow_smaller_maxphyaddr is only problematic if the guest is
> actually configured to have a smaller MAXPHYADDR. However, KVM's ABI
> doesn't provide a way to express that IBT and SHSTK may break if enabled
> in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR. I.e. the
> alternative is to do nothing in KVM and instead update documentation and
> hope KVM users are thorough readers. Go with the conservative-but-correct
> approach; worst case scenario, this restriction can be dropped if there's
> a strong use case for enabling CET on hosts with allow_smaller_maxphyaddr.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/cpuid.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 499c86bd457e..b5c4cb13630c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -963,6 +963,16 @@ void kvm_set_cpu_caps(void)
> if (!tdp_enabled)
> kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
>
> + /*
> + * Disable support for IBT and SHSTK if KVM is configured to emulate
> + * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
> + * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
> + */
> + if (allow_smaller_maxphyaddr) {
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> + kvm_cpu_cap_clear(X86_FEATURE_IBT);
> + }
> +
> kvm_cpu_cap_init(CPUID_7_EDX,
> F(AVX512_4VNNIW),
> F(AVX512_4FMAPS),
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
2025-09-19 22:32 ` [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
@ 2025-09-22 8:06 ` Binbin Wu
2025-09-23 14:57 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 8:06 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add support for the LOAD_CET_STATE VM-Enter and VM-Exit controls, the
> CET XFEATURE bits in XSS, and advertise support for IBT and SHSTK to
> userspace. Explicitly clear IBT and SHSTK on SVM, as additional work is
> needed to enable CET on SVM, e.g. to context switch S_CET and other state.
>
> Disable the KVM CET feature if unrestricted_guest is unsupported/disabled,
> as KVM does not support emulating CET and running without Unrestricted Guest
> can result in KVM emulating large swaths of guest code. While it's highly
> unlikely any guest will trigger emulation while also utilizing IBT or
> SHSTK, there's zero reason to allow CET without Unrestricted Guest as that
> combination should only be possible when explicitly disabling
> unrestricted_guest for testing purposes.
>
> Disable CET if VMX_BASIC[bit56] == 0, i.e. if hardware strictly enforces
> the presence of an Error Code based on exception vector, as attempting to
> inject a #CP with an Error Code (#CP architecturally has an Error Code)
> will fail due to the #CP vector historically not having an Error Code.
>
> Clear S_CET and SSP-related VMCS fields on "reset" to emulate the
> architectural behavior of CET MSRs and SSP being reset to 0 after RESET,
> power-up and INIT. Note,
> KVM already clears guest CET state that is managed via XSTATE in
> kvm_xstate_reset().
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: move some bits to separate patches, massage changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/include/asm/vmx.h | 1 +
> arch/x86/kvm/cpuid.c | 2 ++
> arch/x86/kvm/svm/svm.c | 4 ++++
> arch/x86/kvm/vmx/capabilities.h | 5 +++++
> arch/x86/kvm/vmx/vmx.c | 30 +++++++++++++++++++++++++++++-
> arch/x86/kvm/vmx/vmx.h | 6 ++++--
> 6 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index ce10a7e2d3d9..c85c50019523 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -134,6 +134,7 @@
> #define VMX_BASIC_DUAL_MONITOR_TREATMENT BIT_ULL(49)
> #define VMX_BASIC_INOUT BIT_ULL(54)
> #define VMX_BASIC_TRUE_CTLS BIT_ULL(55)
> +#define VMX_BASIC_NO_HW_ERROR_CODE_CC BIT_ULL(56)
>
> static inline u32 vmx_basic_vmcs_revision_id(u64 vmx_basic)
> {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b5c4cb13630c..b861a88083e1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -946,6 +946,7 @@ void kvm_set_cpu_caps(void)
> VENDOR_F(WAITPKG),
> F(SGX_LC),
> F(BUS_LOCK_DETECT),
> + X86_64_F(SHSTK),
> );
>
> /*
> @@ -990,6 +991,7 @@ void kvm_set_cpu_caps(void)
> F(AMX_INT8),
> F(AMX_BF16),
> F(FLUSH_L1D),
> + F(IBT),
> );
>
> if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 67f4eed01526..73dde1645e46 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5221,6 +5221,10 @@ static __init void svm_set_cpu_caps(void)
> kvm_caps.supported_perf_cap = 0;
> kvm_caps.supported_xss = 0;
>
> + /* KVM doesn't yet support CET virtualization for SVM. */
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> + kvm_cpu_cap_clear(X86_FEATURE_IBT);
> +
> /* CPUID 0x80000001 and 0x8000000A (SVM features) */
> if (nested) {
> kvm_cpu_cap_set(X86_FEATURE_SVM);
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index 59c83888bdc0..02aadb9d730e 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -73,6 +73,11 @@ static inline bool cpu_has_vmx_basic_inout(void)
> return vmcs_config.basic & VMX_BASIC_INOUT;
> }
>
> +static inline bool cpu_has_vmx_basic_no_hw_errcode_cc(void)
> +{
> + return vmcs_config.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
> +}
> +
> static inline bool cpu_has_virtual_nmis(void)
> {
> return vmcs_config.pin_based_exec_ctrl & PIN_BASED_VIRTUAL_NMIS &&
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index a7d9e60b2771..69e35440cee7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2615,6 +2615,7 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> { VM_ENTRY_LOAD_IA32_EFER, VM_EXIT_LOAD_IA32_EFER },
> { VM_ENTRY_LOAD_BNDCFGS, VM_EXIT_CLEAR_BNDCFGS },
> { VM_ENTRY_LOAD_IA32_RTIT_CTL, VM_EXIT_CLEAR_IA32_RTIT_CTL },
> + { VM_ENTRY_LOAD_CET_STATE, VM_EXIT_LOAD_CET_STATE },
> };
>
> memset(vmcs_conf, 0, sizeof(*vmcs_conf));
> @@ -4881,6 +4882,14 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>
> vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0); /* 22.2.1 */
>
> + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
> + vmcs_writel(GUEST_SSP, 0);
> + vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
> + }
> + if (kvm_cpu_cap_has(X86_FEATURE_IBT) ||
> + kvm_cpu_cap_has(X86_FEATURE_SHSTK))
> + vmcs_writel(GUEST_S_CET, 0);
> +
> kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
>
> vpid_sync_context(vmx->vpid);
> @@ -6348,6 +6357,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
> if (vmcs_read32(VM_EXIT_MSR_STORE_COUNT) > 0)
> vmx_dump_msrs("guest autostore", &vmx->msr_autostore.guest);
>
> + if (vmentry_ctl & VM_ENTRY_LOAD_CET_STATE)
> + pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
> + vmcs_readl(GUEST_S_CET), vmcs_readl(GUEST_SSP),
> + vmcs_readl(GUEST_INTR_SSP_TABLE));
> pr_err("*** Host State ***\n");
> pr_err("RIP = 0x%016lx RSP = 0x%016lx\n",
> vmcs_readl(HOST_RIP), vmcs_readl(HOST_RSP));
> @@ -6378,6 +6391,10 @@ void dump_vmcs(struct kvm_vcpu *vcpu)
> vmcs_read64(HOST_IA32_PERF_GLOBAL_CTRL));
> if (vmcs_read32(VM_EXIT_MSR_LOAD_COUNT) > 0)
> vmx_dump_msrs("host autoload", &vmx->msr_autoload.host);
> + if (vmexit_ctl & VM_EXIT_LOAD_CET_STATE)
> + pr_err("S_CET = 0x%016lx, SSP = 0x%016lx, SSP TABLE = 0x%016lx\n",
> + vmcs_readl(HOST_S_CET), vmcs_readl(HOST_SSP),
> + vmcs_readl(HOST_INTR_SSP_TABLE));
>
> pr_err("*** Control State ***\n");
> pr_err("CPUBased=0x%08x SecondaryExec=0x%08x TertiaryExec=0x%016llx\n",
> @@ -7959,7 +7976,6 @@ static __init void vmx_set_cpu_caps(void)
> kvm_cpu_cap_set(X86_FEATURE_UMIP);
>
> /* CPUID 0xD.1 */
> - kvm_caps.supported_xss = 0;
> if (!cpu_has_vmx_xsaves())
> kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
>
> @@ -7971,6 +7987,18 @@ static __init void vmx_set_cpu_caps(void)
>
> if (cpu_has_vmx_waitpkg())
> kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
> +
> + /*
> + * Disable CET if unrestricted_guest is unsupported as KVM doesn't
> + * enforce CET HW behaviors in emulator. On platforms with
> + * VMX_BASIC[bit56] == 0, inject #CP at VMX entry with error code
> + * fails, so disable CET in this case too.
> + */
> + if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
> + !cpu_has_vmx_basic_no_hw_errcode_cc()) {
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> + kvm_cpu_cap_clear(X86_FEATURE_IBT);
> + }
> }
>
> static bool vmx_is_io_intercepted(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 23d6e89b96f2..af8224e074ee 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -484,7 +484,8 @@ static inline u8 vmx_get_rvi(void)
> VM_ENTRY_LOAD_IA32_EFER | \
> VM_ENTRY_LOAD_BNDCFGS | \
> VM_ENTRY_PT_CONCEAL_PIP | \
> - VM_ENTRY_LOAD_IA32_RTIT_CTL)
> + VM_ENTRY_LOAD_IA32_RTIT_CTL | \
> + VM_ENTRY_LOAD_CET_STATE)
>
> #define __KVM_REQUIRED_VMX_VM_EXIT_CONTROLS \
> (VM_EXIT_SAVE_DEBUG_CONTROLS | \
> @@ -506,7 +507,8 @@ static inline u8 vmx_get_rvi(void)
> VM_EXIT_LOAD_IA32_EFER | \
> VM_EXIT_CLEAR_BNDCFGS | \
> VM_EXIT_PT_CONCEAL_PIP | \
> - VM_EXIT_CLEAR_IA32_RTIT_CTL)
> + VM_EXIT_CLEAR_IA32_RTIT_CTL | \
> + VM_EXIT_LOAD_CET_STATE)
>
> #define KVM_REQUIRED_VMX_PIN_BASED_VM_EXEC_CONTROL \
> (PIN_BASED_EXT_INTR_MASK | \
^ permalink raw reply [flat|nested] 114+ messages in thread
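The three gating conditions added to vmx_set_cpu_caps() can be sketched as a single predicate. A hedged illustration (the helper name is made up; the VMX_BASIC bit value matches the diff above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMX_BASIC_NO_HW_ERROR_CODE_CC (1ull << 56)

/*
 * Sketch of the gating logic: CET stays enabled only when the
 * LOAD_CET_STATE entry/exit controls, Unrestricted Guest, and
 * VMX_BASIC[56] (relaxed error-code consistency checks, needed to
 * inject #CP with an error code) are all present.
 */
static bool vmx_can_virtualize_cet(bool has_load_cet_ctrl,
				   bool unrestricted_guest,
				   uint64_t vmx_basic)
{
	return has_load_cet_ctrl && unrestricted_guest &&
	       (vmx_basic & VMX_BASIC_NO_HW_ERROR_CODE_CC);
}
```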
* Re: [PATCH v16 41/51] KVM: x86: Add human friendly formatting for #XM, and #VE
2025-09-19 22:32 ` [PATCH v16 41/51] KVM: x86: Add human friendly formatting for #XM, and #VE Sean Christopherson
@ 2025-09-22 8:29 ` Binbin Wu
0 siblings, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 8:29 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Add XM_VECTOR and VE_VECTOR pretty-printing for
> trace_kvm_inj_exception().
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/trace.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 57d79fd31df0..06da19b370c5 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -461,8 +461,8 @@ TRACE_EVENT(kvm_inj_virq,
>
> #define kvm_trace_sym_exc \
> EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM), \
> - EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), \
> - EXS(MF), EXS(AC), EXS(MC)
> + EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF), \
> + EXS(AC), EXS(MC), EXS(XM), EXS(VE)
>
> /*
> * Tracepoint for kvm interrupt injection:
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 42/51] KVM: x86: Define Control Protection Exception (#CP) vector
2025-09-19 22:32 ` [PATCH v16 42/51] KVM: x86: Define Control Protection Exception (#CP) vector Sean Christopherson
@ 2025-09-22 8:29 ` Binbin Wu
0 siblings, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 8:29 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Add a CP_VECTOR definition for CET's Control Protection Exception (#CP),
> along with human friendly formatting for trace_kvm_inj_exception().
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/include/uapi/asm/kvm.h | 1 +
> arch/x86/kvm/trace.h | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 467116186e71..73e0e88a0a54 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -35,6 +35,7 @@
> #define MC_VECTOR 18
> #define XM_VECTOR 19
> #define VE_VECTOR 20
> +#define CP_VECTOR 21
>
> /* Select x86 specific features in <linux/kvm.h> */
> #define __KVM_HAVE_PIT
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 06da19b370c5..322913dda626 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -462,7 +462,7 @@ TRACE_EVENT(kvm_inj_virq,
> #define kvm_trace_sym_exc \
> EXS(DE), EXS(DB), EXS(BP), EXS(OF), EXS(BR), EXS(UD), EXS(NM), \
> EXS(DF), EXS(TS), EXS(NP), EXS(SS), EXS(GP), EXS(PF), EXS(MF), \
> - EXS(AC), EXS(MC), EXS(XM), EXS(VE)
> + EXS(AC), EXS(MC), EXS(XM), EXS(VE), EXS(CP)
>
> /*
> * Tracepoint for kvm interrupt injection:
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 51/51] KVM: VMX: Make CR4.CET a guest owned bit
2025-09-19 22:32 ` [PATCH v16 51/51] KVM: VMX: Make CR4.CET a guest owned bit Sean Christopherson
@ 2025-09-22 8:34 ` Binbin Wu
0 siblings, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 8:34 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Mathias Krause <minipli@grsecurity.net>
>
> Make CR4.CET a guest-owned bit under VMX by extending
> KVM_POSSIBLE_CR4_GUEST_BITS accordingly.
>
> There's no need to intercept changes to CR4.CET, as it's neither
> included in KVM's MMU role bits, nor does KVM specifically care about
> the actual CR4.CET value of a (nested) guest, besides enforcing
> architectural constraints, i.e. making sure that CR0.WP=1 if
> CR4.CET=1.
>
> Intercepting writes to CR4.CET is particularly bad for grsecurity
> kernels with KERNEXEC or, even worse, KERNSEAL enabled. These features
> heavily make use of read-only kernel objects and use a cpu-local CR0.WP
> toggle to override it, when needed. Under a CET-enabled kernel, this
> also requires toggling CR4.CET, hence the motivation to make it
> guest-owned.
>
> Using the old test from [1] gives the following runtime numbers (perf
> stat -r 5 ssdd 10 50000):
>
> * grsec guest on linux-6.16-rc5 + cet patches:
> 2.4647 +- 0.0706 seconds time elapsed ( +- 2.86% )
>
> * grsec guest on linux-6.16-rc5 + cet patches + CR4.CET guest-owned:
> 1.5648 +- 0.0240 seconds time elapsed ( +- 1.53% )
>
> Not only does not intercepting CR4.CET make the test run ~35% faster,
> it's also more stable with less fluctuation due to fewer VMEXITs.
>
> Therefore, make CR4.CET a guest-owned bit where possible.
>
> This change is VMX-specific, as SVM has no such fine-grained control
> register intercept control.
>
> If KVM's assumptions regarding MMU role handling wrt. a guest's CR4.CET
> value ever change, the BUILD_BUG_ON()s related to KVM_MMU_CR4_ROLE_BITS
> and KVM_POSSIBLE_CR4_GUEST_BITS will catch that early.
>
> Link: https://lore.kernel.org/kvm/20230322013731.102955-1-minipli@grsecurity.net/ [1]
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/kvm_cache_regs.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> index 36a8786db291..8ddb01191d6f 100644
> --- a/arch/x86/kvm/kvm_cache_regs.h
> +++ b/arch/x86/kvm/kvm_cache_regs.h
> @@ -7,7 +7,8 @@
> #define KVM_POSSIBLE_CR0_GUEST_BITS (X86_CR0_TS | X86_CR0_WP)
> #define KVM_POSSIBLE_CR4_GUEST_BITS \
> (X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR \
> - | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE)
> + | X86_CR4_OSXMMEXCPT | X86_CR4_PGE | X86_CR4_TSD | X86_CR4_FSGSBASE \
> + | X86_CR4_CET)
>
> #define X86_CR0_PDPTR_BITS (X86_CR0_CD | X86_CR0_NW | X86_CR0_PG)
> #define X86_CR4_TLBFLUSH_BITS (X86_CR4_PGE | X86_CR4_PCIDE | X86_CR4_PAE | X86_CR4_SMEP)
^ permalink raw reply [flat|nested] 114+ messages in thread
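The win described above follows from how guest-owned CR4 bits interact with the CR4 guest/host mask: a MOV-to-CR4 only exits if it changes a host-owned bit. A simplified model (hypothetical helper, for illustration only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_CET (1ull << 23)

/*
 * Simplified model of VMX CR4 intercepts: a guest write to CR4 causes a
 * VM-exit only if it flips a bit the host still owns (a bit set in the
 * CR4 guest/host mask, i.e. NOT in the guest-owned set).  With CR4.CET
 * guest-owned, a cpu-local CR4.CET toggle is exit-free.
 */
static bool cr4_write_exits(uint64_t old_cr4, uint64_t new_cr4,
			    uint64_t guest_owned_bits)
{
	return ((old_cr4 ^ new_cr4) & ~guest_owned_bits) != 0;
}
```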
* Re: [PATCH v16 30/51] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
2025-09-19 22:32 ` [PATCH v16 30/51] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Sean Christopherson
@ 2025-09-22 8:37 ` Binbin Wu
0 siblings, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 8:37 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Per the SDM description (Vol. 3D, Appendix A.1):
> "If bit 56 is read as 1, software can use VM entry to deliver a hardware
> exception with or without an error code, regardless of vector"
>
> Modify the has_error_code check performed before injecting events into a
> nested guest. Only enforce the check when the guest is in real mode, the
> exception is not a hardware exception, or the platform doesn't enumerate
> bit 56 in VMX_BASIC; in all other cases skip the check to make the logic
> consistent with the SDM.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 27 ++++++++++++++++++---------
> arch/x86/kvm/vmx/nested.h | 5 +++++
> 2 files changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 846c07380eac..b644f4599f70 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1272,9 +1272,10 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
> {
> const u64 feature_bits = VMX_BASIC_DUAL_MONITOR_TREATMENT |
> VMX_BASIC_INOUT |
> - VMX_BASIC_TRUE_CTLS;
> + VMX_BASIC_TRUE_CTLS |
> + VMX_BASIC_NO_HW_ERROR_CODE_CC;
>
> - const u64 reserved_bits = GENMASK_ULL(63, 56) |
> + const u64 reserved_bits = GENMASK_ULL(63, 57) |
> GENMASK_ULL(47, 45) |
> BIT_ULL(31);
>
> @@ -2949,7 +2950,6 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
> u8 vector = intr_info & INTR_INFO_VECTOR_MASK;
> u32 intr_type = intr_info & INTR_INFO_INTR_TYPE_MASK;
> bool has_error_code = intr_info & INTR_INFO_DELIVER_CODE_MASK;
> - bool should_have_error_code;
> bool urg = nested_cpu_has2(vmcs12,
> SECONDARY_EXEC_UNRESTRICTED_GUEST);
> bool prot_mode = !urg || vmcs12->guest_cr0 & X86_CR0_PE;
> @@ -2966,12 +2966,19 @@ static int nested_check_vm_entry_controls(struct kvm_vcpu *vcpu,
> CC(intr_type == INTR_TYPE_OTHER_EVENT && vector != 0))
> return -EINVAL;
>
> - /* VM-entry interruption-info field: deliver error code */
> - should_have_error_code =
> - intr_type == INTR_TYPE_HARD_EXCEPTION && prot_mode &&
> - x86_exception_has_error_code(vector);
> - if (CC(has_error_code != should_have_error_code))
> - return -EINVAL;
> + /*
> + * Cannot deliver error code in real mode or if the interrupt
> + * type is not hardware exception. For other cases, do the
> + * consistency check only if the vCPU doesn't enumerate
> + * VMX_BASIC_NO_HW_ERROR_CODE_CC.
> + */
> + if (!prot_mode || intr_type != INTR_TYPE_HARD_EXCEPTION) {
> + if (CC(has_error_code))
> + return -EINVAL;
> + } else if (!nested_cpu_has_no_hw_errcode_cc(vcpu)) {
> + if (CC(has_error_code != x86_exception_has_error_code(vector)))
> + return -EINVAL;
> + }
>
> /* VM-entry exception error code */
> if (CC(has_error_code &&
> @@ -7217,6 +7224,8 @@ static void nested_vmx_setup_basic(struct nested_vmx_msrs *msrs)
> msrs->basic |= VMX_BASIC_TRUE_CTLS;
> if (cpu_has_vmx_basic_inout())
> msrs->basic |= VMX_BASIC_INOUT;
> + if (cpu_has_vmx_basic_no_hw_errcode_cc())
> + msrs->basic |= VMX_BASIC_NO_HW_ERROR_CODE_CC;
> }
>
> static void nested_vmx_setup_cr_fixed(struct nested_vmx_msrs *msrs)
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index 6eedcfc91070..983484d42ebf 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -309,6 +309,11 @@ static inline bool nested_cr4_valid(struct kvm_vcpu *vcpu, unsigned long val)
> __kvm_is_valid_cr4(vcpu, val);
> }
>
> +static inline bool nested_cpu_has_no_hw_errcode_cc(struct kvm_vcpu *vcpu)
> +{
> + return to_vmx(vcpu)->nested.msrs.basic & VMX_BASIC_NO_HW_ERROR_CODE_CC;
> +}
> +
> /* No difference in the restrictions on guest and host CR4 in VMX operation. */
> #define nested_guest_cr4_valid nested_cr4_valid
> #define nested_host_cr4_valid nested_cr4_valid
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 32/51] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
2025-09-19 22:32 ` [PATCH v16 32/51] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Sean Christopherson
@ 2025-09-22 8:47 ` Binbin Wu
0 siblings, 0 replies; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 8:47 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Chao Gao <chao.gao@intel.com>
>
> Add consistency checks for CR4.CET and CR0.WP in guest-state or host-state
> area in the VMCS12. This ensures that configurations with CR4.CET set and
> CR0.WP not set result in VM-entry failure, aligning with architectural
> behavior.
>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 11e5d3569933..51c50ce9e011 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3110,6 +3110,9 @@ static int nested_vmx_check_host_state(struct kvm_vcpu *vcpu,
> CC(!kvm_vcpu_is_legal_cr3(vcpu, vmcs12->host_cr3)))
> return -EINVAL;
>
> + if (CC(vmcs12->host_cr4 & X86_CR4_CET && !(vmcs12->host_cr0 & X86_CR0_WP)))
> + return -EINVAL;
> +
> if (CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
> CC(is_noncanonical_msr_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
> return -EINVAL;
> @@ -3224,6 +3227,9 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
> CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
> return -EINVAL;
>
> + if (CC(vmcs12->guest_cr4 & X86_CR4_CET && !(vmcs12->guest_cr0 & X86_CR0_WP)))
> + return -EINVAL;
> +
> if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
> (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
> CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
* Re: [PATCH v16 33/51] KVM: nVMX: Add consistency checks for CET states
2025-09-19 22:32 ` [PATCH v16 33/51] KVM: nVMX: Add consistency checks for CET states Sean Christopherson
@ 2025-09-22 9:23 ` Binbin Wu
2025-09-22 16:35 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Binbin Wu @ 2025-09-22 9:23 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Chao Gao <chao.gao@intel.com>
>
> Introduce consistency checks for CET states during nested VM-entry.
>
> A VMCS contains both guest and host CET states, each comprising the
> IA32_S_CET MSR, SSP, and IA32_INTERRUPT_SSP_TABLE_ADDR MSR. Various
> checks are applied to CET states during VM-entry as documented in SDM
> Vol3 Chapter "VM ENTRIES". Implement all these checks during nested
> VM-entry to emulate the architectural behavior.
>
> In summary, there are three kinds of checks on guest/host CET states
> during VM-entry:
>
> A. Checks applied to both guest states and host states:
>
> * The IA32_S_CET field must not set any reserved bits; bits 10 (SUPPRESS)
> and 11 (TRACKER) cannot both be set.
> * SSP should not have bits 1:0 set.
> * The IA32_INTERRUPT_SSP_TABLE_ADDR field must be canonical.
>
> B. Checks applied to host states only
>
> * IA32_S_CET MSR and SSP must be canonical if the CPU enters 64-bit mode
> after VM-exit. Otherwise, IA32_S_CET and SSP must have their higher 32
> bits cleared.
>
> C. Checks applied to guest states only:
>
> * IA32_S_CET MSR and SSP are not required to be canonical (i.e., 63:N-1
> are identical, where N is the CPU's maximum linear-address width). But,
> bits 63:N of SSP must be identical.
>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
One nit below.
> ---
> arch/x86/kvm/vmx/nested.c | 47 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 51c50ce9e011..024bfb4d3a72 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3100,6 +3100,17 @@ static bool is_l1_noncanonical_address_on_vmexit(u64 la, struct vmcs12 *vmcs12)
> return !__is_canonical_address(la, l1_address_bits_on_exit);
> }
>
> +static bool is_valid_cet_state(struct kvm_vcpu *vcpu, u64 s_cet, u64 ssp, u64 ssp_tbl)
> +{
> + if (!kvm_is_valid_u_s_cet(vcpu, s_cet) || !IS_ALIGNED(ssp, 4))
> + return false;
> +
> + if (is_noncanonical_msr_address(ssp_tbl, vcpu))
> + return false;
> +
> + return true;
> +}
Nit:
Is the following simpler?
index a8a421a8e766..17ba37c2bbfc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3102,13 +3102,8 @@ static bool is_l1_noncanonical_address_on_vmexit(u64 la, struct vmcs12 *vmcs12)
static bool is_valid_cet_state(struct kvm_vcpu *vcpu, u64 s_cet, u64 ssp, u64 ssp_tbl)
{
- if (!kvm_is_valid_u_s_cet(vcpu, s_cet) || !IS_ALIGNED(ssp, 4))
- return false;
-
- if (is_noncanonical_msr_address(ssp_tbl, vcpu))
- return false;
-
- return true;
+ return (kvm_is_valid_u_s_cet(vcpu, s_cet) && IS_ALIGNED(ssp, 4) &&
+ !is_noncanonical_msr_address(ssp_tbl, vcpu));
}
* Re: [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features
2025-09-19 22:32 ` [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
2025-09-22 5:39 ` Binbin Wu
@ 2025-09-22 10:27 ` Chao Gao
2025-09-22 20:04 ` Sean Christopherson
1 sibling, 1 reply; 114+ messages in thread
From: Chao Gao @ 2025-09-22 10:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:25PM -0700, Sean Christopherson wrote:
>Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
>affected by Shadow Stacks and/or Indirect Branch Tracking when said
>features are enabled in the guest, as fully emulating CET would require
>significant complexity for no practical benefit (KVM shouldn't need to
>emulate branch instructions on modern hosts). Simply doing nothing isn't
>an option as that would allow a malicious entity to subvert CET
>protections via the emulator.
>
>To detect instructions that are subject to IBT or affect IBT state, use
>the existing IsBranch flag along with the source operand type to detect
>indirect branches, and the existing NearBranch flag to detect far branches
>(which can affect IBT state even if the branch itself is direct).
>
>For Shadow Stacks, explicitly track instructions that directly affect the
>current SSP, as KVM's emulator doesn't have existing flags that can be
>used to precisely detect such instructions. Alternatively, the em_xxx()
>helpers could directly check for ShadowStack interactions, but using a
>dedicated flag is arguably easier to audit, and allows for handling both
>IBT and SHSTK in one fell swoop.
>
>Note! On far transfers, do NOT consult the current privilege level and
>instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
>Supervisor mode. On inter-privilege level far transfers, SHSTK and IBT
>can be in play for the target privilege level, i.e. checking the current
>privilege could get a false negative, and KVM doesn't know the target
>privilege level until emulation gets under way.
>
>Note #2, FAR JMP from 64-bit mode to compatibility mode interacts with
>the current SSP, but only to ensure SSP[63:32] == 0. Don't tag FAR JMP
>as SHSTK, which would be rather confusing and would result in FAR JMP
>being rejected unnecessarily the vast majority of the time (ignoring that
>it's unlikely to ever be emulated). A future commit will add the #GP(0)
>check for the specific FAR JMP scenario.
>
>Note #3, task switches also modify SSP and so need to be rejected. That
>too will be addressed in a future commit.
>
>Suggested-by: Chao Gao <chao.gao@intel.com>
>Originally-by: Yang Weijiang <weijiang.yang@intel.com>
>Cc: Mathias Krause <minipli@grsecurity.net>
>Cc: John Allen <john.allen@amd.com>
>Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
<snip>
>+static bool is_ibt_instruction(u64 flags)
>+{
>+ if (!(flags & IsBranch))
>+ return false;
>+
>+ /*
>+ * Far transfers can affect IBT state even if the branch itself is
>+ * direct, e.g. when changing privilege levels and loading a conforming
>+ * code segment. For simplicity, treat all far branches as affecting
>+ * IBT. False positives are acceptable (emulating far branches on an
>+ * IBT-capable CPU won't happen in practice), while false negatives
>+ * could impact guest security.
>+ *
>+ * Note, this also handles SYCALL and SYSENTER.
>+ */
>+ if (!(flags & NearBranch))
>+ return true;
>+
>+ switch (flags & (OpMask << SrcShift)) {
nit: maybe use SrcMask here.
#define SrcMask (OpMask << SrcShift)
>+ case SrcReg:
>+ case SrcMem:
>+ case SrcMem16:
>+ case SrcMem32:
>+ return true;
>+ case SrcMemFAddr:
>+ case SrcImmFAddr:
>+ /* Far branches should be handled above. */
>+ WARN_ON_ONCE(1);
>+ return true;
>+ case SrcNone:
>+ case SrcImm:
>+ case SrcImmByte:
>+ /*
>+ * Note, ImmU16 is used only for the stack adjustment operand on ENTER
>+ * and RET instructions. ENTER isn't a branch and RET FAR is handled
>+ * by the NearBranch check above. RET itself isn't an indirect branch.
>+ */
RET FAR isn't affected by IBT, right? So it is a false positive in the above
NearBranch check. I am not asking you to fix it - just want to ensure it is
intended.
>+ case SrcImmU16:
>+ return false;
>+ default:
>+ WARN_ONCE(1, "Unexpected Src operand '%llx' on branch",
>+ (flags & (OpMask << SrcShift)));
Ditto. use SrcMask here.
* Re: [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
2025-09-19 22:32 ` [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled Sean Christopherson
2025-09-22 6:41 ` Binbin Wu
@ 2025-09-22 11:27 ` Chao Gao
1 sibling, 0 replies; 114+ messages in thread
From: Chao Gao @ 2025-09-22 11:27 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
>@@ -12178,6 +12178,25 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
> struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> int ret;
>
>+ if (kvm_is_cr4_bit_set(vcpu, X86_CR4_CET)) {
>+ u64 u_cet, s_cet;
>+
>+ /*
>+ * Check both User and Supervisor on task switches as inter-
>+ * privilege level task switches are impacted by CET at both
>+ * the current privilege level and the new privilege level, and
>+ * that information is not known at this time. The expectation
>+ * is that the guest won't require emulation of task switches
>+ * while using IBT or Shadow Stacks.
>+ */
>+ if (__kvm_emulate_msr_read(vcpu, MSR_IA32_U_CET, &u_cet) ||
>+ __kvm_emulate_msr_read(vcpu, MSR_IA32_S_CET, &s_cet))
>+ return EMULATION_FAILED;
is it ok to return EMULATION_FAILED (-1) here?
It looks like this error code will be propagated to userspace and be
interpreted as -EPERM.
* Re: [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints
2025-09-22 7:18 ` Binbin Wu
@ 2025-09-22 16:18 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-22 16:18 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Mon, Sep 22, 2025, Binbin Wu wrote:
>
>
> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> > Add PK (Protection Keys), SS (Shadow Stacks), and SGX (Software Guard
> > Extensions) to the set of #PF error flags handled via
> > kvm_mmu_trace_pferr_flags. While KVM doesn't expect PK or SS #PFs
> Also SGX.
Huh. I deliberately omitted SGX from this particular statement, as KVM supports
SGX virtualization with shadow paging. I.e. KVM "expects" PFERR_SGX in the sense
that an EPCM violation on SGX2 hardware will show up in KVM.
Typing that out made me realize that, unless I'm forgetting/missing code, KVM
doesn't actually do the right thing with respect to intercepted #PFs with PFERR_SGX.
On SGX2 hardware, an EPCM permissions violation will trigger a #PF(SGX). KVM isn't
aware that such exceptions effectively have nothing to do with software-visible
page tables. And so I'm pretty sure an EPCM violation on SGX2 hardware would put
the vCPU into an infinite loop due to KVM not realizing the #PF (ugh, or #GP if
the guest CPU model is only SGX1) should be injected into the guest (KVM will
think the fault is spurious).
To fix that, we'd need something like the below (completely untested). But for
this patch, the changelog is "correct", i.e. observing SGX #PFs shouldn't be
impossible.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 08845c1d7a62..99cc790615fd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5175,12 +5175,52 @@ static bool is_xfd_nm_fault(struct kvm_vcpu *vcpu)
!kvm_is_cr0_bit_set(vcpu, X86_CR0_TS);
}
+static int vmx_handle_page_fault(struct kvm_vcpu *vcpu, u32 error_code)
+{
+ unsigned long cr2 = vmx_get_exit_qual(vcpu);
+
+ if (vcpu->arch.apf.host_apf_flags)
+ goto handle_pf;
+
+ /* When using EPT, KVM intercepts #PF only to detect illegal GPAs. */
+ WARN_ON_ONCE(enable_ept && !allow_smaller_maxphyaddr);
+
+ /*
+ * On SGX2 hardware, EPCM violations are delivered as #PF with the SGX
+ * flag set in the error code (SGX1 hardware generates #GP(0)). EPCM
+ * violations have nothing to do with shadow paging and can never be
+ * resolved by KVM; always reflect them into the guest.
+ */
+ if (error_code & PFERR_SGX_MASK) {
+ WARN_ON_ONCE(!IS_ENABLED(CONFIG_X86_SGX_KVM) ||
+ !cpu_feature_enabled(X86_FEATURE_SGX2));
+ if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2))
+ kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
+ else
+ kvm_inject_gp(vcpu, 0);
+ return 1;
+ }
+
+ /*
+ * If EPT is enabled, fixup and inject the #PF. KVM intercepts #PFs
+ * only to set PFERR_RSVD as appropriate (hardware won't set RSVD due
+ * to the GPA being legal with respect to host.MAXPHYADDR).
+ */
+ if (enable_ept) {
+ kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
+ return 1;
+ }
+
+handle_pf:
+ return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
+}
+
static int handle_exception_nmi(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct kvm_run *kvm_run = vcpu->run;
u32 intr_info, ex_no, error_code;
- unsigned long cr2, dr6;
+ unsigned long dr6;
u32 vect_info;
vect_info = vmx->idt_vectoring_info;
@@ -5255,19 +5295,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
return 0;
}
- if (is_page_fault(intr_info)) {
- cr2 = vmx_get_exit_qual(vcpu);
- if (enable_ept && !vcpu->arch.apf.host_apf_flags) {
- /*
- * EPT will cause page fault only if we need to
- * detect illegal GPAs.
- */
- WARN_ON_ONCE(!allow_smaller_maxphyaddr);
- kvm_fixup_and_inject_pf_error(vcpu, cr2, error_code);
- return 1;
- } else
- return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
- }
+ if (is_page_fault(intr_info))
+ return vmx_handle_page_fault(vcpu, error_code);
ex_no = intr_info & INTR_INFO_VECTOR_MASK;
* Re: [PATCH v16 33/51] KVM: nVMX: Add consistency checks for CET states
2025-09-22 9:23 ` Binbin Wu
@ 2025-09-22 16:35 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-22 16:35 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Mon, Sep 22, 2025, Binbin Wu wrote:
> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Is the following simpler?
Yeah. I was going to say that separating checks in cases like this is sometimes
"better" when each statement deals with different state. But in this case, SSP
is bundled with S_CET, but not SSP_TBL, and so the whole thing is rather odd.
> index a8a421a8e766..17ba37c2bbfc 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3102,13 +3102,8 @@ static bool is_l1_noncanonical_address_on_vmexit(u64 la, struct vmcs12 *vmcs12)
>
> static bool is_valid_cet_state(struct kvm_vcpu *vcpu, u64 s_cet, u64 ssp, u64 ssp_tbl)
> {
> - if (!kvm_is_valid_u_s_cet(vcpu, s_cet) || !IS_ALIGNED(ssp, 4))
> - return false;
> -
> - if (is_noncanonical_msr_address(ssp_tbl, vcpu))
> - return false;
> -
> - return true;
> + return (kvm_is_valid_u_s_cet(vcpu, s_cet) && IS_ALIGNED(ssp, 4) &&
> + !is_noncanonical_msr_address(ssp_tbl, vcpu));
Parentheses are unnecessary.
But looking at this again, is_valid_cet_state() is a misleading name. In isolation,
it would be very easy to assume the helper checks _all_ CET state, but that's not
the case. And the other flaw is that the CC() tracepoint won't identify exactly
which check failed.
Completely untested, but assuming I didn't fat-finger something, I'll fixup to
this:
static int nested_vmx_check_cet_state_common(struct kvm_vcpu *vcpu, u64 s_cet,
u64 ssp, u64 ssp_tbl)
{
if (CC(!kvm_is_valid_u_s_cet(vcpu, s_cet)) || CC(!IS_ALIGNED(ssp, 4)) ||
CC(is_noncanonical_msr_address(ssp_tbl, vcpu)))
return -EINVAL;
return 0;
}
* Re: [PATCH v16 09/51] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
2025-09-22 2:10 ` Binbin Wu
@ 2025-09-22 16:41 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-22 16:41 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Mon, Sep 22, 2025, Binbin Wu wrote:
>
>
> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> [...]
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 3e66d8c5000a..ae402463f991 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -136,6 +136,9 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> > static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
> > static DEFINE_MUTEX(vendor_module_lock);
> > +static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
> > +static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
> > +
> > struct kvm_x86_ops kvm_x86_ops __read_mostly;
> > #define KVM_X86_OP(func) \
> > @@ -3801,6 +3804,67 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
> > mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
> > }
> > +/*
> > + * Returns true if the MSR in question is managed via XSTATE, i.e. is context
> > + * switched with the rest of guest FPU state. Note! S_CET is _not_ context
> > + * switched via XSTATE even though it _is_ saved/restored via XSAVES/XRSTORS.
> > + * Because S_CET is loaded on VM-Enter and VM-Exit via dedicated VMCS fields,
> > + * the value saved/restored via XSTATE is always the host's value. That detail
> > + * is _extremely_ important, as the guest's S_CET must _never_ be resident in
> > + * hardware while executing in the host. Loading guest values for U_CET and
> > + * PL[0-3]_SSP while executing in the kernel is safe, as U_CET is specific to
> > + * userspace, and PL[0-3]_SSP are only consumed when transitioning to lower
> > + * privilegel levels, i.e. are effectively only consumed by userspace as well.
>
> s/privilegel/privilege[...]
Fixed up, thanks!
* Re: [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features
2025-09-22 5:39 ` Binbin Wu
@ 2025-09-22 16:47 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-22 16:47 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Mon, Sep 22, 2025, Binbin Wu wrote:
> > +static bool is_ibt_instruction(u64 flags)
> > +{
> > + if (!(flags & IsBranch))
> > + return false;
> > +
> > + /*
> > + * Far transfers can affect IBT state even if the branch itself is
> > + * direct, e.g. when changing privilege levels and loading a conforming
> > + * code segment. For simplicity, treat all far branches as affecting
> > + * IBT. False positives are acceptable (emulating far branches on an
> > + * IBT-capable CPU won't happen in practice), while false negatives
> > + * could impact guest security.
> > + *
> > + * Note, this also handles SYCALL and SYSENTER.
>
> SYCALL -> SYSCALL
Fixed.
> > + */
> > + if (!(flags & NearBranch))
> > + return true;
> > +
> > + switch (flags & (OpMask << SrcShift)) {
> > + case SrcReg:
> > + case SrcMem:
> > + case SrcMem16:
> > + case SrcMem32:
> > + return true;
> > + case SrcMemFAddr:
> > + case SrcImmFAddr:
> > + /* Far branches should be handled above. */
> > + WARN_ON_ONCE(1);
> > + return true;
> > + case SrcNone:
> > + case SrcImm:
> > + case SrcImmByte:
> > + /*
> > + * Note, ImmU16 is used only for the stack adjustment operand on ENTER
> > + * and RET instructions. ENTER isn't a branch and RET FAR is handled
> > + * by the NearBranch check above. RET itself isn't an indirect branch.
> > + */
> > + case SrcImmU16:
> > + return false;
> > + default:
> > + WARN_ONCE(1, "Unexpected Src operand '%llx' on branch",
> > + (flags & (OpMask << SrcShift)));
> > + return false;
>
> Is it safer to reject the emulation if it has unexpected src operand?
Not really? Maybe? Honestly, we've failed miserably if this escapes initial
development and testing, to the point where I don't think there's a "good"
answer as to whether KVM should treat the instruction as affecting IBT. I think
I'd prefer to let the guest limp along and hope for the best?
* Re: [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
2025-09-22 6:41 ` Binbin Wu
@ 2025-09-22 17:23 ` Sean Christopherson
2025-09-23 14:16 ` Xiaoyao Li
0 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-22 17:23 UTC (permalink / raw)
To: Binbin Wu
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Mon, Sep 22, 2025, Binbin Wu wrote:
>
>
> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> > Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if the guest triggers
> > task switch emulation with Indirect Branch Tracking or Shadow Stacks
> > enabled,
>
> The code just does it when shadow stack is enabled.
Doh. Fixed that and the EMULATION_FAILED typo Chao pointed out:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8b31dfcb1de9..06a88a2b08d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12194,9 +12194,9 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
*/
if (__kvm_emulate_msr_read(vcpu, MSR_IA32_U_CET, &u_cet) ||
__kvm_emulate_msr_read(vcpu, MSR_IA32_S_CET, &s_cet))
- return EMULATION_FAILED;
+ goto unhandled_task_switch;
- if ((u_cet | s_cet) & CET_SHSTK_EN)
+ if ((u_cet | s_cet) & (CET_ENDBR_EN | CET_SHSTK_EN))
goto unhandled_task_switch;
}
* Re: [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true
2025-09-19 22:32 ` [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true Sean Christopherson
2025-09-22 8:00 ` Binbin Wu
@ 2025-09-22 18:40 ` Sean Christopherson
2025-09-23 14:44 ` Xiaoyao Li
2 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-22 18:40 UTC (permalink / raw)
To: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li,
Maxim Levitsky, Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025, Sean Christopherson wrote:
> Make IBT and SHSTK virtualization mutually exclusive with "officially"
> supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the
> allow_smaller_maxphyaddr module param is set. Running a guest with a
> smaller MAXPHYADDR requires intercepting #PF, and can also trigger
> emulation of arbitrary instructions. Intercepting and reacting to #PFs
> doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle
> Shadow Stack accesses, and emulating arbitrary instructions doesn't play
> nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side
> effects, e.g. doesn't enforce end-branch markers or model Shadow Stack
> updates.
>
> Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is
> overkill, as allow_smaller_maxphyaddr is only problematic if the guest is
> actually configured to have a smaller MAXPHYADDR. However, KVM's ABI
> doesn't provide a way to express that IBT and SHSTK may break if enabled
> in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR. I.e. the
> alternative is to do nothing in KVM and instead update documentation and
> hope KVM users are thorough readers. Go with the conservative-but-correct
> approach; worst case scenario, this restriction can be dropped if there's
> a strong use case for enabling CET on hosts with allow_smaller_maxphyaddr.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/cpuid.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 499c86bd457e..b5c4cb13630c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -963,6 +963,16 @@ void kvm_set_cpu_caps(void)
> if (!tdp_enabled)
> kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
>
> + /*
> + * Disable support for IBT and SHSTK if KVM is configured to emulate
> + * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
> + * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
> + */
> + if (allow_smaller_maxphyaddr) {
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> + kvm_cpu_cap_clear(X86_FEATURE_IBT);
> + }
Ugh, testing fail. F(IBT) is initialized in CPUID_7_EDX, clearing IBT here has
no effect.
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b861a88083e1..d290dbc96831 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -964,16 +964,6 @@ void kvm_set_cpu_caps(void)
if (!tdp_enabled)
kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
- /*
- * Disable support for IBT and SHSTK if KVM is configured to emulate
- * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
- * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
- */
- if (allow_smaller_maxphyaddr) {
- kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
- kvm_cpu_cap_clear(X86_FEATURE_IBT);
- }
-
kvm_cpu_cap_init(CPUID_7_EDX,
F(AVX512_4VNNIW),
F(AVX512_4FMAPS),
@@ -994,6 +984,16 @@ void kvm_set_cpu_caps(void)
F(IBT),
);
+ /*
+ * Disable support for IBT and SHSTK if KVM is configured to emulate
+ * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
+ * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
+ */
+ if (allow_smaller_maxphyaddr) {
+ kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+ kvm_cpu_cap_clear(X86_FEATURE_IBT);
+ }
+
if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
boot_cpu_has(X86_FEATURE_AMD_IBPB) &&
boot_cpu_has(X86_FEATURE_AMD_IBRS))
> +
> kvm_cpu_cap_init(CPUID_7_EDX,
> F(AVX512_4VNNIW),
> F(AVX512_4FMAPS),
> --
> 2.51.0.470.ga7dc726c21-goog
>
* Re: [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features
2025-09-22 10:27 ` Chao Gao
@ 2025-09-22 20:04 ` Sean Christopherson
2025-09-23 14:12 ` Xiaoyao Li
0 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-22 20:04 UTC (permalink / raw)
To: Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Mon, Sep 22, 2025, Chao Gao wrote:
> >+static bool is_ibt_instruction(u64 flags)
> >+{
> >+ if (!(flags & IsBranch))
> >+ return false;
> >+
> >+ /*
> >+ * Far transfers can affect IBT state even if the branch itself is
> >+ * direct, e.g. when changing privilege levels and loading a conforming
> >+ * code segment. For simplicity, treat all far branches as affecting
> >+ * IBT. False positives are acceptable (emulating far branches on an
> >+ * IBT-capable CPU won't happen in practice), while false negatives
> >+ * could impact guest security.
> >+ *
> >+ * Note, this also handles SYCALL and SYSENTER.
> >+ */
> >+ if (!(flags & NearBranch))
> >+ return true;
> >+
> >+ switch (flags & (OpMask << SrcShift)) {
>
> nit: maybe use SrcMask here.
>
> #define SrcMask (OpMask << SrcShift)
Fixed. No idea how I missed that macro.
> >+ case SrcReg:
> >+ case SrcMem:
> >+ case SrcMem16:
> >+ case SrcMem32:
> >+ return true;
> >+ case SrcMemFAddr:
> >+ case SrcImmFAddr:
> >+ /* Far branches should be handled above. */
> >+ WARN_ON_ONCE(1);
> >+ return true;
> >+ case SrcNone:
> >+ case SrcImm:
> >+ case SrcImmByte:
> >+ /*
> >+ * Note, ImmU16 is used only for the stack adjustment operand on ENTER
> >+ * and RET instructions. ENTER isn't a branch and RET FAR is handled
> >+ * by the NearBranch check above. RET itself isn't an indirect branch.
> >+ */
>
> RET FAR isn't affected by IBT, right?
Correct, AFAICT RET FAR doesn't have any interactions with IBT.
> So it is a false positive in the above NearBranch check. I am not asking you
> to fix it - just want to ensure it is intended.
Intended, but wrong. Specifically, this isn't true for FAR RET or IRET:
Far transfers can affect IBT state even if the branch itself is direct
(IRET #GPs on return to vm86, but KVM doesn't emulate IRET if CR0.PE=1, so that's
a moot point)
While it's tempting to sweep this under the rug, it's easy enough to handle with
a short allow-list. I can't imagine it'll ever matter, e.g. the odds of a guest
enabling IBT _and_ doing a FAR RET without a previous FAR CALL _and_ triggering
emulation on the FAR RET would be... impressive.
This is what I have applied. It passes both negative and positive testcases for
FAR RET and IRET (I didn't try to encode SYSEXIT; though that'd be a "fun" way to
implement usermode support in KUT :-D)
--
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 19 Sep 2025 15:32:25 -0700
Subject: [PATCH] KVM: x86: Don't emulate instructions affected by CET features
Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
affected by Shadow Stacks and/or Indirect Branch Tracking when said
features are enabled in the guest, as fully emulating CET would require
significant complexity for no practical benefit (KVM shouldn't need to
emulate branch instructions on modern hosts). Simply doing nothing isn't
an option as that would allow a malicious entity to subvert CET
protections via the emulator.
To detect instructions that are subject to IBT or affect IBT state, use
the existing IsBranch flag along with the source operand type to detect
indirect branches, and the existing NearBranch flag to detect far JMPs
and CALLs, all of which are effectively indirect. Explicitly check for
emulation of IRET, FAR RET (IMM), and SYSEXIT (the ret-like far branches)
instead of adding another flag, e.g. IsRet, as it's unlikely the emulator
will ever need to check for return-like instructions outside of this one
specific flow. Use an allow-list instead of a deny-list because (a) it's
a shorter list and (b) so that a missed entry gets a false positive, not a
false negative (i.e. reject emulation instead of clobbering CET state).
For Shadow Stacks, explicitly track instructions that directly affect the
current SSP, as KVM's emulator doesn't have existing flags that can be
used to precisely detect such instructions. Alternatively, the em_xxx()
helpers could directly check for ShadowStack interactions, but using a
dedicated flag is arguably easier to audit, and allows for handling both
IBT and SHSTK in one fell swoop.
Note! On far transfers, do NOT consult the current privilege level and
instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
Supervisor mode. On inter-privilege level far transfers, SHSTK and IBT
can be in play for the target privilege level, i.e. checking the current
privilege could get a false negative, and KVM doesn't know the target
privilege level until emulation gets under way.
Note #2, FAR JMP from 64-bit mode to compatibility mode interacts with
the current SSP, but only to ensure SSP[63:32] == 0. Don't tag FAR JMP
as SHSTK, which would be rather confusing and would result in FAR JMP
being rejected unnecessarily the vast majority of the time (ignoring that
it's unlikely to ever be emulated). A future commit will add the #GP(0)
check for the specific FAR JMP scenario.
Note #3, task switches also modify SSP and so need to be rejected. That
too will be addressed in a future commit.
Suggested-by: Chao Gao <chao.gao@intel.com>
Originally-by: Yang Weijiang <weijiang.yang@intel.com>
Cc: Mathias Krause <minipli@grsecurity.net>
Cc: John Allen <john.allen@amd.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Link: https://lore.kernel.org/r/20250919223258.1604852-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/emulate.c | 117 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 103 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 23929151a5b8..a7683dc18405 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,7 @@
#define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
#define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
#define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
+#define ShadowStack ((u64)1 << 57) /* Instruction affects Shadow Stacks. */
#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
@@ -4068,9 +4069,9 @@ static const struct opcode group4[] = {
static const struct opcode group5[] = {
F(DstMem | SrcNone | Lock, em_inc),
F(DstMem | SrcNone | Lock, em_dec),
- I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
- I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
- I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
+ I(SrcMem | NearBranch | IsBranch | ShadowStack, em_call_near_abs),
+ I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack, em_call_far),
+ I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
I(SrcMemFAddr | ImplicitOps | IsBranch, em_jmp_far),
I(SrcMem | Stack | TwoMemOp, em_push), D(Undefined),
};
@@ -4304,7 +4305,7 @@ static const struct opcode opcode_table[256] = {
DI(SrcAcc | DstReg, pause), X7(D(SrcAcc | DstReg)),
/* 0x98 - 0x9F */
D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
- I(SrcImmFAddr | No64 | IsBranch, em_call_far), N,
+ I(SrcImmFAddr | No64 | IsBranch | ShadowStack, em_call_far), N,
II(ImplicitOps | Stack, em_pushf, pushf),
II(ImplicitOps | Stack, em_popf, popf),
I(ImplicitOps, em_sahf), I(ImplicitOps, em_lahf),
@@ -4324,19 +4325,19 @@ static const struct opcode opcode_table[256] = {
X8(I(DstReg | SrcImm64 | Mov, em_mov)),
/* 0xC0 - 0xC7 */
G(ByteOp | Src2ImmByte, group2), G(Src2ImmByte, group2),
- I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch, em_ret_near_imm),
- I(ImplicitOps | NearBranch | IsBranch, em_ret),
+ I(ImplicitOps | NearBranch | SrcImmU16 | IsBranch | ShadowStack, em_ret_near_imm),
+ I(ImplicitOps | NearBranch | IsBranch | ShadowStack, em_ret),
I(DstReg | SrcMemFAddr | ModRM | No64 | Src2ES, em_lseg),
I(DstReg | SrcMemFAddr | ModRM | No64 | Src2DS, em_lseg),
G(ByteOp, group11), G(0, group11),
/* 0xC8 - 0xCF */
I(Stack | SrcImmU16 | Src2ImmByte, em_enter),
I(Stack, em_leave),
- I(ImplicitOps | SrcImmU16 | IsBranch, em_ret_far_imm),
- I(ImplicitOps | IsBranch, em_ret_far),
- D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch, intn),
+ I(ImplicitOps | SrcImmU16 | IsBranch | ShadowStack, em_ret_far_imm),
+ I(ImplicitOps | IsBranch | ShadowStack, em_ret_far),
+ D(ImplicitOps | IsBranch), DI(SrcImmByte | IsBranch | ShadowStack, intn),
D(ImplicitOps | No64 | IsBranch),
- II(ImplicitOps | IsBranch, em_iret, iret),
+ II(ImplicitOps | IsBranch | ShadowStack, em_iret, iret),
/* 0xD0 - 0xD7 */
G(Src2One | ByteOp, group2), G(Src2One, group2),
G(Src2CL | ByteOp, group2), G(Src2CL, group2),
@@ -4352,7 +4353,7 @@ static const struct opcode opcode_table[256] = {
I2bvIP(SrcImmUByte | DstAcc, em_in, in, check_perm_in),
I2bvIP(SrcAcc | DstImmUByte, em_out, out, check_perm_out),
/* 0xE8 - 0xEF */
- I(SrcImm | NearBranch | IsBranch, em_call),
+ I(SrcImm | NearBranch | IsBranch | ShadowStack, em_call),
D(SrcImm | ImplicitOps | NearBranch | IsBranch),
I(SrcImmFAddr | No64 | IsBranch, em_jmp_far),
D(SrcImmByte | ImplicitOps | NearBranch | IsBranch),
@@ -4371,7 +4372,7 @@ static const struct opcode opcode_table[256] = {
static const struct opcode twobyte_table[256] = {
/* 0x00 - 0x0F */
G(0, group6), GD(0, &group7), N, N,
- N, I(ImplicitOps | EmulateOnUD | IsBranch, em_syscall),
+ N, I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_syscall),
II(ImplicitOps | Priv, em_clts, clts), N,
DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM | SrcMem | NoAccess), N, N,
@@ -4402,8 +4403,8 @@ static const struct opcode twobyte_table[256] = {
IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
II(ImplicitOps | Priv, em_rdmsr, rdmsr),
IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
- I(ImplicitOps | EmulateOnUD | IsBranch, em_sysenter),
- I(ImplicitOps | Priv | EmulateOnUD | IsBranch, em_sysexit),
+ I(ImplicitOps | EmulateOnUD | IsBranch | ShadowStack, em_sysenter),
+ I(ImplicitOps | Priv | EmulateOnUD | IsBranch | ShadowStack, em_sysexit),
N, N,
N, N, N, N, N, N, N, N,
/* 0x40 - 0x4F */
@@ -4514,6 +4515,60 @@ static const struct opcode opcode_map_0f_38[256] = {
#undef I2bvIP
#undef I6ALU
+static bool is_shstk_instruction(struct x86_emulate_ctxt *ctxt)
+{
+ return ctxt->d & ShadowStack;
+}
+
+static bool is_ibt_instruction(struct x86_emulate_ctxt *ctxt)
+{
+ u64 flags = ctxt->d;
+
+ if (!(flags & IsBranch))
+ return false;
+
+ /*
+ * All far JMPs and CALLs (including SYSCALL, SYSENTER, and INTn) are
+ * indirect and thus affect IBT state. All far RETs (including SYSEXIT
+ * and IRET) are protected via Shadow Stacks and thus don't affect IBT
+ * state. IRET #GPs when returning to virtual-8086 and IBT or SHSTK is
+ * enabled, but that should be handled by IRET emulation (in the very
+ * unlikely scenario that KVM adds support for fully emulating IRET).
+ */
+ if (!(flags & NearBranch))
+ return ctxt->execute != em_iret &&
+ ctxt->execute != em_ret_far &&
+ ctxt->execute != em_ret_far_imm &&
+ ctxt->execute != em_sysexit;
+
+ switch (flags & SrcMask) {
+ case SrcReg:
+ case SrcMem:
+ case SrcMem16:
+ case SrcMem32:
+ return true;
+ case SrcMemFAddr:
+ case SrcImmFAddr:
+ /* Far branches should be handled above. */
+ WARN_ON_ONCE(1);
+ return true;
+ case SrcNone:
+ case SrcImm:
+ case SrcImmByte:
+ /*
+ * Note, ImmU16 is used only for the stack adjustment operand on ENTER
+ * and RET instructions. ENTER isn't a branch and RET FAR is handled
+ * by the NearBranch check above. RET itself isn't an indirect branch.
+ */
+ case SrcImmU16:
+ return false;
+ default:
+ WARN_ONCE(1, "Unexpected Src operand '%llx' on branch",
+ flags & SrcMask);
+ return false;
+ }
+}
+
static unsigned imm_size(struct x86_emulate_ctxt *ctxt)
{
unsigned size;
@@ -4943,6 +4998,40 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
ctxt->execute = opcode.u.execute;
+ /*
+ * Reject emulation if KVM might need to emulate shadow stack updates
+ * and/or indirect branch tracking enforcement, which the emulator
+ * doesn't support.
+ */
+ if ((is_ibt_instruction(ctxt) || is_shstk_instruction(ctxt)) &&
+ ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
+ u64 u_cet = 0, s_cet = 0;
+
+ /*
+ * Check both User and Supervisor on far transfers as inter-
+ * privilege level transfers are impacted by CET at the target
+ * privilege level, and that is not known at this time. The
+ * expectation is that the guest will not require emulation
+ * of any CET-affected instructions at any privilege level.
+ */
+ if (!(ctxt->d & NearBranch))
+ u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+ else if (ctxt->ops->cpl(ctxt) == 3)
+ u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+ else
+ s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
+
+ if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
+ (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
+ return EMULATION_FAILED;
+
+ if ((u_cet | s_cet) & CET_SHSTK_EN && is_shstk_instruction(ctxt))
+ return EMULATION_FAILED;
+
+ if ((u_cet | s_cet) & CET_ENDBR_EN && is_ibt_instruction(ctxt))
+ return EMULATION_FAILED;
+ }
+
if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
likely(!(ctxt->d & EmulateOnUD)))
return EMULATION_FAILED;
base-commit: 88539a6a25bc7a7ed96952775152e0c3331fdcaf
--
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v16 02/51] KVM: SEV: Read save fields from GHCB exactly once
2025-09-19 22:32 ` [PATCH v16 02/51] KVM: SEV: Read save fields from GHCB exactly once Sean Christopherson
@ 2025-09-22 21:39 ` Tom Lendacky
0 siblings, 0 replies; 114+ messages in thread
From: Tom Lendacky @ 2025-09-22 21:39 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Mathias Krause, John Allen, Rick Edgecombe,
Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/19/25 17:32, Sean Christopherson wrote:
> Wrap all reads of GHCB save fields with READ_ONCE() via a KVM-specific
> GHCB get() utility to help guard against TOCTOU bugs. Using READ_ONCE()
> doesn't completely prevent such bugs, e.g. doesn't prevent KVM from
> redoing get() after checking the initial value, but at least addresses
> all potential TOCTOU issues in the current KVM code base.
>
> To prevent unintentional use of the generic helpers, take only @svm for
> the kvm_ghcb_get_xxx() helpers and retrieve the ghcb instead of explicitly
> passing it in.
>
> Opportunistically reduce the indentation of the macro-defined helpers and
> clean up the alignment.
>
> Fixes: 4e15a0ddc3ff ("KVM: SEV: snapshot the GHCB before accessing it")
> Cc: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 22 +++++++++++-----------
> arch/x86/kvm/svm/svm.h | 25 +++++++++++++++----------
> 2 files changed, 26 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index f046a587ecaf..8d057dbd8a71 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3343,26 +3343,26 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> BUILD_BUG_ON(sizeof(svm->sev_es.valid_bitmap) != sizeof(ghcb->save.valid_bitmap));
> memcpy(&svm->sev_es.valid_bitmap, &ghcb->save.valid_bitmap, sizeof(ghcb->save.valid_bitmap));
>
> - vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm, ghcb);
> - vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm, ghcb);
> - vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm, ghcb);
> - vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm, ghcb);
> - vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm, ghcb);
> + vcpu->arch.regs[VCPU_REGS_RAX] = kvm_ghcb_get_rax_if_valid(svm);
> + vcpu->arch.regs[VCPU_REGS_RBX] = kvm_ghcb_get_rbx_if_valid(svm);
> + vcpu->arch.regs[VCPU_REGS_RCX] = kvm_ghcb_get_rcx_if_valid(svm);
> + vcpu->arch.regs[VCPU_REGS_RDX] = kvm_ghcb_get_rdx_if_valid(svm);
> + vcpu->arch.regs[VCPU_REGS_RSI] = kvm_ghcb_get_rsi_if_valid(svm);
>
> - svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm, ghcb);
> + svm->vmcb->save.cpl = kvm_ghcb_get_cpl_if_valid(svm);
>
> if (kvm_ghcb_xcr0_is_valid(svm)) {
> - vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb);
> + vcpu->arch.xcr0 = kvm_ghcb_get_xcr0(svm);
> vcpu->arch.cpuid_dynamic_bits_dirty = true;
> }
>
> /* Copy the GHCB exit information into the VMCB fields */
> - exit_code = ghcb_get_sw_exit_code(ghcb);
> + exit_code = kvm_ghcb_get_sw_exit_code(svm);
> control->exit_code = lower_32_bits(exit_code);
> control->exit_code_hi = upper_32_bits(exit_code);
> - control->exit_info_1 = ghcb_get_sw_exit_info_1(ghcb);
> - control->exit_info_2 = ghcb_get_sw_exit_info_2(ghcb);
> - svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm, ghcb);
> + control->exit_info_1 = kvm_ghcb_get_sw_exit_info_1(svm);
> + control->exit_info_2 = kvm_ghcb_get_sw_exit_info_2(svm);
> + svm->sev_es.sw_scratch = kvm_ghcb_get_sw_scratch_if_valid(svm);
>
> /* Clear the valid entries fields */
> memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 5d39c0b17988..5365984e82e5 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -913,16 +913,21 @@ void __svm_sev_es_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted,
> void __svm_vcpu_run(struct vcpu_svm *svm, bool spec_ctrl_intercepted);
>
> #define DEFINE_KVM_GHCB_ACCESSORS(field) \
> - static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
> - { \
> - return test_bit(GHCB_BITMAP_IDX(field), \
> - (unsigned long *)&svm->sev_es.valid_bitmap); \
> - } \
> - \
> - static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm, struct ghcb *ghcb) \
> - { \
> - return kvm_ghcb_##field##_is_valid(svm) ? ghcb->save.field : 0; \
> - } \
> +static __always_inline u64 kvm_ghcb_get_##field(struct vcpu_svm *svm) \
> +{ \
> + return READ_ONCE(svm->sev_es.ghcb->save.field); \
> +} \
> + \
> +static __always_inline bool kvm_ghcb_##field##_is_valid(const struct vcpu_svm *svm) \
> +{ \
> + return test_bit(GHCB_BITMAP_IDX(field), \
> + (unsigned long *)&svm->sev_es.valid_bitmap); \
> +} \
> + \
> +static __always_inline u64 kvm_ghcb_get_##field##_if_valid(struct vcpu_svm *svm) \
> +{ \
> + return kvm_ghcb_##field##_is_valid(svm) ? kvm_ghcb_get_##field(svm) : 0; \
> +}
>
> DEFINE_KVM_GHCB_ACCESSORS(cpl)
> DEFINE_KVM_GHCB_ACCESSORS(rax)
* Re: [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities
2025-09-19 22:32 ` [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities Sean Christopherson
@ 2025-09-23 2:37 ` Chao Gao
2025-09-23 16:24 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Chao Gao @ 2025-09-23 2:37 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:36PM -0700, Sean Christopherson wrote:
>Swap the order between configuring nested VMX capabilities and base CPU
>capabilities, so that nested VMX support can be conditioned on core KVM
>support, e.g. to allow conditioning support for LOAD_CET_STATE on the
>presence of IBT or SHSTK. Because the sanity checks on nested VMX config
>performed by vmx_check_processor_compat() run _after_ vmx_hardware_setup(),
>any use of kvm_cpu_cap_has() when configuring nested VMX support will lead
>to failures in vmx_check_processor_compat().
>
>While swapping the order of two (or more) configuration flows can lead to
>a game of whack-a-mole, in this case nested support inarguably should be
>done after base support. KVM should never condition base support on nested
>support, because nested support is fully optional, while obviously it's
>desirable to condition nested support on base support. And there's zero
>evidence the current ordering was intentional, e.g. commit 66a6950f9995
>("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
>likely placed the call to kvm_set_cpu_caps() after nested setup because it
>looked pretty.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
I had a feeling I'd seen this patch before :). After some searching in lore, I
tracked it down:
https://lore.kernel.org/kvm/20241001050110.3643764-22-xin@zytor.com/
* Re: [PATCH v16 34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
2025-09-19 22:32 ` [PATCH v16 34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Sean Christopherson
@ 2025-09-23 2:43 ` Chao Gao
2025-09-23 16:28 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Chao Gao @ 2025-09-23 2:43 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
>Advertise support if and only if KVM supports at least one of IBT or SHSTK.
>While it's userspace's responsibility to provide a consistent CPU model to
>the guest, that doesn't mean KVM should set userspace up to fail.
Makes sense.
>@@ -7178,13 +7178,17 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
> VM_EXIT_HOST_ADDR_SPACE_SIZE |
> #endif
> VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>- VM_EXIT_CLEAR_BNDCFGS;
>+ VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_CET_STATE;
> msrs->exit_ctls_high |=
> VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
> VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
>
>+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
>+ msrs->exit_ctls_high &= ~VM_EXIT_LOAD_CET_STATE;
...
>+
> /* We support free control of debug control saving. */
> msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
> }
>@@ -7200,11 +7204,16 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
> #ifdef CONFIG_X86_64
> VM_ENTRY_IA32E_MODE |
> #endif
>- VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
>+ VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
>+ VM_ENTRY_LOAD_CET_STATE;
> msrs->entry_ctls_high |=
> (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
> VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
>
>+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
>+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
>+ msrs->exit_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE;
one copy-paste error here. s/exit_ctls_high/entry_ctls_high/
>+
> /* We support free control of debug control loading. */
> msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;
> }
>--
>2.51.0.470.ga7dc726c21-goog
>
* Re: [PATCH v16 49/51] KVM: selftests: Add coverage for KVM-defined registers in MSRs test
2025-09-19 22:32 ` [PATCH v16 49/51] KVM: selftests: Add coverage for KVM-defined registers in " Sean Christopherson
@ 2025-09-23 6:31 ` Chao Gao
2025-09-23 16:59 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Chao Gao @ 2025-09-23 6:31 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:56PM -0700, Sean Christopherson wrote:
>Add test coverage for the KVM-defined GUEST_SSP "register" in the MSRs
>test. While _KVM's_ goal is to not tie the uAPI of KVM-defined registers
>to any particular internal implementation, i.e. to not commit in uAPI to
>handling GUEST_SSP as an MSR, treating GUEST_SSP as an MSR for testing
>purposes is a-ok and is a natural fit given the semantics of SSP.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
<snip>
>+static bool vcpu_has_reg(struct kvm_vcpu *vcpu, u64 reg)
>+{
>+ struct {
>+ struct kvm_reg_list list;
>+ u64 regs[KVM_X86_MAX_NR_REGS];
>+ } regs = {};
>+ int r, i;
>+
>+ /*
>+ * If KVM_GET_REG_LIST succeeds with n=0, i.e. there are no supported
>+ * regs, then the vCPU obviously doesn't support the reg.
>+ */
>+ r = __vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list.n);
^^^^^^^^^^^^
it would be more clear to use &regs.list here.
>+ if (!r)
>+ return false;
>+
>+ TEST_ASSERT_EQ(errno, E2BIG);
>+
>+ /*
>+ * KVM x86 is expected to support enumerating a relatively small number
>+ * of regs. The majority of registers supported by KVM_{G,S}ET_ONE_REG
>+ * are enumerated via other ioctls, e.g. KVM_GET_MSR_INDEX_LIST. For
>+ * simplicity, hardcode the maximum number of regs and manually update
>+ * the test as necessary.
>+ */
>+ TEST_ASSERT(regs.list.n <= KVM_X86_MAX_NR_REGS,
>+ "KVM reports %llu regs, test expects at most %u regs, stale test?",
>+ regs.list.n, KVM_X86_MAX_NR_REGS);
>+
vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list.n);
Ditto.
* Re: [PATCH v16 50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
2025-09-19 22:32 ` [PATCH v16 50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported Sean Christopherson
@ 2025-09-23 6:46 ` Chao Gao
2025-09-23 17:02 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Chao Gao @ 2025-09-23 6:46 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:57PM -0700, Sean Christopherson wrote:
>Add a check in the MSRs test to verify that KVM's reported support for
>MSRs with feature bits is consistent between KVM's MSR save/restore lists
>and KVM's supported CPUID.
>
>To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET,
>track the "second" feature to avoid false failures when running on a CPU
>with only one of IBT or SHSTK.
is this paragraph related to this patch? the tracking is done in a previous
patch instead of this patch. So maybe just drop this paragraph.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>---
> tools/testing/selftests/kvm/x86/msrs_test.c | 22 ++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
>diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
>index 7c6d846e42dd..91dc66bfdac2 100644
>--- a/tools/testing/selftests/kvm/x86/msrs_test.c
>+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
>@@ -437,12 +437,32 @@ static void test_msrs(void)
> }
>
> for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
>- if (msrs[idx].is_kvm_defined) {
>+ struct kvm_msr *msr = &msrs[idx];
>+
>+ if (msr->is_kvm_defined) {
> for (i = 0; i < NR_VCPUS; i++)
> host_test_kvm_reg(vcpus[i]);
> continue;
> }
>
>+ /*
>+ * Verify KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST
>+ * are consistent with respect to MSRs whose existence is
>+ * enumerated via CPUID. Note, using LM as a dummy feature
>+ * is a-ok here as well, as all MSRs that abuse LM should be
>+ * unconditionally reported in the save/restore list (and
I am not sure why LM is mentioned here. Is it a leftover from one of your
previous attempts?
>+ * selftests are 64-bit only). Note #2, skip the check for
>+ * FS/GS.base MSRs, as they aren't reported in the save/restore
>+ * list since their state is managed via SREGS.
>+ */
>+ TEST_ASSERT(msr->index == MSR_FS_BASE || msr->index == MSR_GS_BASE ||
>+ kvm_msr_is_in_save_restore_list(msr->index) ==
>+ (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)),
>+ "%s %s save/restore list, but %s according to CPUID", msr->name,
^ an "in" is missing here.
The code change looks good. So,
Reviewed-by: Chao Gao <chao.gao@intel.com>
>+ kvm_msr_is_in_save_restore_list(msr->index) ? "is" : "isn't",
>+ (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)) ?
>+ "supported" : "unsupported");
>+
> sync_global_to_guest(vm, idx);
>
> vcpus_run(vcpus, NR_VCPUS);
>--
>2.51.0.470.ga7dc726c21-goog
>
* Re: [PATCH v16 48/51] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
2025-09-19 22:32 ` [PATCH v16 48/51] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test Sean Christopherson
@ 2025-09-23 6:52 ` Chao Gao
0 siblings, 0 replies; 114+ messages in thread
From: Chao Gao @ 2025-09-23 6:52 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:55PM -0700, Sean Christopherson wrote:
>When KVM_{G,S}ET_ONE_REG are supported, verify that MSRs can be accessed
>via ONE_REG and through the dedicated MSR ioctls. For simplicity, run
>the test twice, e.g. instead of trying to get MSR values into the exact
>right state when switching write methods.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
* Re: [PATCH v16 46/51] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
2025-09-19 22:32 ` [PATCH v16 46/51] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test Sean Christopherson
@ 2025-09-23 7:12 ` Chao Gao
0 siblings, 0 replies; 114+ messages in thread
From: Chao Gao @ 2025-09-23 7:12 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:53PM -0700, Sean Christopherson wrote:
>Extend the MSRs test to support {S,U}_CET, which are a bit of a pain to
>handled due to the MSRs existing if IBT *or* SHSTK is supported. To deal
>with Intel's wonderful decision to bundle IBT and SHSTK under CET, track
>the second feature, but skip only RDMSR #GP tests to avoid false failures
>when running on a CPU with only one of IBT or SHSTK (the WRMSR #GP tests
>are still valid since the enable bits are per-feature).
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
* Re: [PATCH v16 45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write
2025-09-19 22:32 ` [PATCH v16 45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write Sean Christopherson
@ 2025-09-23 8:03 ` Chao Gao
2025-09-23 16:51 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Chao Gao @ 2025-09-23 8:03 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:52PM -0700, Sean Christopherson wrote:
>Add a selftest to verify reads and writes to various MSRs, from both the
>guest and host, and expect success/failure based on whether or not the
>vCPU supports the MSR according to supported CPUID.
>
>Note, this test is extremely similar to KVM-Unit-Test's "msr" test, but
>provides more coverage with respect to host accesses, and will be extended
>to provide additional testing of CPUID-based features, save/restore lists,
>and KVM_{G,S}ET_ONE_REG, all which are extremely difficult to validate in
>KUT.
>
>If kvm.ignore_msrs=true, skip the unsupported and reserved testcases as
>KVM's ABI is a mess; what exactly is supposed to be ignored, and when,
>varies wildly.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
<snip>
>+/*
>+ * Note, use a page aligned value for the canonical value so that the value
>+ * is compatible with MSRs that use bits 11:0 for things other than addresses.
>+ */
>+static const u64 canonical_val = 0x123456789000ull;
...
>+{
>+ const struct kvm_msr __msrs[] = {
>+ MSR_TEST_NON_ZERO(MSR_IA32_MISC_ENABLE,
>+ MISC_ENABLES_RESET_VAL | MSR_IA32_MISC_ENABLE_FAST_STRING,
>+ MSR_IA32_MISC_ENABLE_FAST_STRING, MISC_ENABLES_RESET_VAL, NONE),
>+ MSR_TEST_NON_ZERO(MSR_IA32_CR_PAT, 0x07070707, 0, 0x7040600070406, NONE),
>+
>+ /*
>+ * TSC_AUX is supported if RDTSCP *or* RDPID is supported. Add
>+ * entries for each feature so that TSC_AUX doesn't exist for
>+ * the "unsupported" vCPU, and obviously to test both cases.
>+ */
>+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDTSCP, RDPID),
>+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDPID, RDTSCP),
At first glance, it's unclear to me why canonical_val is invalid for
MSR_TSC_AUX, especially since it is valid for a few other MSRs in this
test. Should we add a note to the above comment? e.g.,
canonical_val is invalid for MSR_TSC_AUX because its high 32 bits must be 0.
>+
>+ MSR_TEST(MSR_IA32_SYSENTER_CS, 0x1234, 0, NONE),
>+ /*
>+ * SYSENTER_{ESP,EIP} are technically non-canonical on Intel,
>+ * but KVM doesn't emulate that behavior on emulated writes,
>+ * i.e. this test will observe different behavior if the MSR
>+ * writes are handed by hardware vs. KVM. KVM's behavior is
>+ * intended (though far from ideal), so don't bother testing
>+ * non-canonical values.
>+ */
>+ MSR_TEST(MSR_IA32_SYSENTER_ESP, canonical_val, 0, NONE),
>+ MSR_TEST(MSR_IA32_SYSENTER_EIP, canonical_val, 0, NONE),
>+
>+ MSR_TEST_CANONICAL(MSR_FS_BASE, LM),
>+ MSR_TEST_CANONICAL(MSR_GS_BASE, LM),
>+ MSR_TEST_CANONICAL(MSR_KERNEL_GS_BASE, LM),
>+ MSR_TEST_CANONICAL(MSR_LSTAR, LM),
>+ MSR_TEST_CANONICAL(MSR_CSTAR, LM),
>+ MSR_TEST(MSR_SYSCALL_MASK, 0xffffffff, 0, LM),
>+
>+ MSR_TEST_CANONICAL(MSR_IA32_PL0_SSP, SHSTK),
>+ MSR_TEST(MSR_IA32_PL0_SSP, canonical_val, canonical_val | 1, SHSTK),
>+ MSR_TEST_CANONICAL(MSR_IA32_PL1_SSP, SHSTK),
>+ MSR_TEST(MSR_IA32_PL1_SSP, canonical_val, canonical_val | 1, SHSTK),
>+ MSR_TEST_CANONICAL(MSR_IA32_PL2_SSP, SHSTK),
>+ MSR_TEST(MSR_IA32_PL2_SSP, canonical_val, canonical_val | 1, SHSTK),
>+ MSR_TEST_CANONICAL(MSR_IA32_PL3_SSP, SHSTK),
>+ MSR_TEST(MSR_IA32_PL3_SSP, canonical_val, canonical_val | 1, SHSTK),
>+ };
>+
>+ /*
>+ * Create two vCPUs, but run them on the same task, to validate KVM's
>+ * context switching of MSR state. Don't pin the task to a pCPU to
>+ * also validate KVM's handling of cross-pCPU migration.
>+ */
>+ const int NR_VCPUS = 2;
>+ struct kvm_vcpu *vcpus[NR_VCPUS];
>+ struct kvm_vm *vm;
>+
>+ kvm_static_assert(sizeof(__msrs) <= sizeof(msrs));
>+ kvm_static_assert(ARRAY_SIZE(__msrs) <= ARRAY_SIZE(msrs));
>+ memcpy(msrs, __msrs, sizeof(__msrs));
>+
>+ ignore_unsupported_msrs = kvm_is_ignore_msrs();
>+
>+ vm = vm_create_with_vcpus(NR_VCPUS, guest_main, vcpus);
>+
>+ sync_global_to_guest(vm, msrs);
>+ sync_global_to_guest(vm, ignore_unsupported_msrs);
>+
>+ for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
>+ sync_global_to_guest(vm, idx);
>+
>+ vcpus_run(vcpus, NR_VCPUS);
>+ vcpus_run(vcpus, NR_VCPUS);
>+ }
>+
>+ kvm_vm_free(vm);
>+}
>+
>+int main(void)
>+{
>+ test_msrs();
>+}
>--
>2.51.0.470.ga7dc726c21-goog
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1
2025-09-19 22:32 ` [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1 Sean Christopherson
@ 2025-09-23 8:15 ` Chao Gao
2025-09-23 14:49 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Chao Gao @ 2025-09-23 8:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:31PM -0700, Sean Christopherson wrote:
>Unconditionally forward XSAVES/XRSTORS VM-Exits from L2 to L1, as KVM
>doesn't utilize the XSS-bitmap (KVM relies on controlling the XSS value
>in hardware to prevent unauthorized access to XSAVES state). KVM always
>loads vmcs02 with vmcs12's bitmap, and so any exit _must_ be due to
>vmcs12's XSS-bitmap.
>
>Drop the comment about XSS never being non-zero in anticipation of
>enabling CET_KERNEL and CET_USER support.
>
>Opportunistically WARN if XSAVES is not enabled for L2, as the CPU is
>supposed to generate #UD before checking the XSS-bitmap.
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs
2025-09-19 22:32 ` [PATCH v16 13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs Sean Christopherson
2025-09-22 2:58 ` Binbin Wu
@ 2025-09-23 9:06 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 9:06 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add a KVM-defined ONE_REG register, KVM_REG_GUEST_SSP, to let userspace
> save and restore the guest's Shadow Stack Pointer (SSP). On both Intel
> and AMD, SSP is a hardware register that can only be accessed by software
> via dedicated ISA (e.g. RDSSP) or via VMCS/VMCB fields (used by hardware
> to context switch SSP at entry/exit). As a result, SSP doesn't fit in
> any of KVM's existing interfaces for saving/restoring state.
>
> Internally, treat SSP as a fake/synthetic MSR, as the semantics of writes
> to SSP follow that of several other Shadow Stack MSRs, e.g. the PLx_SSP
> MSRs. Use a translation layer to hide the KVM-internal MSR index so that
> the arbitrary index doesn't become ABI, e.g. so that KVM can rework its
> implementation as needed, so long as the ONE_REG ABI is maintained.
>
> Explicitly reject accesses to SSP if the vCPU doesn't have Shadow Stack
> support to avoid running afoul of ignore_msrs, which unfortunately applies
> to host-initiated accesses (which is a discussion for another day). I.e.
> ensure consistent behavior for KVM-defined registers irrespective of
> ignore_msrs.
>
> Link: https://lore.kernel.org/all/aca9d389-f11e-4811-90cf-d98e345a5cc2@intel.com
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Co-developed-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features
2025-09-22 20:04 ` Sean Christopherson
@ 2025-09-23 14:12 ` Xiaoyao Li
2025-09-23 16:15 ` Sean Christopherson
0 siblings, 1 reply; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:12 UTC (permalink / raw)
To: Sean Christopherson, Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/23/2025 4:04 AM, Sean Christopherson wrote:
> From: Sean Christopherson <seanjc@google.com>
> Date: Fri, 19 Sep 2025 15:32:25 -0700
> Subject: [PATCH] KVM: x86: Don't emulate instructions affected by CET features
>
> Don't emulate branch instructions, e.g. CALL/RET/JMP etc., that are
> affected by Shadow Stacks and/or Indirect Branch Tracking when said
> features are enabled in the guest, as fully emulating CET would require
> significant complexity for no practical benefit (KVM shouldn't need to
> emulate branch instructions on modern hosts). Simply doing nothing isn't
> an option as that would allow a malicious entity to subvert CET
> protections via the emulator.
>
> To detect instructions that are subject to IBT or affect IBT state, use
> the existing IsBranch flag along with the source operand type to detect
> indirect branches, and the existing NearBranch flag to detect far JMPs
> and CALLs, all of which are effectively indirect. Explicitly check for
> emulation of IRET, FAR RET (IMM), and SYSEXIT (the ret-like far branches)
> instead of adding another flag, e.g. IsRet, as it's unlikely the emulator
> will ever need to check for return-like instructions outside of this one
> specific flow. Use an allow-list instead of a deny-list because (a) it's
> a shorter list and (b) so that a missed entry gets a false positive, not a
> false negative (i.e. reject emulation instead of clobbering CET state).
>
> For Shadow Stacks, explicitly track instructions that directly affect the
> current SSP, as KVM's emulator doesn't have existing flags that can be
> used to precisely detect such instructions. Alternatively, the em_xxx()
> helpers could directly check for ShadowStack interactions, but using a
> dedicated flag is arguably easier to audit, and allows for handling both
> IBT and SHSTK in one fell swoop.
>
> Note! On far transfers, do NOT consult the current privilege level and
> instead treat SHSTK/IBT as being enabled if they're enabled for User *or*
> Supervisor mode. On inter-privilege level far transfers, SHSTK and IBT
> can be in play for the target privilege level, i.e. checking the current
> privilege could get a false negative, and KVM doesn't know the target
> privilege level until emulation gets under way.
>
> Note #2, FAR JMP from 64-bit mode to compatibility mode interacts with
> the current SSP, but only to ensure SSP[63:32] == 0. Don't tag FAR JMP
> as SHSTK, which would be rather confusing and would result in FAR JMP
> being rejected unnecessarily the vast majority of the time (ignoring that
> it's unlikely to ever be emulated). A future commit will add the #GP(0)
> check for the specific FAR JMP scenario.
>
> Note #3, task switches also modify SSP and so need to be rejected. That
> too will be addressed in a future commit.
>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Originally-by: Yang Weijiang <weijiang.yang@intel.com>
> Cc: Mathias Krause <minipli@grsecurity.net>
> Cc: John Allen <john.allen@amd.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Two nits besides,
> Link: https://lore.kernel.org/r/20250919223258.1604852-19-seanjc@google.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/emulate.c | 117 ++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 103 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 23929151a5b8..a7683dc18405 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -178,6 +178,7 @@
> #define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
> #define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
> #define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
> +#define ShadowStack ((u64)1 << 57) /* Instruction affects Shadow Stacks. */
>
> #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
>
> @@ -4068,9 +4069,9 @@ static const struct opcode group4[] = {
> static const struct opcode group5[] = {
> F(DstMem | SrcNone | Lock, em_inc),
> F(DstMem | SrcNone | Lock, em_dec),
> - I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
> - I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
> - I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
> + I(SrcMem | NearBranch | IsBranch | ShadowStack, em_call_near_abs),
> + I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack, em_call_far),
> + I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
The change of this line is unexpected, since it only changes the
indentation of 'em_jmp_abs'
> static unsigned imm_size(struct x86_emulate_ctxt *ctxt)
> {
> unsigned size;
> @@ -4943,6 +4998,40 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
>
> ctxt->execute = opcode.u.execute;
>
> + /*
> + * Reject emulation if KVM might need to emulate shadow stack updates
> + * and/or indirect branch tracking enforcement, which the emulator
> + * doesn't support.
> + */
> + if ((is_ibt_instruction(ctxt) || is_shstk_instruction(ctxt)) &&
> + ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> + u64 u_cet = 0, s_cet = 0;
> +
> + /*
> + * Check both User and Supervisor on far transfers as inter-
> + * privilege level transfers are impacted by CET at the target
> + * privilege level, and that is not known at this time. The
> + * the expectation is that the guest will not require emulation
Double 'the'
> + * of any CET-affected instructions at any privilege level.
> + */
> + if (!(ctxt->d & NearBranch))
> + u_cet = s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> + else if (ctxt->ops->cpl(ctxt) == 3)
> + u_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> + else
> + s_cet = CET_SHSTK_EN | CET_ENDBR_EN;
> +
> + if ((u_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_U_CET, &u_cet)) ||
> + (s_cet && ctxt->ops->get_msr(ctxt, MSR_IA32_S_CET, &s_cet)))
> + return EMULATION_FAILED;
> +
> + if ((u_cet | s_cet) & CET_SHSTK_EN && is_shstk_instruction(ctxt))
> + return EMULATION_FAILED;
> +
> + if ((u_cet | s_cet) & CET_ENDBR_EN && is_ibt_instruction(ctxt))
> + return EMULATION_FAILED;
> + }
> +
> if (unlikely(emulation_type & EMULTYPE_TRAP_UD) &&
> likely(!(ctxt->d & EmulateOnUD)))
> return EMULATION_FAILED;
>
> base-commit: 88539a6a25bc7a7ed96952775152e0c3331fdcaf
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
2025-09-22 17:23 ` Sean Christopherson
@ 2025-09-23 14:16 ` Xiaoyao Li
0 siblings, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:16 UTC (permalink / raw)
To: Sean Christopherson, Binbin Wu
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/23/2025 1:23 AM, Sean Christopherson wrote:
> On Mon, Sep 22, 2025, Binbin Wu wrote:
>>
>>
>> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
>>> Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if the guest triggers
>>> task switch emulation with Indirect Branch Tracking or Shadow Stacks
>>> enabled,
>>
>> The code just does it when shadow stack is enabled.
>
> Doh. Fixed that and the EMULATION_FAILED typo Chao pointed out:
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8b31dfcb1de9..06a88a2b08d7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12194,9 +12194,9 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
> */
> if (__kvm_emulate_msr_read(vcpu, MSR_IA32_U_CET, &u_cet) ||
> __kvm_emulate_msr_read(vcpu, MSR_IA32_S_CET, &s_cet))
> - return EMULATION_FAILED;
> + goto unhandled_task_switch;
>
> - if ((u_cet | s_cet) & CET_SHSTK_EN)
> + if ((u_cet | s_cet) & (CET_ENDBR_EN | CET_SHSTK_EN))
> goto unhandled_task_switch;
> }
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode
2025-09-19 22:32 ` [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode Sean Christopherson
2025-09-22 7:15 ` Binbin Wu
@ 2025-09-23 14:29 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:29 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Emulate the Shadow Stack restriction that the current SSP must be a 32-bit
> value on a FAR JMP from 64-bit mode to compatibility mode. From the SDM's
> pseudocode for FAR JMP:
>
> IF ShadowStackEnabled(CPL)
> IF (IA32_EFER.LMA and DEST(segment selector).L) = 0
> (* If target is legacy or compatibility mode then the SSP must be in low 4GB *)
> IF (SSP & 0xFFFFFFFF00000000 != 0); THEN
> #GP(0);
> FI;
> FI;
> FI;
>
> Note, only the current CPL needs to be considered, as FAR JMP can't be
> used for inter-privilege level transfers, and KVM rejects emulation of all
> other far branch instructions when Shadow Stacks are enabled.
>
> To give the emulator access to GUEST_SSP, special case handling
> MSR_KVM_INTERNAL_GUEST_SSP in emulator_get_msr() to treat the access as a
> host access (KVM doesn't allow guest accesses to internal "MSRs"). The
> ->get_msr() API is only used for implicit accesses from the emulator, i.e.
> is only used with hardcoded MSR indices, and so any access to
> MSR_KVM_INTERNAL_GUEST_SSP is guaranteed to be from KVM, i.e. not from the
> guest via RDMSR.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF
2025-09-22 7:46 ` Binbin Wu
@ 2025-09-23 14:33 ` Xiaoyao Li
0 siblings, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:33 UTC (permalink / raw)
To: Binbin Wu, Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/22/2025 3:46 PM, Binbin Wu wrote:
>
>
> On 9/22/2025 3:17 PM, Binbin Wu wrote:
>>
>>
>> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
>>> Add PFERR_SS_MASK, a.k.a. Shadow Stack access, and WARN if KVM
>>> attempts to
>>> check permissions for a Shadow Stack access as KVM hasn't been taught to
>>> understand the magic Writable=0,Dirty=0 combination that is required for
> Typo:
>
> Writable=0,Dirty=0 -> Writable=0,Dirty=1
With it fixed,
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>> Shadow Stack accesses, and likely will never learn. There are no
>>> plans to
>>> support Shadow Stacks with the Shadow MMU, and the emulator rejects all
>>> instructions that affect Shadow Stacks, i.e. it should be impossible for
>>> KVM to observe a #PF due to a shadow stack access.
>>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>
>> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true
2025-09-19 22:32 ` [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true Sean Christopherson
2025-09-22 8:00 ` Binbin Wu
2025-09-22 18:40 ` Sean Christopherson
@ 2025-09-23 14:44 ` Xiaoyao Li
2025-09-23 15:04 ` Sean Christopherson
2 siblings, 1 reply; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:44 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Make IBT and SHSTK virtualization mutually exclusive with "officially"
> supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the
> allow_smaller_maxphyaddr module param is set. Running a guest with a
> smaller MAXPHYADDR requires intercepting #PF, and can also trigger
> emulation of arbitrary instructions. Intercepting and reacting to #PFs
> doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle
> Shadow Stack accesses, and emulating arbitrary instructions doesn't play
> nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side
> effects, e.g. doesn't enforce end-branch markers or model Shadow Stack
> updates.
>
> Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is
> overkill, as allow_smaller_maxphyaddr is only problematic if the guest is
> actually configured to have a smaller MAXPHYADDR. However, KVM's ABI
> doesn't provide a way to express that IBT and SHSTK may break if enabled
> in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR. I.e. the
> alternative is to do nothing in KVM and instead update documentation and
> hope KVM users are thorough readers.
KVM_SET_CPUID* can return error to userspace. So KVM can return -EINVAL
when userspace sets a smaller maxphyaddr with SHSTK/IBT enabled.
> Go with the conservative-but-correct
> approach; worst case scenario, this restriction can be dropped if there's
> a strong use case for enabling CET on hosts with allow_smaller_maxphyaddr.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/cpuid.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 499c86bd457e..b5c4cb13630c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -963,6 +963,16 @@ void kvm_set_cpu_caps(void)
> if (!tdp_enabled)
> kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
>
> + /*
> + * Disable support for IBT and SHSTK if KVM is configured to emulate
> + * accesses to reserved GPAs, as KVM's emulator doesn't support IBT or
> + * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
> + */
> + if (allow_smaller_maxphyaddr) {
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> + kvm_cpu_cap_clear(X86_FEATURE_IBT);
> + }
> +
> kvm_cpu_cap_init(CPUID_7_EDX,
> F(AVX512_4VNNIW),
> F(AVX512_4FMAPS),
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints
2025-09-19 22:32 ` [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints Sean Christopherson
2025-09-22 7:18 ` Binbin Wu
@ 2025-09-23 14:46 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:46 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Add PK (Protection Keys), SS (Shadow Stacks), and SGX (Software Guard
> Extensions) to the set of #PF error flags handled via
> kvm_mmu_trace_pferr_flags. While KVM doesn't expect PK or SS #PFs in
> particular, pretty printing their names instead of the raw hex value saves
> the user from having to go spelunking in the SDM to figure out what's
> going on.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/kvm/mmu/mmutrace.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
> index f35a830ce469..764e3015d021 100644
> --- a/arch/x86/kvm/mmu/mmutrace.h
> +++ b/arch/x86/kvm/mmu/mmutrace.h
> @@ -51,6 +51,9 @@
> { PFERR_PRESENT_MASK, "P" }, \
> { PFERR_WRITE_MASK, "W" }, \
> { PFERR_USER_MASK, "U" }, \
> + { PFERR_PK_MASK, "PK" }, \
> + { PFERR_SS_MASK, "SS" }, \
> + { PFERR_SGX_MASK, "SGX" }, \
> { PFERR_RSVD_MASK, "RSVD" }, \
> { PFERR_FETCH_MASK, "F" }
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported
2025-09-19 22:32 ` [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported Sean Christopherson
2025-09-22 7:25 ` Binbin Wu
@ 2025-09-23 14:46 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:46 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Drop X86_CR4_CET from CR4_RESERVED_BITS and instead mark CET as reserved
> if and only if IBT *and* SHSTK are unsupported, i.e. allow CR4.CET to be
> set if IBT or SHSTK is supported. This creates a virtualization hole if
> the CPU supports both IBT and SHSTK, but the kernel or vCPU model only
> supports one of the features. However, it's entirely legal for a CPU to
> have only one of IBT or SHSTK, i.e. the hole is a flaw in the architecture,
> not in KVM.
>
> More importantly, so long as KVM is careful to initialize and context
> switch both IBT and SHSTK state (when supported in hardware) if either
> feature is exposed to the guest, a misbehaving guest can only harm itself.
> E.g. VMX initializes host CET VMCS fields based solely on hardware
> capabilities.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: split to separate patch, write changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/x86.h | 3 +++
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 554d83ff6135..39231da3a3ff 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -142,7 +142,7 @@
> | X86_CR4_OSXSAVE | X86_CR4_SMEP | X86_CR4_FSGSBASE \
> | X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_VMXE \
> | X86_CR4_SMAP | X86_CR4_PKE | X86_CR4_UMIP \
> - | X86_CR4_LAM_SUP))
> + | X86_CR4_LAM_SUP | X86_CR4_CET))
>
> #define CR8_RESERVED_BITS (~(unsigned long)X86_CR8_TPR)
>
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 65cbd454c4f1..f3dc77f006f9 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -680,6 +680,9 @@ static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> __reserved_bits |= X86_CR4_PCIDE; \
> if (!__cpu_has(__c, X86_FEATURE_LAM)) \
> __reserved_bits |= X86_CR4_LAM_SUP; \
> + if (!__cpu_has(__c, X86_FEATURE_SHSTK) && \
> + !__cpu_has(__c, X86_FEATURE_IBT)) \
> + __reserved_bits |= X86_CR4_CET; \
> __reserved_bits; \
> })
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1
2025-09-19 22:32 ` [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1 Sean Christopherson
2025-09-23 8:15 ` Chao Gao
@ 2025-09-23 14:49 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:49 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Unconditionally forward XSAVES/XRSTORS VM-Exits from L2 to L1, as KVM
> doesn't utilize the XSS-bitmap (KVM relies on controlling the XSS value
> in hardware to prevent unauthorized access to XSAVES state). KVM always
> loads vmcs02 with vmcs12's bitmap, and so any exit _must_ be due to
> vmcs12's XSS-bitmap.
>
> Drop the comment about XSS never being non-zero in anticipation of
> enabling CET_KERNEL and CET_USER support.
>
> Opportunistically WARN if XSAVES is not enabled for L2, as the CPU is
> supposed to generate #UD before checking the XSS-bitmap.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/kvm/vmx/nested.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 2156c9a854f4..846c07380eac 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6570,14 +6570,17 @@ static bool nested_vmx_l1_wants_exit(struct kvm_vcpu *vcpu,
> return nested_cpu_has2(vmcs12, SECONDARY_EXEC_WBINVD_EXITING);
> case EXIT_REASON_XSETBV:
> return true;
> - case EXIT_REASON_XSAVES: case EXIT_REASON_XRSTORS:
> + case EXIT_REASON_XSAVES:
> + case EXIT_REASON_XRSTORS:
> /*
> - * This should never happen, since it is not possible to
> - * set XSS to a non-zero value---neither in L1 nor in L2.
> - * If if it were, XSS would have to be checked against
> - * the XSS exit bitmap in vmcs12.
> + * Always forward XSAVES/XRSTORS to L1 as KVM doesn't utilize
> + * XSS-bitmap, and always loads vmcs02 with vmcs12's XSS-bitmap
> + * verbatim, i.e. any exit is due to L1's bitmap. WARN if
> + * XSAVES isn't enabled, as the CPU is supposed to inject #UD
> + * in that case, before consulting the XSS-bitmap.
> */
> - return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES);
> + WARN_ON_ONCE(!nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_XSAVES));
> + return true;
> case EXIT_REASON_UMWAIT:
> case EXIT_REASON_TPAUSE:
> return nested_cpu_has2(vmcs12,
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER
2025-09-19 22:32 ` [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER Sean Christopherson
2025-09-22 7:31 ` Binbin Wu
@ 2025-09-23 14:55 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:55 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add CET_KERNEL and CET_USER to KVM's set of supported XSS bits when IBT
> *or* SHSTK is supported. Like CR4.CET, XFEATURE support for IBT and SHSTK
> are bundle together under the CET umbrella, and thus prone to
> virtualization holes if KVM or the guest supports only one of IBT or SHSTK,
> but hardware supports both. However, again like CR4.CET, such
> virtualization holes are benign from the host's perspective so long as KVM
> takes care to always honor the "or" logic.
>
> Require CET_KERNEL and CET_USER to come as a pair, and refuse to support
> IBT or SHSTK if one (or both) features is missing, as the (host) kernel
> expects them to come as a pair, i.e. may get confused and corrupt state if
> only one of CET_KERNEL or CET_USER is supported.
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: split to separate patch, write changelog, add XFEATURE_MASK_CET_ALL]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/kvm/x86.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 40596fc5142e..4a0ff0403bb2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -220,13 +220,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
> | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> | XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
>
> +#define XFEATURE_MASK_CET_ALL (XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
> /*
> * Note, KVM supports exposing PT to the guest, but does not support context
> * switching PT via XSTATE (KVM's PT virtualization relies on perf; swapping
> * PT via guest XSTATE would clobber perf state), i.e. KVM doesn't support
> * IA32_XSS[bit 8] (guests can/must use RDMSR/WRMSR to save/restore PT MSRs).
> */
> -#define KVM_SUPPORTED_XSS 0
> +#define KVM_SUPPORTED_XSS (XFEATURE_MASK_CET_ALL)
>
> bool __read_mostly allow_smaller_maxphyaddr = 0;
> EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
> @@ -10104,6 +10105,16 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> kvm_caps.supported_xss = 0;
>
> + if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> + !kvm_cpu_cap_has(X86_FEATURE_IBT))
> + kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
> +
> + if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> + kvm_cpu_cap_clear(X86_FEATURE_IBT);
> + kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
> + }
> +
> if (kvm_caps.has_tsc_control) {
> /*
> * Make sure the user can only configure tsc_khz values that
> @@ -12775,10 +12786,11 @@ static void kvm_xstate_reset(struct kvm_vcpu *vcpu, bool init_event)
> /*
> * On INIT, only select XSTATE components are zeroed, most components
> * are unchanged. Currently, the only components that are zeroed and
> - * supported by KVM are MPX related.
> + * supported by KVM are MPX and CET related.
> */
> xfeatures_mask = (kvm_caps.supported_xcr0 | kvm_caps.supported_xss) &
> - (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
> + (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR |
> + XFEATURE_MASK_CET_ALL);
> if (!xfeatures_mask)
> return;
>
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled
2025-09-19 22:32 ` [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled Sean Christopherson
2025-09-22 7:45 ` Binbin Wu
@ 2025-09-23 14:56 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:56 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> Make TDP a hard requirement for Shadow Stacks, as there are no plans to
> add Shadow Stack support to the Shadow MMU. E.g. KVM hasn't been taught
> to understand the magic Writable=0,Dirty=0 combination that is required
> for Shadow Stack accesses, and so enabling Shadow Stacks when using
> shadow paging will put the guest into an infinite #PF loop (KVM thinks the
> shadow page tables have a valid mapping, hardware says otherwise).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
> ---
> arch/x86/kvm/cpuid.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 32fde9e80c28..499c86bd457e 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -955,6 +955,14 @@ void kvm_set_cpu_caps(void)
> if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
> kvm_cpu_cap_clear(X86_FEATURE_PKU);
>
> + /*
> + * Shadow Stacks aren't implemented in the Shadow MMU. Shadow Stack
> + * accesses require "magic" Writable=0,Dirty=1 protection, which KVM
> + * doesn't know how to emulate or map.
> + */
> + if (!tdp_enabled)
> + kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
> +
> kvm_cpu_cap_init(CPUID_7_EDX,
> F(AVX512_4VNNIW),
> F(AVX512_4FMAPS),
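For reference, the "magic" protection combination the comment describes reduces to a tiny predicate. The PTE bit positions are architectural; the helper name is invented for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bits. */
#define PTE_WRITABLE (1ull << 1)
#define PTE_DIRTY    (1ull << 6)

/*
 * A shadow-stack page is identified by Writable=0,Dirty=1.  A conventional
 * write-permission check would (wrongly) reject shadow-stack accesses to
 * such a page, which is exactly the combination KVM's Shadow MMU hasn't
 * been taught to create or map.
 */
static bool pte_is_shadow_stack_page(uint64_t pte)
{
	return !(pte & PTE_WRITABLE) && (pte & PTE_DIRTY);
}
```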
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
2025-09-19 22:32 ` [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
2025-09-22 8:06 ` Binbin Wu
@ 2025-09-23 14:57 ` Xiaoyao Li
1 sibling, 0 replies; 114+ messages in thread
From: Xiaoyao Li @ 2025-09-23 14:57 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky, Zhang Yi Z,
Xin Li
On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> From: Yang Weijiang <weijiang.yang@intel.com>
>
> Add support for the LOAD_CET_STATE VM-Enter and VM-Exit controls, the
> CET XFEATURE bits in XSS, and advertise support for IBT and SHSTK to
> userspace. Explicitly clear IBT and SHSTK on SVM, as additional work is
> needed to enable CET on SVM, e.g. to context switch S_CET and other state.
>
> Disable KVM's CET support if unrestricted_guest is unsupported/disabled,
> as KVM does not support emulating CET and running without Unrestricted
> Guest can result in KVM emulating large swaths of guest code. While it's
> highly unlikely any guest will trigger emulation while also utilizing IBT
> or SHSTK, there's zero reason to allow CET without Unrestricted Guest, as
> that combination should only be possible when explicitly disabling
> unrestricted_guest for testing purposes.
>
> Disable CET if VMX_BASIC[bit56] == 0, i.e. if hardware strictly enforces
> the presence of an Error Code based on exception vector, as attempting to
> inject a #CP with an Error Code (#CP architecturally has an Error Code)
> will fail due to the #CP vector historically not having an Error Code.
>
> Clear the S_CET and SSP-related VMCS fields on "reset" to emulate the
> architectural behavior of CET MSRs and SSP being reset to 0 after RESET,
> power-up and INIT. Note,
> KVM already clears guest CET state that is managed via XSTATE in
> kvm_xstate_reset().
>
> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: Mathias Krause <minipli@grsecurity.net>
> Tested-by: John Allen <john.allen@amd.com>
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> [sean: move some bits to separate patches, massage changelog]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
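The VMX_BASIC[bit56] constraint from the changelog can be illustrated with a small predicate. The legacy error-code vector list is architectural; the macro names below are illustrative, not kernel-exact:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VMX_BASIC_NO_HW_ERROR_CODE_CC (1ull << 56)
#define CP_VECTOR 21	/* #CP architecturally delivers an error code. */

/* Vectors that historically deliver an error code: #DF(8), #TS(10),
 * #NP(11), #SS(12), #GP(13), #PF(14), #AC(17). */
static bool legacy_vector_has_error_code(int vector)
{
	return vector == 8 || (vector >= 10 && vector <= 14) || vector == 17;
}

/*
 * Event injection with an error code succeeds only if the vector is in the
 * legacy list, or if hardware relaxes the consistency check (VMX_BASIC bit
 * 56); hence CET, and thus #CP injection, is disabled when bit 56 is clear.
 */
static bool can_inject_with_error_code(int vector, uint64_t vmx_basic)
{
	return legacy_vector_has_error_code(vector) ||
	       (vmx_basic & VMX_BASIC_NO_HW_ERROR_CODE_CC);
}
```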
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true
2025-09-23 14:44 ` Xiaoyao Li
@ 2025-09-23 15:04 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-23 15:04 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Binbin Wu, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Tue, Sep 23, 2025, Xiaoyao Li wrote:
> On 9/20/2025 6:32 AM, Sean Christopherson wrote:
> > Make IBT and SHSTK virtualization mutually exclusive with "officially"
> > supporting setups with guest.MAXPHYADDR < host.MAXPHYADDR, i.e. if the
> > allow_smaller_maxphyaddr module param is set. Running a guest with a
> > smaller MAXPHYADDR requires intercepting #PF, and can also trigger
> > emulation of arbitrary instructions. Intercepting and reacting to #PFs
> > doesn't play nice with SHSTK, as KVM's MMU hasn't been taught to handle
> > Shadow Stack accesses, and emulating arbitrary instructions doesn't play
> > nice with IBT or SHSTK, as KVM's emulator doesn't handle the various side
> > effects, e.g. doesn't enforce end-branch markers or model Shadow Stack
> > updates.
> >
> > Note, hiding IBT and SHSTK based solely on allow_smaller_maxphyaddr is
> > overkill, as allow_smaller_maxphyaddr is only problematic if the guest is
> > actually configured to have a smaller MAXPHYADDR. However, KVM's ABI
> > doesn't provide a way to express that IBT and SHSTK may break if enabled
> > in conjunction with guest.MAXPHYADDR < host.MAXPHYADDR. I.e. the
> > alternative is to do nothing in KVM and instead update documentation and
> > hope KVM users are thorough readers.
>
> KVM_SET_CPUID* can return error to userspace. So KVM can return -EINVAL when
> userspace sets a smaller maxphyaddr with SHSTK/IBT enabled.
Generally speaking, I don't want to police userspace's vCPU model. For
allow_smaller_maxphyaddr in particular, I want to actively discourage its use.
The entire concept is inherently flawed, e.g. only works for a relatively narrow
use case.
And IIRC, Sierra Forest and future Atom-based server CPUs will be straight up
incompatible with allow_smaller_maxphyaddr due to them setting accessed/dirty
bits before generating the EPT Violation, which is what killed allow_smaller_maxphyaddr
with NPT.
I.e. allow_smaller_maxphyaddr is doomed, and I want to help it die. If someone
really, really wants to enable CET on hosts with allow_smaller_maxphyaddr=true,
then they can send patches and we can sort out how to communicate the various
incompatibilities to userspace.
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features
2025-09-23 14:12 ` Xiaoyao Li
@ 2025-09-23 16:15 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-23 16:15 UTC (permalink / raw)
To: Xiaoyao Li
Cc: Chao Gao, Paolo Bonzini, kvm, linux-kernel, Tom Lendacky,
Mathias Krause, John Allen, Rick Edgecombe, Binbin Wu,
Maxim Levitsky, Zhang Yi Z, Xin Li
On Tue, Sep 23, 2025, Xiaoyao Li wrote:
> On 9/23/2025 4:04 AM, Sean Christopherson wrote:
> Two nits besides,
> > Link:https://lore.kernel.org/r/20250919223258.1604852-19-seanjc@google.com
> > Signed-off-by: Sean Christopherson<seanjc@google.com>
> > ---
> > arch/x86/kvm/emulate.c | 117 ++++++++++++++++++++++++++++++++++++-----
> > 1 file changed, 103 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > index 23929151a5b8..a7683dc18405 100644
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -178,6 +178,7 @@
> > #define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
> > #define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
> > #define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
> > +#define ShadowStack ((u64)1 << 57) /* Instruction affects Shadow Stacks. */
> > #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
> > @@ -4068,9 +4069,9 @@ static const struct opcode group4[] = {
> > static const struct opcode group5[] = {
> > F(DstMem | SrcNone | Lock, em_inc),
> > F(DstMem | SrcNone | Lock, em_dec),
> > - I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
> > - I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
> > - I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
> > + I(SrcMem | NearBranch | IsBranch | ShadowStack, em_call_near_abs),
> > + I(SrcMemFAddr | ImplicitOps | IsBranch | ShadowStack, em_call_far),
> > + I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
>
> The change of this line is unexpected, since it only changes the indentation
> of 'em_jmp_abs'
> > static unsigned imm_size(struct x86_emulate_ctxt *ctxt)
> > {
> > unsigned size;
> > @@ -4943,6 +4998,40 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
> > ctxt->execute = opcode.u.execute;
> > + /*
> > + * Reject emulation if KVM might need to emulate shadow stack updates
> > + * and/or indirect branch tracking enforcement, which the emulator
> > + * doesn't support.
> > + */
> > + if ((is_ibt_instruction(ctxt) || is_shstk_instruction(ctxt)) &&
> > + ctxt->ops->get_cr(ctxt, 4) & X86_CR4_CET) {
> > + u64 u_cet = 0, s_cet = 0;
> > +
> > + /*
> > + * Check both User and Supervisor on far transfers as inter-
> > + * privilege level transfers are impacted by CET at the target
> > + * privilege level, and that is not known at this time. The
> > + * the expectation is that the guest will not require emulation
>
> Double 'the'
Squashed fixes for both, thanks!
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities
2025-09-23 2:37 ` Chao Gao
@ 2025-09-23 16:24 ` Sean Christopherson
2025-09-23 16:49 ` Xin Li
0 siblings, 1 reply; 114+ messages in thread
From: Sean Christopherson @ 2025-09-23 16:24 UTC (permalink / raw)
To: Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Tue, Sep 23, 2025, Chao Gao wrote:
> On Fri, Sep 19, 2025 at 03:32:36PM -0700, Sean Christopherson wrote:
> >Swap the order between configuring nested VMX capabilities and base CPU
> >capabilities, so that nested VMX support can be conditioned on core KVM
> >support, e.g. to allow conditioning support for LOAD_CET_STATE on the
> >presence of IBT or SHSTK. Because the sanity checks on nested VMX config
> >performed by vmx_check_processor_compat() run _after_ vmx_hardware_setup(),
> >any use of kvm_cpu_cap_has() when configuring nested VMX support will lead
> >to failures in vmx_check_processor_compat().
> >
> >While swapping the order of two (or more) configuration flows can lead to
> >a game of whack-a-mole, in this case nested support inarguably should be
> >done after base support. KVM should never condition base support on nested
> >support, because nested support is fully optional, while obviously it's
> >desirable to condition nested support on base support. And there's zero
> >evidence the current ordering was intentional, e.g. commit 66a6950f9995
> >("KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking")
> >likely placed the call to kvm_set_cpu_caps() after nested setup because it
> >looked pretty.
> >
> >Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
>
> I had a feeling I'd seen this patch before :). After some searching in lore, I
> tracked it down:
> https://lore.kernel.org/kvm/20241001050110.3643764-22-xin@zytor.com/
Gah, sorry Xin :-/
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
2025-09-23 2:43 ` Chao Gao
@ 2025-09-23 16:28 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-23 16:28 UTC (permalink / raw)
To: Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Tue, Sep 23, 2025, Chao Gao wrote:
> > /* We support free control of debug control saving. */
> > msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;
> > }
> >@@ -7200,11 +7204,16 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
> > #ifdef CONFIG_X86_64
> > VM_ENTRY_IA32E_MODE |
> > #endif
> >- VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
> >+ VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
> >+ VM_ENTRY_LOAD_CET_STATE;
> > msrs->entry_ctls_high |=
> > (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
> > VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
> >
> >+ if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
> >+ !kvm_cpu_cap_has(X86_FEATURE_IBT))
> >+ msrs->exit_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE;
>
> one copy-paste error here. s/exit_ctls_high/entry_ctls_high/
Thank you. I distinctly remember _trying_ to be extra careful. *sigh*
Fixup squashed.
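The squashed fix amounts to masking the *entry* controls field rather than the exit controls. A standalone sketch, where the bit position of VM_ENTRY_LOAD_CET_STATE is an assumption and the struct is trimmed to the two relevant fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VM_ENTRY_LOAD_CET_STATE (1u << 20)	/* assumed bit position */

struct nested_vmx_msrs {
	uint32_t entry_ctls_high;
	uint32_t exit_ctls_high;
};

/* Clear the VM-Entry control (not the VM-Exit control, as the original
 * diff accidentally did) when neither SHSTK nor IBT is supported. */
static void fixup_entry_ctls(struct nested_vmx_msrs *msrs,
			     bool has_shstk, bool has_ibt)
{
	if (!has_shstk && !has_ibt)
		msrs->entry_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE;
}
```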
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities
2025-09-23 16:24 ` Sean Christopherson
@ 2025-09-23 16:49 ` Xin Li
0 siblings, 0 replies; 114+ messages in thread
From: Xin Li @ 2025-09-23 16:49 UTC (permalink / raw)
To: Sean Christopherson, Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky
On 9/23/2025 9:24 AM, Sean Christopherson wrote:
>> I had a feeling I'd seen this patch before 🙂. After some searching in lore, I
>> tracked it down:
>> https://lore.kernel.org/kvm/20241001050110.3643764-22-xin@zytor.com/
> Gah, sorry Xin :-/
Oh, Chao really has a good memory.
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write
2025-09-23 8:03 ` Chao Gao
@ 2025-09-23 16:51 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-23 16:51 UTC (permalink / raw)
To: Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Tue, Sep 23, 2025, Chao Gao wrote:
> On Fri, Sep 19, 2025 at 03:32:52PM -0700, Sean Christopherson wrote:
> >+ /*
> >+ * TSC_AUX is supported if RDTSCP *or* RDPID is supported. Add
> >+ * entries for each features so that TSC_AUX doesn't exists for
> >+ * the "unsupported" vCPU, and obviously to test both cases.
> >+ */
> >+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDTSCP, RDPID),
> >+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDPID, RDTSCP),
>
> At first glance, it's unclear to me why canonical_val is invalid for
> MSR_TSC_AUX, especially since it is valid for a few other MSRs in this
> test. Should we add a note to the above comment? e.g.,
>
> canonical_val is invalid for MSR_TSC_AUX because its high 32 bits must be 0.
Yeah, I was being lazy. To-be-tested, but I'll squash this:
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index 9285cf51ef75..345a39030a0a 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -48,6 +48,13 @@ struct kvm_msr {
*/
static const u64 canonical_val = 0x123456789000ull;
+/*
+ * Arbitrary value with bits set in every byte, but not all bits set. This is
+ * also a non-canonical value, but that's coincidental (any 64-bit value with
+ * an alternating 0s/1s pattern will be non-canonical).
+ */
+static const u64 u64_val = 0xaaaa5555aaaa5555ull;
+
#define MSR_TEST_CANONICAL(msr, feat) \
__MSR_TEST(msr, #msr, canonical_val, NONCANONICAL, 0, feat)
@@ -247,8 +254,8 @@ static void test_msrs(void)
* entries for each features so that TSC_AUX doesn't exists for
* the "unsupported" vCPU, and obviously to test both cases.
*/
- MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDTSCP, RDPID),
- MSR_TEST2(MSR_TSC_AUX, 0x12345678, canonical_val, RDPID, RDTSCP),
+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, u64_val, RDTSCP, RDPID),
+ MSR_TEST2(MSR_TSC_AUX, 0x12345678, u64_val, RDPID, RDTSCP),
MSR_TEST(MSR_IA32_SYSENTER_CS, 0x1234, 0, NONE),
/*
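The canonicality property mentioned in the new comment can be checked mechanically. A sketch for 48-bit (4-level paging) virtual addresses, with the helper name invented here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A 48-bit virtual address is canonical iff bits 63:47 all equal bit 47,
 * i.e. iff the value survives sign-extension from bit 47.
 */
static bool is_canonical_48(uint64_t va)
{
	return (uint64_t)(((int64_t)(va << 16)) >> 16) == va;
}
```

This is why any 64-bit value with an alternating 0s/1s pattern is non-canonical: bits 63:48 cannot all match bit 47.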
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v16 49/51] KVM: selftests: Add coverage for KVM-defined registers in MSRs test
2025-09-23 6:31 ` Chao Gao
@ 2025-09-23 16:59 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-23 16:59 UTC (permalink / raw)
To: Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Tue, Sep 23, 2025, Chao Gao wrote:
> On Fri, Sep 19, 2025 at 03:32:56PM -0700, Sean Christopherson wrote:
> >Add test coverage for the KVM-defined GUEST_SSP "register" in the MSRs
> >test. While _KVM's_ goal is to not tie the uAPI of KVM-defined registers
> >to any particular internal implementation, i.e. to not commit in uAPI to
> >handling GUEST_SSP as an MSR, treating GUEST_SSP as an MSR for testing
>purposes is a-ok and a natural fit given the semantics of SSP.
> >
> >Signed-off-by: Sean Christopherson <seanjc@google.com>
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
>
> <snip>
>
> >+static bool vcpu_has_reg(struct kvm_vcpu *vcpu, u64 reg)
> >+{
> >+ struct {
> >+ struct kvm_reg_list list;
> >+ u64 regs[KVM_X86_MAX_NR_REGS];
> >+ } regs = {};
> >+ int r, i;
> >+
> >+ /*
> >+ * If KVM_GET_REG_LIST succeeds with n=0, i.e. there are no supported
> >+ * regs, then the vCPU obviously doesn't support the reg.
> >+ */
> >+ r = __vcpu_ioctl(vcpu, KVM_GET_REG_LIST, &regs.list.n);
> ^^^^^^^^^^^^
> it would be more clear to use &regs.list here.
Fixed both. No idea why I wrote it that way.
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
2025-09-23 6:46 ` Chao Gao
@ 2025-09-23 17:02 ` Sean Christopherson
0 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-23 17:02 UTC (permalink / raw)
To: Chao Gao
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Tue, Sep 23, 2025, Chao Gao wrote:
> On Fri, Sep 19, 2025 at 03:32:57PM -0700, Sean Christopherson wrote:
> >Add a check in the MSRs test to verify that KVM's reported support for
> >MSRs with feature bits is consistent between KVM's MSR save/restore lists
> >and KVM's supported CPUID.
> >
>
> >To deal with Intel's wonderful decision to bundle IBT and SHSTK under CET,
> >track the "second" feature to avoid false failures when running on a CPU
> >with only one of IBT or SHSTK.
>
> is this paragraph related to this patch? the tracking is done in a previous
> patch instead of this patch. So maybe just drop this paragraph.
>
> >
> >Signed-off-by: Sean Christopherson <seanjc@google.com>
> >---
> > tools/testing/selftests/kvm/x86/msrs_test.c | 22 ++++++++++++++++++++-
> > 1 file changed, 21 insertions(+), 1 deletion(-)
> >
> >diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
> >index 7c6d846e42dd..91dc66bfdac2 100644
> >--- a/tools/testing/selftests/kvm/x86/msrs_test.c
> >+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
> >@@ -437,12 +437,32 @@ static void test_msrs(void)
> > }
> >
> > for (idx = 0; idx < ARRAY_SIZE(__msrs); idx++) {
> >- if (msrs[idx].is_kvm_defined) {
> >+ struct kvm_msr *msr = &msrs[idx];
> >+
> >+ if (msr->is_kvm_defined) {
> > for (i = 0; i < NR_VCPUS; i++)
> > host_test_kvm_reg(vcpus[i]);
> > continue;
> > }
> >
> >+ /*
> >+ * Verify KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST
> >+ * are consistent with respect to MSRs whose existence is
> >+ * enumerated via CPUID. Note, using LM as a dummy feature
> >+ * is a-ok here as well, as all MSRs that abuse LM should be
> >+ * unconditionally reported in the save/restore list (and
>
> I am not sure why LM is mentioned here. Is it a leftover from one of your
> previous attempts?
Yeah, at one point I was using LM as the NONE feature. I'll delete the entire
sentence.
>
> >+ * selftests are 64-bit only). Note #2, skip the check for
> >+ * FS/GS.base MSRs, as they aren't reported in the save/restore
> >+ * list since their state is managed via SREGS.
> >+ */
> >+ TEST_ASSERT(msr->index == MSR_FS_BASE || msr->index == MSR_GS_BASE ||
> >+ kvm_msr_is_in_save_restore_list(msr->index) ==
> >+ (kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)),
> >+ "%s %s save/restore list, but %s according to CPUID", msr->name,
>
> ^ an "in" is missing here.
Heh, I had added this in a local version when debugging, but forgot to push the
fix. Added now.
diff --git a/tools/testing/selftests/kvm/x86/msrs_test.c b/tools/testing/selftests/kvm/x86/msrs_test.c
index c2ab75e5d9ea..40d918aedce6 100644
--- a/tools/testing/selftests/kvm/x86/msrs_test.c
+++ b/tools/testing/selftests/kvm/x86/msrs_test.c
@@ -455,17 +455,14 @@ static void test_msrs(void)
/*
* Verify KVM_GET_SUPPORTED_CPUID and KVM_GET_MSR_INDEX_LIST
* are consistent with respect to MSRs whose existence is
- * enumerated via CPUID. Note, using LM as a dummy feature
- * is a-ok here as well, as all MSRs that abuse LM should be
- * unconditionally reported in the save/restore list (and
- * selftests are 64-bit only). Note #2, skip the check for
- * FS/GS.base MSRs, as they aren't reported in the save/restore
- * list since their state is managed via SREGS.
+ * enumerated via CPUID. Skip the check for FS/GS.base MSRs,
+ * as they aren't reported in the save/restore list since their
+ * state is managed via SREGS.
*/
TEST_ASSERT(msr->index == MSR_FS_BASE || msr->index == MSR_GS_BASE ||
kvm_msr_is_in_save_restore_list(msr->index) ==
(kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)),
- "%s %s save/restore list, but %s according to CPUID", msr->name,
+ "%s %s in save/restore list, but %s according to CPUID", msr->name,
kvm_msr_is_in_save_restore_list(msr->index) ? "is" : "isn't",
(kvm_cpu_has(msr->feature) || kvm_cpu_has(msr->feature2)) ?
"supported" : "unsupported");
^ permalink raw reply related [flat|nested] 114+ messages in thread
* Re: [PATCH v16 00/51] KVM: x86: Super Mega CET
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (50 preceding siblings ...)
2025-09-19 22:32 ` [PATCH v16 51/51] KVM: VMX: Make CR4.CET a guest owned bit Sean Christopherson
@ 2025-09-24 14:32 ` Chao Gao
2025-09-24 18:07 ` Sean Christopherson
52 siblings, 0 replies; 114+ messages in thread
From: Chao Gao @ 2025-09-24 14:32 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:07PM -0700, Sean Christopherson wrote:
>As the subject suggests, this series continues to grow, as there's an absolutely
>stupid number of edge cases and interactions.
>
>There are (a lot) more changes between v15 and v16 than I was hoping for, but
>they're all fairly "minor" in the sense that it's things like disabling SHSTK
>when using the shadow MMU. I.e. it's mostly "configuration" fixes, and very
>few logical changes (outside of msrs_test.c, which has non-trivial changes due
>to ignore_msrs, argh).
>
>So, my plan is to still land this in 6.18. I'm going to push it to -next
>today to get as much testing as possible, but in a dedicated branch so that I
>can fixup as needed (e.g. even if it's just for reviews). I'll freeze the
>hashes sometime next week.
The CET branch in your kvm-x86 repo passed all my tests on an Emerald Rapids
system, including:
1. kselftest
2. KVM-Unit-Test for both i386 and x86_64
3. glibc unit tests in the guest
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 00/51] KVM: x86: Super Mega CET
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
` (51 preceding siblings ...)
2025-09-24 14:32 ` [PATCH v16 00/51] KVM: x86: Super Mega CET Chao Gao
@ 2025-09-24 18:07 ` Sean Christopherson
52 siblings, 0 replies; 114+ messages in thread
From: Sean Christopherson @ 2025-09-24 18:07 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Tom Lendacky, Mathias Krause, John Allen,
Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li, Maxim Levitsky,
Zhang Yi Z, Xin Li
On Fri, 19 Sep 2025 15:32:07 -0700, Sean Christopherson wrote:
> As the subject suggests, this series continues to grow, as there's an absolutely
> stupid number of edge cases and interactions.
>
> There are (a lot) more changes between v15 and v16 than I was hoping for, but
> they're all fairly "minor" in the sense that it's things like disabling SHSTK
> when using the shadow MMU. I.e. it's mostly "configuration" fixes, and very
> few logical changes (outside of msrs_test.c, which has non-trivial changes due
> to ignore_msrs, argh).
>
> [...]
Unless someone finds a truly egregious bug, the hashes are now frozen. Please
post any fixups as standalone patches based on kvm-x86/next, and I'll apply on
top as appropriate.
Thanks everyone!
Applied 1-3 to kvm-x86 svm:
[01/51] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code()
https://github.com/kvm-x86/linux/commit/e0ff302b79c5
[02/51] KVM: SEV: Read save fields from GHCB exactly once
https://github.com/kvm-x86/linux/commit/bd5f500d2317
[03/51] KVM: SEV: Validate XCR0 provided by guest in GHCB
https://github.com/kvm-x86/linux/commit/4135a9a8ccba
The ex_str() selftest patch to kvm-x86 selftests:
[44/51] KVM: selftests: Add ex_str() to print human friendly name of exception vectors
https://github.com/kvm-x86/linux/commit/df1f294013da
And the rest to kvm-x86 cet (including patch "26.5"):
[04/51] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
https://github.com/kvm-x86/linux/commit/06f2969c6a12
[05/51] KVM: x86: Report XSS as to-be-saved if there are supported features
https://github.com/kvm-x86/linux/commit/c0a5f2989122
[06/51] KVM: x86: Check XSS validity against guest CPUIDs
https://github.com/kvm-x86/linux/commit/338543cbe033
[07/51] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
https://github.com/kvm-x86/linux/commit/9622e116d0d2
[08/51] KVM: x86: Initialize kvm_caps.supported_xss
https://github.com/kvm-x86/linux/commit/779ed05511f2
[09/51] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
https://github.com/kvm-x86/linux/commit/e44eb58334bb
[10/51] KVM: x86: Add fault checks for guest CR4.CET setting
https://github.com/kvm-x86/linux/commit/586ef9dcbb28
[11/51] KVM: x86: Report KVM supported CET MSRs as to-be-saved
https://github.com/kvm-x86/linux/commit/6a11c860d8a4
[12/51] KVM: VMX: Introduce CET VMCS fields and control bits
https://github.com/kvm-x86/linux/commit/d6c387fc396b
[13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs
https://github.com/kvm-x86/linux/commit/9d6812d41535
[14/51] KVM: VMX: Emulate read and write to CET MSRs
https://github.com/kvm-x86/linux/commit/8b59d0275c96
[15/51] KVM: x86: Save and reload SSP to/from SMRAM
https://github.com/kvm-x86/linux/commit/1a61bd0d126a
[16/51] KVM: VMX: Set up interception for CET MSRs
https://github.com/kvm-x86/linux/commit/25f3840483e6
[17/51] KVM: VMX: Set host constant supervisor states to VMCS fields
https://github.com/kvm-x86/linux/commit/584ba3ffb984
[18/51] KVM: x86: Don't emulate instructions affected by CET features
https://github.com/kvm-x86/linux/commit/57c3db7e2e26
[19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled
https://github.com/kvm-x86/linux/commit/82c0ec028258
[20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode
https://github.com/kvm-x86/linux/commit/d4c03f63957c
[21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF
https://github.com/kvm-x86/linux/commit/296599346c67
[22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints
https://github.com/kvm-x86/linux/commit/843af0f2e461
[23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported
https://github.com/kvm-x86/linux/commit/b3744c59ebc5
[24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1
https://github.com/kvm-x86/linux/commit/19e6e083f3f9
[25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER
https://github.com/kvm-x86/linux/commit/69cc3e886582
[26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled
https://github.com/kvm-x86/linux/commit/1f6f68fcfe43
[26.5/51] KVM: x86: Initialize allow_smaller_maxphyaddr earlier in setup
https://github.com/kvm-x86/linux/commit/f705de12a22c
[27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true
https://github.com/kvm-x86/linux/commit/343acdd158a5
[28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace
https://github.com/kvm-x86/linux/commit/e140467bbdaf
[29/51] KVM: VMX: Configure nested capabilities after CPU capabilities
https://github.com/kvm-x86/linux/commit/f7336d47be53
[30/51] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
https://github.com/kvm-x86/linux/commit/033cc166f029
[31/51] KVM: nVMX: Prepare for enabling CET support for nested guest
https://github.com/kvm-x86/linux/commit/625884996bff
[32/51] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
https://github.com/kvm-x86/linux/commit/8060b2bd2dd0
[33/51] KVM: nVMX: Add consistency checks for CET states
https://github.com/kvm-x86/linux/commit/62f7533a6b3a
[34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
https://github.com/kvm-x86/linux/commit/42ae6448531b
[35/51] KVM: SVM: Emulate reads and writes to shadow stack MSRs
https://github.com/kvm-x86/linux/commit/48b2ec0d540c
[36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
https://github.com/kvm-x86/linux/commit/c5ba49458513
[37/51] KVM: SVM: Update dump_vmcb with shadow stack save area additions
https://github.com/kvm-x86/linux/commit/c7586aa3bed4
[38/51] KVM: SVM: Pass through shadow stack MSRs as appropriate
https://github.com/kvm-x86/linux/commit/38c46bdbf998
[39/51] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid
https://github.com/kvm-x86/linux/commit/b5fa221f7b08
[40/51] KVM: SVM: Enable shadow stack virtualization for SVM
https://github.com/kvm-x86/linux/commit/8db428fd5229
[41/51] KVM: x86: Add human friendly formatting for #XM, and #VE
https://github.com/kvm-x86/linux/commit/d37cc4819a48
[42/51] KVM: x86: Define Control Protection Exception (#CP) vector
https://github.com/kvm-x86/linux/commit/f2f5519aa4e3
[43/51] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors
https://github.com/kvm-x86/linux/commit/fddd07626baa
[45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write
https://github.com/kvm-x86/linux/commit/9c38ddb3df94
[46/51] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test
https://github.com/kvm-x86/linux/commit/27c41353064f
[47/51] KVM: selftests: Extend MSRs test to validate vCPUs without supported features
https://github.com/kvm-x86/linux/commit/a8b9cca99cf4
[48/51] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test
https://github.com/kvm-x86/linux/commit/80c2b6d8e7bb
[49/51] KVM: selftests: Add coverage for KVM-defined registers in MSRs test
https://github.com/kvm-x86/linux/commit/3469fd203bac
[50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported
https://github.com/kvm-x86/linux/commit/947ab90c9198
[51/51] KVM: VMX: Make CR4.CET a guest owned bit
https://github.com/kvm-x86/linux/commit/d292035fb5d2
--
https://github.com/kvm-x86/linux/tree/next
^ permalink raw reply [flat|nested] 114+ messages in thread
* Re: [PATCH v16 36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
2025-09-19 22:32 ` [PATCH v16 36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 Sean Christopherson
@ 2025-10-28 22:23 ` Yosry Ahmed
2025-12-09 0:48 ` Yosry Ahmed
0 siblings, 1 reply; 114+ messages in thread
From: Yosry Ahmed @ 2025-10-28 22:23 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li,
Maxim Levitsky, Zhang Yi Z, Xin Li
On Fri, Sep 19, 2025 at 03:32:43PM -0700, Sean Christopherson wrote:
> Transfer the three CET Shadow Stack VMCB fields (S_CET, ISST_ADDR, and
> SSP) on VMRUN, #VMEXIT, and loading nested state (saving nested state
> simply copies the entire save area). SVM doesn't provide a way to
> disallow L1 from enabling Shadow Stacks for L2, i.e. KVM *must* provide
> nested support before advertising SHSTK to userspace.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/svm/nested.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 826473f2d7c7..a6443feab252 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -636,6 +636,14 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
> vmcb_mark_dirty(vmcb02, VMCB_DT);
> }
>
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> + (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
> + vmcb02->save.s_cet = vmcb12->save.s_cet;
> + vmcb02->save.isst_addr = vmcb12->save.isst_addr;
> + vmcb02->save.ssp = vmcb12->save.ssp;
> + vmcb_mark_dirty(vmcb02, VMCB_CET);
> + }
> +
According to the APM, there are some consistency checks that should be
done on CET-related fields in the VMCB12. Specifically, from
"Canonicalization and Consistency Checks" in section 15.5.1 of the APM
Volume 2 (24593—Rev. 3.42—March 2024):
• Any reserved bit is set in S_CET
• CR4.CET=1 when CR0.WP=0
• CR4.CET=1 and U_CET.SS=1 when EFLAGS.VM=1
• Any reserved bit set in U_CET (SEV-ES only):
- VMRUN results in VMEXIT(INVALID)
- VMEXIT forces reserved bits to 0
Most consistency checks are done in __nested_vmcb_check_save(), but it
only operates on the cached save area, which does not have everything
you need. You'll probably need to add the needed fields to the cached
save area, or move the consistency checks elsewhere.
Related to this, I am working on patches to copy everything we use from
vmcb12->save to the cached save area, to minimize direct accesses to
vmcb12 in guest memory. So I already intend to add other fields to the
cached save area.
There are also a couple of other missing consistency checks that I will
send patches for, which also need fields that are currently not in the
cached save area.
> kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
>
> svm_set_efer(vcpu, svm->nested.save.efer);
> @@ -1044,6 +1052,12 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
> to_save->rsp = from_save->rsp;
> to_save->rip = from_save->rip;
> to_save->cpl = 0;
> +
> + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
> + to_save->s_cet = from_save->s_cet;
> + to_save->isst_addr = from_save->isst_addr;
> + to_save->ssp = from_save->ssp;
> + }
> }
>
> void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
> @@ -1111,6 +1125,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> vmcb12->save.dr6 = svm->vcpu.arch.dr6;
> vmcb12->save.cpl = vmcb02->save.cpl;
>
> + if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
> + vmcb12->save.s_cet = vmcb02->save.s_cet;
> + vmcb12->save.isst_addr = vmcb02->save.isst_addr;
> + vmcb12->save.ssp = vmcb02->save.ssp;
> + }
> +
> vmcb12->control.int_state = vmcb02->control.int_state;
> vmcb12->control.exit_code = vmcb02->control.exit_code;
> vmcb12->control.exit_code_hi = vmcb02->control.exit_code_hi;
> --
> 2.51.0.470.ga7dc726c21-goog
>
* Re: [PATCH v16 36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02
2025-10-28 22:23 ` Yosry Ahmed
@ 2025-12-09 0:48 ` Yosry Ahmed
0 siblings, 0 replies; 114+ messages in thread
From: Yosry Ahmed @ 2025-12-09 0:48 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Tom Lendacky, Mathias Krause,
John Allen, Rick Edgecombe, Chao Gao, Binbin Wu, Xiaoyao Li,
Maxim Levitsky, Zhang Yi Z, Xin Li
On Tue, Oct 28, 2025 at 10:23:02PM +0000, Yosry Ahmed wrote:
> On Fri, Sep 19, 2025 at 03:32:43PM -0700, Sean Christopherson wrote:
> > Transfer the three CET Shadow Stack VMCB fields (S_CET, ISST_ADDR, and
> > SSP) on VMRUN, #VMEXIT, and loading nested state (saving nested state
> > simply copies the entire save area). SVM doesn't provide a way to
> > disallow L1 from enabling Shadow Stacks for L2, i.e. KVM *must* provide
> > nested support before advertising SHSTK to userspace.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/svm/nested.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> >
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 826473f2d7c7..a6443feab252 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -636,6 +636,14 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
> > vmcb_mark_dirty(vmcb02, VMCB_DT);
> > }
> >
> > + if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK) &&
> > + (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_CET)))) {
> > + vmcb02->save.s_cet = vmcb12->save.s_cet;
> > + vmcb02->save.isst_addr = vmcb12->save.isst_addr;
> > + vmcb02->save.ssp = vmcb12->save.ssp;
> > + vmcb_mark_dirty(vmcb02, VMCB_CET);
> > + }
> > +
>
> According to the APM, there are some consistency checks that should be
> done on CET-related fields in the VMCB12. Specifically, from
> "Canonicalization and Consistency Checks" in section 15.5.1 of the APM
> Volume 2 (24593—Rev. 3.42—March 2024):
>
> • Any reserved bit is set in S_CET
> • CR4.CET=1 when CR0.WP=0
> • CR4.CET=1 and U_CET.SS=1 when EFLAGS.VM=1
> • Any reserved bit set in U_CET (SEV-ES only):
> - VMRUN results in VMEXIT(INVALID)
> - VMEXIT forces reserved bits to 0
>
> Most consistency checks are done in __nested_vmcb_check_save(), but it
> only operates on the cached save area, which does not have everything
> you need. You'll probably need to add the needed fields to the cached
> save area, or move the consistency checks elsewhere.
>
> Related to this, I am working on patches to copy everything we use from
> vmcb12->save to the cached save area, to minimize direct accesses to
> vmcb12 in guest memory. So I already intend to add other fields to the
> cached save area.
>
> There are also a couple of other missing consistency checks that I will
> send patches for, which also need fields that are currently not in the
> cached save area.
I don't really care that much, but I think this fell through the cracks.
Regarding the cached save area, the series I was talking about is
already out [*], and I am preparing to send a newer version. It puts the
fields used here in the cache, so it should be straightforward to add
the consistency checks on top of it.
[*] https://lore.kernel.org/kvm/20251110222922.613224-1-yosry.ahmed@linux.dev/
>
> > kvm_set_rflags(vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);
> >
> > svm_set_efer(vcpu, svm->nested.save.efer);
> > @@ -1044,6 +1052,12 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
> > to_save->rsp = from_save->rsp;
> > to_save->rip = from_save->rip;
> > to_save->cpl = 0;
> > +
> > + if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
> > + to_save->s_cet = from_save->s_cet;
> > + to_save->isst_addr = from_save->isst_addr;
> > + to_save->ssp = from_save->ssp;
> > + }
> > }
> >
> > void svm_copy_vmloadsave_state(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
> > @@ -1111,6 +1125,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> > vmcb12->save.dr6 = svm->vcpu.arch.dr6;
> > vmcb12->save.cpl = vmcb02->save.cpl;
> >
> > + if (guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK)) {
> > + vmcb12->save.s_cet = vmcb02->save.s_cet;
> > + vmcb12->save.isst_addr = vmcb02->save.isst_addr;
> > + vmcb12->save.ssp = vmcb02->save.ssp;
> > + }
> > +
> > vmcb12->control.int_state = vmcb02->control.int_state;
> > vmcb12->control.exit_code = vmcb02->control.exit_code;
> > vmcb12->control.exit_code_hi = vmcb02->control.exit_code_hi;
> > --
> > 2.51.0.470.ga7dc726c21-goog
> >
end of thread, other threads:[~2025-12-09 0:48 UTC | newest]
Thread overview: 114+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-19 22:32 [PATCH v16 00/51] KVM: x86: Super Mega CET Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 01/51] KVM: SEV: Rename kvm_ghcb_get_sw_exit_code() to kvm_get_cached_sw_exit_code() Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 02/51] KVM: SEV: Read save fields from GHCB exactly once Sean Christopherson
2025-09-22 21:39 ` Tom Lendacky
2025-09-19 22:32 ` [PATCH v16 03/51] KVM: SEV: Validate XCR0 provided by guest in GHCB Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 04/51] KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 05/51] KVM: x86: Report XSS as to-be-saved if there are supported features Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 06/51] KVM: x86: Check XSS validity against guest CPUIDs Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 07/51] KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 08/51] KVM: x86: Initialize kvm_caps.supported_xss Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 09/51] KVM: x86: Load guest FPU state when access XSAVE-managed MSRs Sean Christopherson
2025-09-22 2:10 ` Binbin Wu
2025-09-22 16:41 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 10/51] KVM: x86: Add fault checks for guest CR4.CET setting Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 11/51] KVM: x86: Report KVM supported CET MSRs as to-be-saved Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 12/51] KVM: VMX: Introduce CET VMCS fields and control bits Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 13/51] KVM: x86: Enable guest SSP read/write interface with new uAPIs Sean Christopherson
2025-09-22 2:58 ` Binbin Wu
2025-09-23 9:06 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 14/51] KVM: VMX: Emulate read and write to CET MSRs Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 15/51] KVM: x86: Save and reload SSP to/from SMRAM Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 16/51] KVM: VMX: Set up interception for CET MSRs Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 17/51] KVM: VMX: Set host constant supervisor states to VMCS fields Sean Christopherson
2025-09-22 3:03 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 18/51] KVM: x86: Don't emulate instructions affected by CET features Sean Christopherson
2025-09-22 5:39 ` Binbin Wu
2025-09-22 16:47 ` Sean Christopherson
2025-09-22 10:27 ` Chao Gao
2025-09-22 20:04 ` Sean Christopherson
2025-09-23 14:12 ` Xiaoyao Li
2025-09-23 16:15 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 19/51] KVM: x86: Don't emulate task switches when IBT or SHSTK is enabled Sean Christopherson
2025-09-22 6:41 ` Binbin Wu
2025-09-22 17:23 ` Sean Christopherson
2025-09-23 14:16 ` Xiaoyao Li
2025-09-22 11:27 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 20/51] KVM: x86: Emulate SSP[63:32]!=0 #GP(0) for FAR JMP to 32-bit mode Sean Christopherson
2025-09-22 7:15 ` Binbin Wu
2025-09-23 14:29 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 21/51] KVM: x86/mmu: WARN on attempt to check permissions for Shadow Stack #PF Sean Christopherson
2025-09-22 7:17 ` Binbin Wu
2025-09-22 7:46 ` Binbin Wu
2025-09-23 14:33 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 22/51] KVM: x86/mmu: Pretty print PK, SS, and SGX flags in MMU tracepoints Sean Christopherson
2025-09-22 7:18 ` Binbin Wu
2025-09-22 16:18 ` Sean Christopherson
2025-09-23 14:46 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 23/51] KVM: x86: Allow setting CR4.CET if IBT or SHSTK is supported Sean Christopherson
2025-09-22 7:25 ` Binbin Wu
2025-09-23 14:46 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 24/51] KVM: nVMX: Always forward XSAVES/XRSTORS exits from L2 to L1 Sean Christopherson
2025-09-23 8:15 ` Chao Gao
2025-09-23 14:49 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 25/51] KVM: x86: Add XSS support for CET_KERNEL and CET_USER Sean Christopherson
2025-09-22 7:31 ` Binbin Wu
2025-09-23 14:55 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 26/51] KVM: x86: Disable support for Shadow Stacks if TDP is disabled Sean Christopherson
2025-09-22 7:45 ` Binbin Wu
2025-09-23 14:56 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 27/51] KVM: x86: Disable support for IBT and SHSTK if allow_smaller_maxphyaddr is true Sean Christopherson
2025-09-22 8:00 ` Binbin Wu
2025-09-22 18:40 ` Sean Christopherson
2025-09-23 14:44 ` Xiaoyao Li
2025-09-23 15:04 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 28/51] KVM: x86: Enable CET virtualization for VMX and advertise to userspace Sean Christopherson
2025-09-22 8:06 ` Binbin Wu
2025-09-23 14:57 ` Xiaoyao Li
2025-09-19 22:32 ` [PATCH v16 29/51] KVM: VMX: Configure nested capabilities after CPU capabilities Sean Christopherson
2025-09-23 2:37 ` Chao Gao
2025-09-23 16:24 ` Sean Christopherson
2025-09-23 16:49 ` Xin Li
2025-09-19 22:32 ` [PATCH v16 30/51] KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2 Sean Christopherson
2025-09-22 8:37 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 31/51] KVM: nVMX: Prepare for enabling CET support for nested guest Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 32/51] KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET Sean Christopherson
2025-09-22 8:47 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 33/51] KVM: nVMX: Add consistency checks for CET states Sean Christopherson
2025-09-22 9:23 ` Binbin Wu
2025-09-22 16:35 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 34/51] KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state Sean Christopherson
2025-09-23 2:43 ` Chao Gao
2025-09-23 16:28 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 35/51] KVM: SVM: Emulate reads and writes to shadow stack MSRs Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 36/51] KVM: nSVM: Save/load CET Shadow Stack state to/from vmcb12/vmcb02 Sean Christopherson
2025-10-28 22:23 ` Yosry Ahmed
2025-12-09 0:48 ` Yosry Ahmed
2025-09-19 22:32 ` [PATCH v16 37/51] KVM: SVM: Update dump_vmcb with shadow stack save area additions Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 38/51] KVM: SVM: Pass through shadow stack MSRs as appropriate Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 39/51] KVM: SEV: Synchronize MSR_IA32_XSS from the GHCB when it's valid Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 40/51] KVM: SVM: Enable shadow stack virtualization for SVM Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 41/51] KVM: x86: Add human friendly formatting for #XM, and #VE Sean Christopherson
2025-09-22 8:29 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 42/51] KVM: x86: Define Control Protection Exception (#CP) vector Sean Christopherson
2025-09-22 8:29 ` Binbin Wu
2025-09-19 22:32 ` [PATCH v16 43/51] KVM: x86: Define AMD's #HV, #VC, and #SX exception vectors Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 44/51] KVM: selftests: Add ex_str() to print human friendly name of " Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 45/51] KVM: selftests: Add an MSR test to exercise guest/host and read/write Sean Christopherson
2025-09-23 8:03 ` Chao Gao
2025-09-23 16:51 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 46/51] KVM: selftests: Add support for MSR_IA32_{S,U}_CET to MSRs test Sean Christopherson
2025-09-23 7:12 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 47/51] KVM: selftests: Extend MSRs test to validate vCPUs without supported features Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 48/51] KVM: selftests: Add KVM_{G,S}ET_ONE_REG coverage to MSRs test Sean Christopherson
2025-09-23 6:52 ` Chao Gao
2025-09-19 22:32 ` [PATCH v16 49/51] KVM: selftests: Add coverage for KVM-defined registers in " Sean Christopherson
2025-09-23 6:31 ` Chao Gao
2025-09-23 16:59 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 50/51] KVM: selftests: Verify MSRs are (not) in save/restore list when (un)supported Sean Christopherson
2025-09-23 6:46 ` Chao Gao
2025-09-23 17:02 ` Sean Christopherson
2025-09-19 22:32 ` [PATCH v16 51/51] KVM: VMX: Make CR4.CET a guest owned bit Sean Christopherson
2025-09-22 8:34 ` Binbin Wu
2025-09-24 14:32 ` [PATCH v16 00/51] KVM: x86: Super Mega CET Chao Gao
2025-09-24 18:07 ` Sean Christopherson