public inbox for kvm@vger.kernel.org
* [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification
@ 2026-04-17  7:35 Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 01/27] KVM: x86: Fix emulated CPUID features being applied to wrong sub-leaf Binbin Wu
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Hi,

This RFC series is intended to gather public feedback from TDX
developers before we have too many internal conversations on it, and to
initiate code review of Sashiko. It is not yet intended for review by
KVM maintainers. Sean and Paolo, please feel free to ignore this version.

Originally, we hit issues on TDX whenever a new hardware feature that
clobbers host state became supported by new TDX modules/platforms.
A host state clobbering feature requires KVM to save and restore the
feature's related MSR(s) on host/guest transitions; otherwise, if the
feature is used by TDs, the host state will be corrupted, leading to
unexpected behavior on the host.

Currently, KVM hardcodes a deny list of unsupported host state
clobbering features for TDX, i.e. HLE, RTM and WAITPKG. However, KVM
can't maintain a list of bits that it may not even know about (e.g. the
upcoming FRED support in TDX).

We had been working internally on a TDX-specific solution to the host
state clobbering feature issue. But during a PUCK meeting, Sean
mentioned that KVM has a more permissive CPUID configuration interface
than desired, and that it has caused problems for normal VMs in the
past as well. Sean suggested that KVM introduce a more paranoid mode
for checking CPUID from userspace for VMs in general, along with an
opt-in interface for userspace, and that TDX use this infrastructure to
enforce paranoid mode unconditionally.

This RFC patch series adds a paranoid CPUID verification mode for KVM
on x86, in which KVM must be explicitly aware of every CPUID feature
exposed to the guest. When paranoid mode is opted into by userspace, or
enforced by KVM, KVM rejects any unknown or unsupported feature from
userspace. The series then enforces paranoid CPUID verification for TDX.

This patch series touches a lot of lines and involves many subtle CPUID
details. We don't expect reviews of the CPUID-leaf-specific details
yet, but feedback is welcome on the framework used to build the CPUID
overlays and on how paranoid CPUID verification is implemented.

The changes are only tested on Intel platforms. Compile-tested only for
SVM.

The series is organized into the following parts:
=================================================
- Patch 1 ~ 2:  Cleanup patches.

- Patch 3 ~ 11: Construct CPUID overlays
  This part extends kvm_cpu_caps[] into a 2D array indexed by an "overlay"
  dimension (CPUID_OL_DEFAULT, CPUID_OL_SVM, CPUID_OL_TDX), allowing
  each overlay to maintain its own set of supported CPUID features.
  Having separate overlays for VMX and TDX helps handle cases where
  KVM's support for certain features differs on Intel-compatible
  platforms, e.g., HLE, RTM and WAITPKG are not supported for TDX in
  KVM; there will be more host state clobbering features like this in
  the future.
  Having separate overlays for VMX and SVM helps handle cases where a
  common feature is supported by one vendor but not the other. Setting
  the support in common code would require additional handling in
  vendor-specific code, e.g., SVM code needs to clear IBT,
  BUS_LOCK_DETECT and MSR_IMM.
  More overlays could be added in the future if needed.

  KVM_GET_SUPPORTED_CPUID and KVM_GET_EMULATED_CPUID are also promoted
  to VM-scoped IOCTLs so that userspace can query per-VM-type CPUID
  capabilities. CPUID overlays are a KVM internal concept; the overlay is
  decided by VM type and/or platform vendor.

- Patch 12 ~ 19: Build allowed CPUID values for different overlays
  This part builds a comprehensive table of allowed CPUID values covering
  the basic, extended, Centaur, and KVM paravirt CPUID ranges.
  For each CPUID output register, the validation follows one of three
  rules:
  1. Ignored: the register is added to the ignored set and KVM skips
     validation of the userspace-provided value.
  2. Mask/value check: a new KVM-only CPUID leaf enum is defined with a
     corresponding reverse_cpuid[] entry, and an allowed mask or fixed
     value is initialized per-overlay.
  3. Zero check: for reserved registers or registers where no bits are
     supported, userspace input is checked against zero.

- Patch 20 ~ 25: Implement paranoid CPUID verification
  This part adds CPUID paranoid verification to reject userspace CPUID
  configurations that set unsupported or unknown bits when paranoid mode
  is enabled for a VM. 
  It also adds the opt-in interface KVM_CAP_X86_CPUID_PARANOID for
  userspace and unconditionally enforces CPUID paranoid mode for TDs.

- Patch 26 ~ 27: Remove the hardcoded filter for TDX.
  This part removes the hardcoded deny list of unsupported host state
  clobbering features for TDX, relying instead on the allowed mask of
  the TDX overlay to filter and check generically.

Opens:
======
- CPUID overlays vs. open-coded checks for specific features in
  vendor-specific callbacks.
  Open-coded checks for specific features in vendor callbacks would
  require fewer code changes; however, they tightly couple normal VM
  feature enablement with TDX. If a new host-state-clobbering feature
  is added for normal VMs, the developer has to remember to update the
  TDX filter list(s). Likewise, when a common x86 feature is added for
  only VMX or SVM, the developer has to remember to clear the bit for
  the other vendor. Relying solely on mailing list reviews to catch
  these omissions is likely more error-prone than the overlay approach.

- This patch series uses a 2D array in common KVM code to accommodate
  KVM CPUID capabilities for different overlays. This avoids adding
  init ops and runtime ops to call into vendor modules, for a few
  reasons:
  1. kvm_ops_update() is called after ops->hardware_setup(), inside
     which the KVM CPU capabilities are built, so runtime x86 ops
     cannot be called there; some workaround would be needed to allow
     it.
  2. The inputs used to build the KVM CPU capabilities for overlays
     come from common KVM code, or via common KVM code helpers, which
     would make the callbacks in vendor modules mere duplicates of
     similar tedious code.
  But conceptually, putting vendor-specific overlay data in the related
  vendor module is cleaner.

- This patch series combines vCPU capability initialization and
  paranoid CPUID verification. It refactors vCPU capability
  initialization to iterate over userspace CPUID entries rather than
  reverse_cpuid[], combining the paranoid check with capability setup.
  The purpose is to avoid iterating over the CPUID entries twice, once
  for vCPU capability initialization and once for the paranoid check.
  However, this changes the vCPU capability initialization code
  slightly even when paranoid mode is disabled. The two could be
  separated if we want to minimize changes for the non-paranoid mode.

- This patch series checks a CPUID register even if only part of its
  32-bit range is reserved. I am not sure this is necessary in all
  cases. It could be simplified if we believe such reserved bits won't
  cause problems, given the properties of the CPUID register, so that
  these registers could be treated as ignored.

Binbin Wu (27):
  KVM: x86: Fix emulated CPUID features being applied to wrong sub-leaf
  KVM: x86: Reorder the features for CPUID 7
  KVM: x86: Add definitions for CPUID overlays
  KVM: x86: Extend F() and its variants for CPUID overlays
  KVM: x86: Extend kvm_cpu_cap_{set/clear}() to configure overlays
  KVM: x86: Populate TDX CPUID overlay with supported feature bits
  KVM: x86: Support KVM_GET_{SUPPORTED,EMULATED}_CPUID as VM scope
    ioctls
  KVM: x86: Thread @kvm to KVM CPU capability helpers
  KVM: x86: Use overlays of KVM CPU capabilities
  KVM: x86: Use vendor-specific overlay flags instead of F_CPUID_DEFAULT
  KVM: SVM: Drop unnecessary clears of unsupported common x86 features
  KVM: x86: Split KVM CPU cap leafs into two parts
  KVM: x86: Add a helper to initialize CPUID multi-bit fields
  KVM: x86: Add a helper to init multiple feature bits based on raw
    CPUID
  KVM: x86: Add infrastructure to track CPUID entries ignored in
    paranoid mode
  KVM: x86: Init allowed masks for basic CPUID range in paranoid mode
  KVM: x86: Init allowed masks for extended CPUID range in paranoid mode
  KVM: x86: Handle Centaur CPUID leafs in paranoid mode
  KVM: x86: Track KVM PV CPUID features for paranoid mode
  KVM: x86: Add per-VM flag to track CPUID paranoid mode
  KVM: x86: Make kvm_vcpu_after_set_cpuid() return an error code
  KVM: x86: Verify userspace CPUID inputs in paranoid mode
  KVM: x86: Account for runtime CPUID features in paranoid mode
  KVM: x86: Skip paranoid CPUID check for KVM PV leafs when base is
    relocated
  KVM: x86: Add new KVM_CAP_X86_CPUID_PARANOID
  KVM: x86: Add a helper to query the allowed CPUID mask
  KVM: TDX: Replace hardcoded CPUID filtering with the allowed mask

 Documentation/virt/kvm/api.rst  |   18 +
 arch/x86/include/asm/kvm_host.h |   75 +-
 arch/x86/kvm/cpuid.c            | 1224 +++++++++++++++++++++----------
 arch/x86/kvm/cpuid.h            |  118 ++-
 arch/x86/kvm/reverse_cpuid.h    |   82 +++
 arch/x86/kvm/svm/nested.c       |    4 +-
 arch/x86/kvm/svm/sev.c          |    6 +-
 arch/x86/kvm/svm/svm.c          |   49 +-
 arch/x86/kvm/vmx/hyperv.c       |    2 +-
 arch/x86/kvm/vmx/nested.c       |    8 +-
 arch/x86/kvm/vmx/tdx.c          |   60 +-
 arch/x86/kvm/vmx/vmx.c          |   77 +-
 arch/x86/kvm/x86.c              |   97 ++-
 arch/x86/kvm/x86.h              |    2 +-
 include/uapi/linux/kvm.h        |    1 +
 15 files changed, 1298 insertions(+), 525 deletions(-)


base-commit: 6b802031877a995456c528095c41d1948546bf45
-- 
2.46.0


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 01/27] KVM: x86: Fix emulated CPUID features being applied to wrong sub-leaf
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 02/27] KVM: x86: Reorder the features for CPUID 7 Binbin Wu
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Guard the use of cpuid_func_emulated() with a check that the CPUID
sub-leaf index is 0, as cpuid_func_emulated() unconditionally returns
emulated features for index 0 and does not account for indexed leaves.

Without the guard, when iterating over reverse_cpuid[] entries that
share the same CPUID function but have a non-zero index, e.g.
CPUID_7_1_ECX (function=7, index=1), the emulated features for index 0
are incorrectly OR'd into the wrong capability word.  For example,
RDPID (CPUID.7.0:ECX[22]) gets erroneously applied to CPUID_7_1_ECX,
which would allow userspace to set bit 22 of CPUID.7.1:ECX in the vCPU's
capabilities.

This is currently benign as the affected bits in the non-zero index
words happen to not correspond to meaningful features, but it could
cause problems as new features are defined in those positions.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e69156b54cff..25f582a8d795 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -399,15 +399,16 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		if (!entry)
 			continue;
 
-		cpuid_func_emulated(&emulated, cpuid.function, true);
-
 		/*
 		 * A vCPU has a feature if it's supported by KVM and is enabled
 		 * in guest CPUID.  Note, this includes features that are
 		 * supported by KVM but aren't advertised to userspace!
 		 */
-		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i] |
-					 cpuid_get_reg_unsafe(&emulated, cpuid.reg);
+		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i];
+		if (!cpuid.index) {
+			cpuid_func_emulated(&emulated, cpuid.function, true);
+			vcpu->arch.cpu_caps[i] |= cpuid_get_reg_unsafe(&emulated, cpuid.reg);
+		}
 		vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);
 	}
 
-- 
2.46.0



* [RFC PATCH 02/27] KVM: x86: Reorder the features for CPUID 7
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 01/27] KVM: x86: Fix emulated CPUID features being applied to wrong sub-leaf Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 03/27] KVM: x86: Add definitions for CPUID overlays Binbin Wu
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Reorder the feature bits in CPUID_7_ECX, CPUID_7_EDX and CPUID_7_1_EAX
to align with the hardware defined order.

Opportunistically add comments for unsupported bits for CPUID_7_ECX and
CPUID_7_EDX.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 48 +++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 25f582a8d795..056f86121728 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -947,26 +947,34 @@ void kvm_initialize_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init(CPUID_7_ECX,
+		/* PREFETCHWT1 */
 		F(AVX512VBMI),
-		PASSTHROUGH_F(LA57),
+		F(UMIP),
 		F(PKU),
 		RUNTIME_F(OSPKE),
-		F(RDPID),
-		F(AVX512_VPOPCNTDQ),
-		F(UMIP),
+		VENDOR_F(WAITPKG),
 		F(AVX512_VBMI2),
+		X86_64_F(SHSTK),
 		F(GFNI),
 		F(VAES),
 		F(VPCLMULQDQ),
 		F(AVX512_VNNI),
 		F(AVX512_BITALG),
+		/* TME */
+		F(AVX512_VPOPCNTDQ),
+		/* Reserved */
+		PASSTHROUGH_F(LA57),
+		/* MPX_MAWAU */
+		F(RDPID),
+		/* KEY_LOCKER */
+		F(BUS_LOCK_DETECT),
 		F(CLDEMOTE),
+		/* Reserved */
 		F(MOVDIRI),
 		F(MOVDIR64B),
-		VENDOR_F(WAITPKG),
+		/* ENQCMD */
 		F(SGX_LC),
-		F(BUS_LOCK_DETECT),
-		X86_64_F(SHSTK),
+		/* PKS */
 	);
 
 	/*
@@ -985,23 +993,31 @@ void kvm_initialize_cpu_caps(void)
 		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
 
 	kvm_cpu_cap_init(CPUID_7_EDX,
+		/* Reserved, SGX_KEYS */
 		F(AVX512_4VNNIW),
 		F(AVX512_4FMAPS),
-		F(SPEC_CTRL),
-		F(SPEC_CTRL_SSBD),
-		EMULATED_F(ARCH_CAPABILITIES),
-		F(INTEL_STIBP),
-		F(MD_CLEAR),
-		F(AVX512_VP2INTERSECT),
 		F(FSRM),
+		/* UINT, Reserved, Reserved */
+		F(AVX512_VP2INTERSECT),
+		/* SRBDS_CTRL */
+		F(MD_CLEAR),
+		/* RTM_ALWAYS_ABORT, Reserved, TSX_FORCE_ABORT */
 		F(SERIALIZE),
+		/* HYBRID_CPU */
 		F(TSXLDTRK),
+		/* Reserved, PCONFIG, ARCH_LBR */
+		F(IBT),
+		/* Reserved */
+		F(AMX_BF16),
 		F(AVX512_FP16),
 		F(AMX_TILE),
 		F(AMX_INT8),
-		F(AMX_BF16),
+		F(SPEC_CTRL),
+		F(INTEL_STIBP),
 		F(FLUSH_L1D),
-		F(IBT),
+		EMULATED_F(ARCH_CAPABILITIES),
+		/* CORE_CAPABILITIES */
+		F(SPEC_CTRL_SSBD),
 	);
 
 	/*
@@ -1033,8 +1049,8 @@ void kvm_initialize_cpu_caps(void)
 		F(FZRM),
 		F(FSRS),
 		F(FSRC),
-		F(WRMSRNS),
 		X86_64_F(LKGS),
+		F(WRMSRNS),
 		F(AMX_FP16),
 		F(AVX_IFMA),
 		F(LAM),
-- 
2.46.0



* [RFC PATCH 03/27] KVM: x86: Add definitions for CPUID overlays
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 01/27] KVM: x86: Fix emulated CPUID features being applied to wrong sub-leaf Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 02/27] KVM: x86: Reorder the features for CPUID 7 Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 04/27] KVM: x86: Extend F() and its variants " Binbin Wu
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Introduce definitions for CPUID overlays to handle varying CPUID
requirements across different VM types on Intel and AMD platforms.

Three CPUID overlays are defined and will be used as follows:
- CPUID_OL_VMX for KVM_X86_{DEFAULT_VM, SW_PROTECTED_VM} on
  Intel-compatible platforms.
- CPUID_OL_TDX for KVM_X86_TDX_VM.
- CPUID_OL_SVM for all VM types supported on AMD-compatible platforms.

Having separate overlays for VMX and TDX helps handle cases where
KVM's support for some features differs, e.g., HLE, RTM and WAITPKG
are not supported for TDX in KVM. There may be new features like this
in the future.

Having separate overlays for VMX and SVM helps handle the case where
a common feature is supported by one vendor but not the other.
Setting the support in common code requires additional handling in
vendor-specific code, e.g., SVM code needs to clear IBT,
BUS_LOCK_DETECT and MSR_IMM.

More overlays can be added in the future, e.g., AMD may want to have
a separate CPUID overlay for SNP.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 039b8e6f40ba..f41f8d3db794 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -7,6 +7,29 @@
 #include <asm/processor.h>
 #include <uapi/asm/kvm_para.h>
 
+enum kvm_cpuid_overlay {
+	CPUID_OL_VMX = 0,
+	CPUID_OL_SVM,
+	CPUID_OL_TDX,
+	NR_CPUID_OL
+};
+
+#define F_CPUID_VMX		BIT(CPUID_OL_VMX)
+#define F_CPUID_SVM		BIT(CPUID_OL_SVM)
+#define F_CPUID_TDX		BIT(CPUID_OL_TDX)
+
+static inline u8 get_cpuid_overlay(struct kvm *kvm)
+{
+	if (kvm && kvm->arch.vm_type == KVM_X86_TDX_VM)
+		return CPUID_OL_TDX;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+		return CPUID_OL_SVM;
+
+	return CPUID_OL_VMX;
+}
+
 extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 extern bool kvm_is_configuring_cpu_caps __read_mostly;
 
-- 
2.46.0



* [RFC PATCH 04/27] KVM: x86: Extend F() and its variants for CPUID overlays
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (2 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 03/27] KVM: x86: Add definitions for CPUID overlays Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 05/27] KVM: x86: Extend kvm_cpu_cap_{set/clear}() to configure overlays Binbin Wu
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Change kvm_cpu_caps[] to a 2D array and extend F() and its variants
in preparation for accommodating capabilities for different CPUID
overlays.

Use F_CPUID_DEFAULT to set the capabilities for both the VMX and SVM
overlays, and define the temporary CPUID_OL_DEFAULT to be the
effective overlay; it is actually the VMX overlay, which is equal to
the original version of kvm_cpu_caps[].

It's a bit weird to use CPUID_OL_DEFAULT, which is the VMX overlay, in
ALIASED_1_EDX_F() for the SVM-only features, but there is no
functional change since CPUID_OL_DEFAULT is used for all VM types.

VENDOR_F() and RUNTIME_F() don't actually set any capability; keep
them as they are.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 498 ++++++++++++++++++++++---------------------
 arch/x86/kvm/cpuid.h |  16 +-
 2 files changed, 264 insertions(+), 250 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 056f86121728..d3f3e9f0d493 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -33,7 +33,7 @@
  * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
  * aligned to sizeof(unsigned long) because it's not accessed via bitops.
  */
-u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
+u32 kvm_cpu_caps[NR_CPUID_OL][NR_KVM_CPU_CAPS] __read_mostly;
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_caps);
 
 bool kvm_is_configuring_cpu_caps __read_mostly;
@@ -404,7 +404,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		 * in guest CPUID.  Note, this includes features that are
 		 * supported by KVM but aren't advertised to userspace!
 		 */
-		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i];
+		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[CPUID_OL_DEFAULT][i];
 		if (!cpuid.index) {
 			cpuid_func_emulated(&emulated, cpuid.function, true);
 			vcpu->arch.cpu_caps[i] |= cpuid_get_reg_unsafe(&emulated, cpuid.reg);
@@ -713,15 +713,18 @@ do {									\
 									\
 	feature_initializers						\
 									\
-	kvm_cpu_caps[leaf] = kvm_cpu_cap_features;			\
+	/* Handle the trailing comma compiler errors */			\
+	(void)kvm_cpu_cap_features;					\
 									\
 	if (leaf < NCAPINTS)						\
-		kvm_cpu_caps[leaf] &= kernel_cpu_caps[leaf];		\
+		kvm_cpu_cap_features &= kernel_cpu_caps[leaf];		\
 									\
-	kvm_cpu_caps[leaf] |= kvm_cpu_cap_passthrough;			\
-	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
-			       kvm_cpu_cap_synthesized);		\
-	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
+	kvm_cpu_cap_features |= kvm_cpu_cap_passthrough;		\
+	kvm_cpu_cap_features &= (raw_cpuid_get(cpuid) |			\
+				 kvm_cpu_cap_synthesized);		\
+	kvm_cpu_cap_features |= kvm_cpu_cap_emulated;			\
+	for (int i = 0; i < NR_CPUID_OL; i++)				\
+		kvm_cpu_caps[i][leaf] &= kvm_cpu_cap_features;		\
 } while (0)
 
 /*
@@ -736,37 +739,43 @@ do {									\
 	BUILD_BUG_ON(__leaf != kvm_cpu_cap_init_in_progress);		\
 } while (0)
 
-#define F(name)							\
-({								\
-	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
-	kvm_cpu_cap_features |= feature_bit(name);		\
+#define F(name, overlay_mask)						\
+({									\
+	u32 __leaf = __feature_leaf(X86_FEATURE_##name);		\
+									\
+	KVM_VALIDATE_CPU_CAP_USAGE(name);				\
+	kvm_cpu_cap_features |= feature_bit(name);			\
+	for (int i = 0; i < NR_CPUID_OL; i++) {				\
+		if ((overlay_mask) & BIT(i))				\
+			kvm_cpu_caps[i][__leaf] |= feature_bit(name);	\
+	}								\
 })
 
 /* Scattered Flag - For features that are scattered by cpufeatures.h. */
-#define SCATTERED_F(name)					\
+#define SCATTERED_F(name, overlay_mask)				\
 ({								\
 	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);	\
 	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
 	if (boot_cpu_has(X86_FEATURE_##name))			\
-		F(name);					\
+		F(name, overlay_mask);				\
 })
 
 /* Features that KVM supports only on 64-bit kernels. */
-#define X86_64_F(name)						\
+#define X86_64_F(name, overlay_mask)				\
 ({								\
 	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
 	if (IS_ENABLED(CONFIG_X86_64))				\
-		F(name);					\
+		F(name, overlay_mask);				\
 })
 
 /*
  * Emulated Feature - For features that KVM emulates in software irrespective
  * of host CPU/kernel support.
  */
-#define EMULATED_F(name)					\
+#define EMULATED_F(name, overlay_mask)				\
 ({								\
 	kvm_cpu_cap_emulated |= feature_bit(name);		\
-	F(name);						\
+	F(name, overlay_mask);					\
 })
 
 /*
@@ -774,13 +783,13 @@ do {									\
  * i.e. may not be present in the raw CPUID, but can still be advertised to
  * userspace.  Primarily used for mitigation related feature flags.
  */
-#define SYNTHESIZED_F(name)					\
+#define SYNTHESIZED_F(name, overlay_mask)			\
 ({								\
 	kvm_cpu_cap_synthesized |= feature_bit(name);		\
 								\
 	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);	\
 	if (boot_cpu_has(X86_FEATURE_##name))			\
-		F(name);					\
+		F(name, overlay_mask);				\
 })
 
 /*
@@ -789,21 +798,22 @@ do {									\
  * use the feature.  Simply force set the feature in KVM's capabilities, raw
  * CPUID support will be factored in by kvm_cpu_cap_mask().
  */
-#define PASSTHROUGH_F(name)					\
+#define PASSTHROUGH_F(name, overlay_mask)			\
 ({								\
 	kvm_cpu_cap_passthrough |= feature_bit(name);		\
-	F(name);						\
+	F(name, overlay_mask);					\
 })
 
 /*
  * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
  * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
  */
-#define ALIASED_1_EDX_F(name)							\
-({										\
-	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
-	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
-	kvm_cpu_cap_features |= feature_bit(name);				\
+#define ALIASED_1_EDX_F(name)								\
+({											\
+	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);		\
+	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);		\
+	kvm_cpu_cap_features |= feature_bit(name);					\
+	kvm_cpu_caps[CPUID_OL_DEFAULT][CPUID_8000_0001_EDX] |= feature_bit(name);	\
 })
 
 /*
@@ -840,12 +850,12 @@ void kvm_initialize_cpu_caps(void)
 	WARN_ON_ONCE(kvm_is_configuring_cpu_caps);
 	kvm_is_configuring_cpu_caps = true;
 
-	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
+	BUILD_BUG_ON(sizeof(kvm_cpu_caps)/NR_CPUID_OL - (NKVMCAPINTS * sizeof(**kvm_cpu_caps)) >
 		     sizeof(boot_cpu_data.x86_capability));
 
 	kvm_cpu_cap_init(CPUID_1_ECX,
-		F(XMM3),
-		F(PCLMULQDQ),
+		F(XMM3, F_CPUID_DEFAULT),
+		F(PCLMULQDQ, F_CPUID_DEFAULT),
 		VENDOR_F(DTES64),
 		/*
 		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
@@ -858,122 +868,122 @@ void kvm_initialize_cpu_caps(void)
 		VENDOR_F(VMX),
 		/* SMX, EST */
 		/* TM2 */
-		F(SSSE3),
+		F(SSSE3, F_CPUID_DEFAULT),
 		/* CNXT-ID */
 		/* Reserved */
-		F(FMA),
-		F(CX16),
+		F(FMA, F_CPUID_DEFAULT),
+		F(CX16, F_CPUID_DEFAULT),
 		/* xTPR Update */
-		F(PDCM),
-		F(PCID),
+		F(PDCM, F_CPUID_DEFAULT),
+		F(PCID, F_CPUID_DEFAULT),
 		/* Reserved, DCA */
-		F(XMM4_1),
-		F(XMM4_2),
-		EMULATED_F(X2APIC),
-		F(MOVBE),
-		F(POPCNT),
-		EMULATED_F(TSC_DEADLINE_TIMER),
-		F(AES),
-		F(XSAVE),
+		F(XMM4_1, F_CPUID_DEFAULT),
+		F(XMM4_2, F_CPUID_DEFAULT),
+		EMULATED_F(X2APIC, F_CPUID_DEFAULT),
+		F(MOVBE, F_CPUID_DEFAULT),
+		F(POPCNT, F_CPUID_DEFAULT),
+		EMULATED_F(TSC_DEADLINE_TIMER, F_CPUID_DEFAULT),
+		F(AES, F_CPUID_DEFAULT),
+		F(XSAVE, F_CPUID_DEFAULT),
 		RUNTIME_F(OSXSAVE),
-		F(AVX),
-		F(F16C),
-		F(RDRAND),
-		EMULATED_F(HYPERVISOR),
+		F(AVX, F_CPUID_DEFAULT),
+		F(F16C, F_CPUID_DEFAULT),
+		F(RDRAND, F_CPUID_DEFAULT),
+		EMULATED_F(HYPERVISOR, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_1_EDX,
-		F(FPU),
-		F(VME),
-		F(DE),
-		F(PSE),
-		F(TSC),
-		F(MSR),
-		F(PAE),
-		F(MCE),
-		F(CX8),
-		F(APIC),
+		F(FPU, F_CPUID_DEFAULT),
+		F(VME, F_CPUID_DEFAULT),
+		F(DE, F_CPUID_DEFAULT),
+		F(PSE, F_CPUID_DEFAULT),
+		F(TSC, F_CPUID_DEFAULT),
+		F(MSR, F_CPUID_DEFAULT),
+		F(PAE, F_CPUID_DEFAULT),
+		F(MCE, F_CPUID_DEFAULT),
+		F(CX8, F_CPUID_DEFAULT),
+		F(APIC, F_CPUID_DEFAULT),
 		/* Reserved */
-		F(SEP),
-		F(MTRR),
-		F(PGE),
-		F(MCA),
-		F(CMOV),
-		F(PAT),
-		F(PSE36),
+		F(SEP, F_CPUID_DEFAULT),
+		F(MTRR, F_CPUID_DEFAULT),
+		F(PGE, F_CPUID_DEFAULT),
+		F(MCA, F_CPUID_DEFAULT),
+		F(CMOV, F_CPUID_DEFAULT),
+		F(PAT, F_CPUID_DEFAULT),
+		F(PSE36, F_CPUID_DEFAULT),
 		/* PSN */
-		F(CLFLUSH),
+		F(CLFLUSH, F_CPUID_DEFAULT),
 		/* Reserved */
 		VENDOR_F(DS),
 		/* ACPI */
-		F(MMX),
-		F(FXSR),
-		F(XMM),
-		F(XMM2),
-		F(SELFSNOOP),
+		F(MMX, F_CPUID_DEFAULT),
+		F(FXSR, F_CPUID_DEFAULT),
+		F(XMM, F_CPUID_DEFAULT),
+		F(XMM2, F_CPUID_DEFAULT),
+		F(SELFSNOOP, F_CPUID_DEFAULT),
 		/* HTT, TM, Reserved, PBE */
 	);
 
 	kvm_cpu_cap_init(CPUID_7_0_EBX,
-		F(FSGSBASE),
-		EMULATED_F(TSC_ADJUST),
-		F(SGX),
-		F(BMI1),
-		F(HLE),
-		F(AVX2),
-		F(FDP_EXCPTN_ONLY),
-		F(SMEP),
-		F(BMI2),
-		F(ERMS),
-		F(INVPCID),
-		F(RTM),
-		F(ZERO_FCS_FDS),
+		F(FSGSBASE, F_CPUID_DEFAULT),
+		EMULATED_F(TSC_ADJUST, F_CPUID_DEFAULT),
+		F(SGX, F_CPUID_DEFAULT),
+		F(BMI1, F_CPUID_DEFAULT),
+		F(HLE, F_CPUID_DEFAULT),
+		F(AVX2, F_CPUID_DEFAULT),
+		F(FDP_EXCPTN_ONLY, F_CPUID_DEFAULT),
+		F(SMEP, F_CPUID_DEFAULT),
+		F(BMI2, F_CPUID_DEFAULT),
+		F(ERMS, F_CPUID_DEFAULT),
+		F(INVPCID, F_CPUID_DEFAULT),
+		F(RTM, F_CPUID_DEFAULT),
+		F(ZERO_FCS_FDS, F_CPUID_DEFAULT),
 		VENDOR_F(MPX),
-		F(AVX512F),
-		F(AVX512DQ),
-		F(RDSEED),
-		F(ADX),
-		F(SMAP),
-		F(AVX512IFMA),
-		F(CLFLUSHOPT),
-		F(CLWB),
+		F(AVX512F, F_CPUID_DEFAULT),
+		F(AVX512DQ, F_CPUID_DEFAULT),
+		F(RDSEED, F_CPUID_DEFAULT),
+		F(ADX, F_CPUID_DEFAULT),
+		F(SMAP, F_CPUID_DEFAULT),
+		F(AVX512IFMA, F_CPUID_DEFAULT),
+		F(CLFLUSHOPT, F_CPUID_DEFAULT),
+		F(CLWB, F_CPUID_DEFAULT),
 		VENDOR_F(INTEL_PT),
-		F(AVX512PF),
-		F(AVX512ER),
-		F(AVX512CD),
-		F(SHA_NI),
-		F(AVX512BW),
-		F(AVX512VL),
+		F(AVX512PF, F_CPUID_DEFAULT),
+		F(AVX512ER, F_CPUID_DEFAULT),
+		F(AVX512CD, F_CPUID_DEFAULT),
+		F(SHA_NI, F_CPUID_DEFAULT),
+		F(AVX512BW, F_CPUID_DEFAULT),
+		F(AVX512VL, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_ECX,
 		/* PREFETCHWT1 */
-		F(AVX512VBMI),
-		F(UMIP),
-		F(PKU),
+		F(AVX512VBMI, F_CPUID_DEFAULT),
+		F(UMIP, F_CPUID_DEFAULT),
+		F(PKU, F_CPUID_DEFAULT),
 		RUNTIME_F(OSPKE),
 		VENDOR_F(WAITPKG),
-		F(AVX512_VBMI2),
-		X86_64_F(SHSTK),
-		F(GFNI),
-		F(VAES),
-		F(VPCLMULQDQ),
-		F(AVX512_VNNI),
-		F(AVX512_BITALG),
+		F(AVX512_VBMI2, F_CPUID_DEFAULT),
+		X86_64_F(SHSTK, F_CPUID_DEFAULT),
+		F(GFNI, F_CPUID_DEFAULT),
+		F(VAES, F_CPUID_DEFAULT),
+		F(VPCLMULQDQ, F_CPUID_DEFAULT),
+		F(AVX512_VNNI, F_CPUID_DEFAULT),
+		F(AVX512_BITALG, F_CPUID_DEFAULT),
 		/* TME */
-		F(AVX512_VPOPCNTDQ),
+		F(AVX512_VPOPCNTDQ, F_CPUID_DEFAULT),
 		/* Reserved */
-		PASSTHROUGH_F(LA57),
+		PASSTHROUGH_F(LA57, F_CPUID_DEFAULT),
 		/* MPX_MAWAU */
-		F(RDPID),
+		F(RDPID, F_CPUID_DEFAULT),
 		/* KEY_LOCKER */
-		F(BUS_LOCK_DETECT),
-		F(CLDEMOTE),
+		F(BUS_LOCK_DETECT, F_CPUID_DEFAULT),
+		F(CLDEMOTE, F_CPUID_DEFAULT),
 		/* Reserved */
-		F(MOVDIRI),
-		F(MOVDIR64B),
+		F(MOVDIRI, F_CPUID_DEFAULT),
+		F(MOVDIR64B, F_CPUID_DEFAULT),
 		/* ENQCMD */
-		F(SGX_LC),
+		F(SGX_LC, F_CPUID_DEFAULT),
 		/* PKS */
 	);
 
@@ -994,30 +1004,30 @@ void kvm_initialize_cpu_caps(void)
 
 	kvm_cpu_cap_init(CPUID_7_EDX,
 		/* Reserved, SGX_KEYS */
-		F(AVX512_4VNNIW),
-		F(AVX512_4FMAPS),
-		F(FSRM),
+		F(AVX512_4VNNIW, F_CPUID_DEFAULT),
+		F(AVX512_4FMAPS, F_CPUID_DEFAULT),
+		F(FSRM, F_CPUID_DEFAULT),
 		/* UINT, Reserved, Reserved */
-		F(AVX512_VP2INTERSECT),
+		F(AVX512_VP2INTERSECT, F_CPUID_DEFAULT),
 		/* SRBDS_CTRL */
-		F(MD_CLEAR),
+		F(MD_CLEAR, F_CPUID_DEFAULT),
 		/* RTM_ALWAYS_ABORT, Reserved, TSX_FORCE_ABORT */
-		F(SERIALIZE),
+		F(SERIALIZE, F_CPUID_DEFAULT),
 		/* HYBRID_CPU */
-		F(TSXLDTRK),
+		F(TSXLDTRK, F_CPUID_DEFAULT),
 		/* Reserved, PCONFIG, ARCH_LBR */
-		F(IBT),
+		F(IBT, F_CPUID_DEFAULT),
 		/* Reserved */
-		F(AMX_BF16),
-		F(AVX512_FP16),
-		F(AMX_TILE),
-		F(AMX_INT8),
-		F(SPEC_CTRL),
-		F(INTEL_STIBP),
-		F(FLUSH_L1D),
-		EMULATED_F(ARCH_CAPABILITIES),
+		F(AMX_BF16, F_CPUID_DEFAULT),
+		F(AVX512_FP16, F_CPUID_DEFAULT),
+		F(AMX_TILE, F_CPUID_DEFAULT),
+		F(AMX_INT8, F_CPUID_DEFAULT),
+		F(SPEC_CTRL, F_CPUID_DEFAULT),
+		F(INTEL_STIBP, F_CPUID_DEFAULT),
+		F(FLUSH_L1D, F_CPUID_DEFAULT),
+		EMULATED_F(ARCH_CAPABILITIES, F_CPUID_DEFAULT),
 		/* CORE_CAPABILITIES */
-		F(SPEC_CTRL_SSBD),
+		F(SPEC_CTRL_SSBD, F_CPUID_DEFAULT),
 	);
 
 	/*
@@ -1040,97 +1050,97 @@ void kvm_initialize_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
 
 	kvm_cpu_cap_init(CPUID_7_1_EAX,
-		F(SHA512),
-		F(SM3),
-		F(SM4),
-		F(AVX_VNNI),
-		F(AVX512_BF16),
-		F(CMPCCXADD),
-		F(FZRM),
-		F(FSRS),
-		F(FSRC),
-		X86_64_F(LKGS),
-		F(WRMSRNS),
-		F(AMX_FP16),
-		F(AVX_IFMA),
-		F(LAM),
-		F(MOVRS),
+		F(SHA512, F_CPUID_DEFAULT),
+		F(SM3, F_CPUID_DEFAULT),
+		F(SM4, F_CPUID_DEFAULT),
+		F(AVX_VNNI, F_CPUID_DEFAULT),
+		F(AVX512_BF16, F_CPUID_DEFAULT),
+		F(CMPCCXADD, F_CPUID_DEFAULT),
+		F(FZRM, F_CPUID_DEFAULT),
+		F(FSRS, F_CPUID_DEFAULT),
+		F(FSRC, F_CPUID_DEFAULT),
+		X86_64_F(LKGS, F_CPUID_DEFAULT),
+		F(WRMSRNS, F_CPUID_DEFAULT),
+		F(AMX_FP16, F_CPUID_DEFAULT),
+		F(AVX_IFMA, F_CPUID_DEFAULT),
+		F(LAM, F_CPUID_DEFAULT),
+		F(MOVRS, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_1_ECX,
-		SCATTERED_F(MSR_IMM),
+		SCATTERED_F(MSR_IMM, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_1_EDX,
-		F(AVX_VNNI_INT8),
-		F(AVX_NE_CONVERT),
-		F(AMX_COMPLEX),
-		F(AVX_VNNI_INT16),
-		F(PREFETCHITI),
-		F(AVX10),
+		F(AVX_VNNI_INT8, F_CPUID_DEFAULT),
+		F(AVX_NE_CONVERT, F_CPUID_DEFAULT),
+		F(AMX_COMPLEX, F_CPUID_DEFAULT),
+		F(AVX_VNNI_INT16, F_CPUID_DEFAULT),
+		F(PREFETCHITI, F_CPUID_DEFAULT),
+		F(AVX10, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_2_EDX,
-		F(INTEL_PSFD),
-		F(IPRED_CTRL),
-		F(RRSBA_CTRL),
-		F(DDPD_U),
-		F(BHI_CTRL),
-		F(MCDT_NO),
+		F(INTEL_PSFD, F_CPUID_DEFAULT),
+		F(IPRED_CTRL, F_CPUID_DEFAULT),
+		F(RRSBA_CTRL, F_CPUID_DEFAULT),
+		F(DDPD_U, F_CPUID_DEFAULT),
+		F(BHI_CTRL, F_CPUID_DEFAULT),
+		F(MCDT_NO, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_D_1_EAX,
-		F(XSAVEOPT),
-		F(XSAVEC),
-		F(XGETBV1),
-		F(XSAVES),
-		X86_64_F(XFD),
+		F(XSAVEOPT, F_CPUID_DEFAULT),
+		F(XSAVEC, F_CPUID_DEFAULT),
+		F(XGETBV1, F_CPUID_DEFAULT),
+		F(XSAVES, F_CPUID_DEFAULT),
+		X86_64_F(XFD, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_12_EAX,
-		SCATTERED_F(SGX1),
-		SCATTERED_F(SGX2),
-		SCATTERED_F(SGX_EDECCSSA),
+		SCATTERED_F(SGX1, F_CPUID_DEFAULT),
+		SCATTERED_F(SGX2, F_CPUID_DEFAULT),
+		SCATTERED_F(SGX_EDECCSSA, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_1E_1_EAX,
-		F(AMX_INT8_ALIAS),
-		F(AMX_BF16_ALIAS),
-		F(AMX_COMPLEX_ALIAS),
-		F(AMX_FP16_ALIAS),
-		F(AMX_FP8),
-		F(AMX_TF32),
-		F(AMX_AVX512),
-		F(AMX_MOVRS),
+		F(AMX_INT8_ALIAS, F_CPUID_DEFAULT),
+		F(AMX_BF16_ALIAS, F_CPUID_DEFAULT),
+		F(AMX_COMPLEX_ALIAS, F_CPUID_DEFAULT),
+		F(AMX_FP16_ALIAS, F_CPUID_DEFAULT),
+		F(AMX_FP8, F_CPUID_DEFAULT),
+		F(AMX_TF32, F_CPUID_DEFAULT),
+		F(AMX_AVX512, F_CPUID_DEFAULT),
+		F(AMX_MOVRS, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_24_0_EBX,
-		F(AVX10_128),
-		F(AVX10_256),
-		F(AVX10_512),
+		F(AVX10_128, F_CPUID_DEFAULT),
+		F(AVX10_256, F_CPUID_DEFAULT),
+		F(AVX10_512, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_24_1_ECX,
-		F(AVX10_VNNI_INT),
+		F(AVX10_VNNI_INT, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0001_ECX,
-		F(LAHF_LM),
-		F(CMP_LEGACY),
+		F(LAHF_LM, F_CPUID_DEFAULT),
+		F(CMP_LEGACY, F_CPUID_DEFAULT),
 		VENDOR_F(SVM),
 		/* ExtApicSpace */
-		F(CR8_LEGACY),
-		F(ABM),
-		F(SSE4A),
-		F(MISALIGNSSE),
-		F(3DNOWPREFETCH),
-		F(OSVW),
+		F(CR8_LEGACY, F_CPUID_DEFAULT),
+		F(ABM, F_CPUID_DEFAULT),
+		F(SSE4A, F_CPUID_DEFAULT),
+		F(MISALIGNSSE, F_CPUID_DEFAULT),
+		F(3DNOWPREFETCH, F_CPUID_DEFAULT),
+		F(OSVW, F_CPUID_DEFAULT),
 		/* IBS */
-		F(XOP),
+		F(XOP, F_CPUID_DEFAULT),
 		/* SKINIT, WDT, LWP */
-		F(FMA4),
-		F(TBM),
-		F(TOPOEXT),
+		F(FMA4, F_CPUID_DEFAULT),
+		F(TBM, F_CPUID_DEFAULT),
+		F(TOPOEXT, F_CPUID_DEFAULT),
 		VENDOR_F(PERFCTR_CORE),
 	);
 
@@ -1146,7 +1156,7 @@ void kvm_initialize_cpu_caps(void)
 		ALIASED_1_EDX_F(CX8),
 		ALIASED_1_EDX_F(APIC),
 		/* Reserved */
-		F(SYSCALL),
+		F(SYSCALL, F_CPUID_DEFAULT),
 		ALIASED_1_EDX_F(MTRR),
 		ALIASED_1_EDX_F(PGE),
 		ALIASED_1_EDX_F(MCA),
@@ -1154,42 +1164,42 @@ void kvm_initialize_cpu_caps(void)
 		ALIASED_1_EDX_F(PAT),
 		ALIASED_1_EDX_F(PSE36),
 		/* Reserved */
-		F(NX),
+		F(NX, F_CPUID_DEFAULT),
 		/* Reserved */
-		F(MMXEXT),
+		F(MMXEXT, F_CPUID_DEFAULT),
 		ALIASED_1_EDX_F(MMX),
 		ALIASED_1_EDX_F(FXSR),
-		F(FXSR_OPT),
-		X86_64_F(GBPAGES),
-		F(RDTSCP),
+		F(FXSR_OPT, F_CPUID_DEFAULT),
+		X86_64_F(GBPAGES, F_CPUID_DEFAULT),
+		F(RDTSCP, F_CPUID_DEFAULT),
 		/* Reserved */
-		X86_64_F(LM),
-		F(3DNOWEXT),
-		F(3DNOW),
+		X86_64_F(LM, F_CPUID_DEFAULT),
+		F(3DNOWEXT, F_CPUID_DEFAULT),
+		F(3DNOW, F_CPUID_DEFAULT),
 	);
 
 	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
 		kvm_cpu_cap_set(X86_FEATURE_GBPAGES);
 
 	kvm_cpu_cap_init(CPUID_8000_0007_EDX,
-		SCATTERED_F(CONSTANT_TSC),
+		SCATTERED_F(CONSTANT_TSC, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0008_EBX,
-		F(CLZERO),
-		F(XSAVEERPTR),
-		F(WBNOINVD),
-		F(AMD_IBPB),
-		F(AMD_IBRS),
-		F(AMD_SSBD),
-		F(VIRT_SSBD),
-		F(AMD_SSB_NO),
-		F(AMD_STIBP),
-		F(AMD_STIBP_ALWAYS_ON),
-		F(AMD_IBRS_SAME_MODE),
-		PASSTHROUGH_F(EFER_LMSLE_MBZ),
-		F(AMD_PSFD),
-		F(AMD_IBPB_RET),
+		F(CLZERO, F_CPUID_DEFAULT),
+		F(XSAVEERPTR, F_CPUID_DEFAULT),
+		F(WBNOINVD, F_CPUID_DEFAULT),
+		F(AMD_IBPB, F_CPUID_DEFAULT),
+		F(AMD_IBRS, F_CPUID_DEFAULT),
+		F(AMD_SSBD, F_CPUID_DEFAULT),
+		F(VIRT_SSBD, F_CPUID_DEFAULT),
+		F(AMD_SSB_NO, F_CPUID_DEFAULT),
+		F(AMD_STIBP, F_CPUID_DEFAULT),
+		F(AMD_STIBP_ALWAYS_ON, F_CPUID_DEFAULT),
+		F(AMD_IBRS_SAME_MODE, F_CPUID_DEFAULT),
+		PASSTHROUGH_F(EFER_LMSLE_MBZ, F_CPUID_DEFAULT),
+		F(AMD_PSFD, F_CPUID_DEFAULT),
+		F(AMD_IBPB_RET, F_CPUID_DEFAULT),
 	);
 
 	/*
@@ -1240,12 +1250,12 @@ void kvm_initialize_cpu_caps(void)
 		VENDOR_F(SEV),
 		/* VM_PAGE_FLUSH */
 		VENDOR_F(SEV_ES),
-		F(SME_COHERENT),
+		F(SME_COHERENT, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
-		F(NO_NESTED_DATA_BP),
-		F(WRMSR_XX_BASE_NS),
+		F(NO_NESTED_DATA_BP, F_CPUID_DEFAULT),
+		F(WRMSR_XX_BASE_NS, F_CPUID_DEFAULT),
 		/*
 		 * Synthesize "LFENCE is serializing" into the AMD-defined entry
 		 * in KVM's supported CPUID, i.e. if the feature is reported as
@@ -1256,49 +1266,49 @@ void kvm_initialize_cpu_caps(void)
 		 * CPUID will drop the flags, and reporting support in AMD's
 		 * leaf can make it easier for userspace to detect the feature.
 		 */
-		SYNTHESIZED_F(LFENCE_RDTSC),
+		SYNTHESIZED_F(LFENCE_RDTSC, F_CPUID_DEFAULT),
 		/* SmmPgCfgLock */
 		/* 4: Resv */
-		SYNTHESIZED_F(VERW_CLEAR),
-		F(NULL_SEL_CLR_BASE),
+		SYNTHESIZED_F(VERW_CLEAR, F_CPUID_DEFAULT),
+		F(NULL_SEL_CLR_BASE, F_CPUID_DEFAULT),
 		/* UpperAddressIgnore */
-		F(AUTOIBRS),
-		EMULATED_F(NO_SMM_CTL_MSR),
+		F(AUTOIBRS, F_CPUID_DEFAULT),
+		EMULATED_F(NO_SMM_CTL_MSR, F_CPUID_DEFAULT),
 		/* PrefetchCtlMsr */
 		/* GpOnUserCpuid */
 		/* EPSF */
-		F(PREFETCHI),
-		F(AVX512_BMM),
-		F(ERAPS),
-		SYNTHESIZED_F(SBPB),
-		SYNTHESIZED_F(IBPB_BRTYPE),
-		SYNTHESIZED_F(SRSO_NO),
-		F(SRSO_USER_KERNEL_NO),
+		F(PREFETCHI, F_CPUID_DEFAULT),
+		F(AVX512_BMM, F_CPUID_DEFAULT),
+		F(ERAPS, F_CPUID_DEFAULT),
+		SYNTHESIZED_F(SBPB, F_CPUID_DEFAULT),
+		SYNTHESIZED_F(IBPB_BRTYPE, F_CPUID_DEFAULT),
+		SYNTHESIZED_F(SRSO_NO, F_CPUID_DEFAULT),
+		F(SRSO_USER_KERNEL_NO, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0021_ECX,
-		SYNTHESIZED_F(TSA_SQ_NO),
-		SYNTHESIZED_F(TSA_L1_NO),
+		SYNTHESIZED_F(TSA_SQ_NO, F_CPUID_DEFAULT),
+		SYNTHESIZED_F(TSA_L1_NO, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0022_EAX,
-		F(PERFMON_V2),
+		F(PERFMON_V2, F_CPUID_DEFAULT),
 	);
 
 	if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
 		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE);
 
 	kvm_cpu_cap_init(CPUID_C000_0001_EDX,
-		F(XSTORE),
-		F(XSTORE_EN),
-		F(XCRYPT),
-		F(XCRYPT_EN),
-		F(ACE2),
-		F(ACE2_EN),
-		F(PHE),
-		F(PHE_EN),
-		F(PMM),
-		F(PMM_EN),
+		F(XSTORE, F_CPUID_DEFAULT),
+		F(XSTORE_EN, F_CPUID_DEFAULT),
+		F(XCRYPT, F_CPUID_DEFAULT),
+		F(XCRYPT_EN, F_CPUID_DEFAULT),
+		F(ACE2, F_CPUID_DEFAULT),
+		F(ACE2_EN, F_CPUID_DEFAULT),
+		F(PHE, F_CPUID_DEFAULT),
+		F(PHE_EN, F_CPUID_DEFAULT),
+		F(PMM, F_CPUID_DEFAULT),
+		F(PMM_EN, F_CPUID_DEFAULT),
 	);
 
 	/*
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index f41f8d3db794..e87adecacd03 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -14,9 +14,13 @@ enum kvm_cpuid_overlay {
 	NR_CPUID_OL
 };
 
+/* Temporarily use VMX overlay as the default one */
+#define CPUID_OL_DEFAULT	CPUID_OL_VMX
+
 #define F_CPUID_VMX		BIT(CPUID_OL_VMX)
 #define F_CPUID_SVM		BIT(CPUID_OL_SVM)
 #define F_CPUID_TDX		BIT(CPUID_OL_TDX)
+#define F_CPUID_DEFAULT		(F_CPUID_VMX | F_CPUID_SVM)
 
 static inline u8 get_cpuid_overlay(struct kvm *kvm)
 {
@@ -30,7 +34,7 @@ static inline u8 get_cpuid_overlay(struct kvm *kvm)
 	return CPUID_OL_VMX;
 }
 
-extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
+extern u32 kvm_cpu_caps[NR_CPUID_OL][NR_KVM_CPU_CAPS] __read_mostly;
 extern bool kvm_is_configuring_cpu_caps __read_mostly;
 
 void kvm_initialize_cpu_caps(void);
@@ -118,8 +122,8 @@ static __always_inline void cpuid_entry_override(struct kvm_cpuid_entry2 *entry,
 {
 	u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);
 
-	BUILD_BUG_ON(leaf >= ARRAY_SIZE(kvm_cpu_caps));
-	*reg = kvm_cpu_caps[leaf];
+	BUILD_BUG_ON(leaf >= ARRAY_SIZE(kvm_cpu_caps[0]));
+	*reg = kvm_cpu_caps[CPUID_OL_DEFAULT][leaf];
 }
 
 static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
@@ -220,7 +224,7 @@ static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	WARN_ON_ONCE(!kvm_is_configuring_cpu_caps);
-	kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
+	kvm_cpu_caps[CPUID_OL_DEFAULT][x86_leaf] &= ~__feature_bit(x86_feature);
 }
 
 static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
@@ -228,14 +232,14 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	WARN_ON_ONCE(!kvm_is_configuring_cpu_caps);
-	kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
+	kvm_cpu_caps[CPUID_OL_DEFAULT][x86_leaf] |= __feature_bit(x86_feature);
 }
 
 static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
-	return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
+	return kvm_cpu_caps[CPUID_OL_DEFAULT][x86_leaf] & __feature_bit(x86_feature);
 }
 
 static __always_inline bool kvm_cpu_cap_has(unsigned int x86_feature)
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 05/27] KVM: x86: Extend kvm_cpu_cap_{set/clear}() to configure overlays
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (3 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 04/27] KVM: x86: Extend F() and its variants " Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 06/27] KVM: x86: Populate TDX CPUID overlay with supported feature bits Binbin Wu
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

In preparation for setting and clearing CPU feature bits in different
overlays, extend kvm_cpu_cap_{set,clear}() to take an overlay mask.

All callers pass F_CPUID_DEFAULT, i.e. set/clear a capability in both
the VMX and SVM overlays.

The effective overlay consumed is still CPUID_OL_DEFAULT (the VMX
overlay) for all VM types.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c   | 36 ++++++++++++++++++------------------
 arch/x86/kvm/cpuid.h   | 18 ++++++++++++------
 arch/x86/kvm/svm/sev.c |  6 +++---
 arch/x86/kvm/svm/svm.c | 38 +++++++++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.c | 38 +++++++++++++++++++-------------------
 arch/x86/kvm/x86.c     |  4 ++--
 6 files changed, 73 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d3f3e9f0d493..767c007ab5f0 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -992,7 +992,7 @@ void kvm_initialize_cpu_caps(void)
 	 * to be set on the host. Clear it if that is not the case
 	 */
 	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
-		kvm_cpu_cap_clear(X86_FEATURE_PKU);
+		kvm_cpu_cap_clear(X86_FEATURE_PKU, F_CPUID_DEFAULT);
 
 	/*
 	 * Shadow Stacks aren't implemented in the Shadow MMU.  Shadow Stack
@@ -1000,7 +1000,7 @@ void kvm_initialize_cpu_caps(void)
 	 * doesn't know how to emulate or map.
 	 */
 	if (!tdp_enabled)
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT);
 
 	kvm_cpu_cap_init(CPUID_7_EDX,
 		/* Reserved, SGX_KEYS */
@@ -1036,18 +1036,18 @@ void kvm_initialize_cpu_caps(void)
 	 * SHSTK, nor does KVM handle Shadow Stack #PFs (see above).
 	 */
 	if (allow_smaller_maxphyaddr) {
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
-		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT);
 	}
 
 	if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
 	    boot_cpu_has(X86_FEATURE_AMD_IBPB) &&
 	    boot_cpu_has(X86_FEATURE_AMD_IBRS))
-		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL);
+		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL, F_CPUID_DEFAULT);
 	if (boot_cpu_has(X86_FEATURE_STIBP))
-		kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP);
+		kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP, F_CPUID_DEFAULT);
 	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
+		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD, F_CPUID_DEFAULT);
 
 	kvm_cpu_cap_init(CPUID_7_1_EAX,
 		F(SHA512, F_CPUID_DEFAULT),
@@ -1179,7 +1179,7 @@ void kvm_initialize_cpu_caps(void)
 	);
 
 	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
-		kvm_cpu_cap_set(X86_FEATURE_GBPAGES);
+		kvm_cpu_cap_set(X86_FEATURE_GBPAGES, F_CPUID_DEFAULT);
 
 	kvm_cpu_cap_init(CPUID_8000_0007_EDX,
 		SCATTERED_F(CONSTANT_TSC, F_CPUID_DEFAULT),
@@ -1208,26 +1208,26 @@ void kvm_initialize_cpu_caps(void)
 	 * record that in cpufeatures so use them.
 	 */
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
-		kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB, F_CPUID_DEFAULT);
 		if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
 		    !boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB))
-			kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB_RET);
+			kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB_RET, F_CPUID_DEFAULT);
 	}
 	if (boot_cpu_has(X86_FEATURE_IBRS))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_IBRS);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_IBRS, F_CPUID_DEFAULT);
 	if (boot_cpu_has(X86_FEATURE_STIBP))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_STIBP);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_STIBP, F_CPUID_DEFAULT);
 	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_SSBD);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_SSBD, F_CPUID_DEFAULT);
 	if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_SSB_NO);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_SSB_NO, F_CPUID_DEFAULT);
 	/*
 	 * The preference is to use SPEC CTRL MSR instead of the
 	 * VIRT_SPEC MSR.
 	 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) &&
 	    !boot_cpu_has(X86_FEATURE_AMD_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD, F_CPUID_DEFAULT);
 
 	/* All SVM features required additional vendor module enabling. */
 	kvm_cpu_cap_init(CPUID_8000_000A_EDX,
@@ -1296,7 +1296,7 @@ void kvm_initialize_cpu_caps(void)
 	);
 
 	if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
-		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE);
+		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE, F_CPUID_DEFAULT);
 
 	kvm_cpu_cap_init(CPUID_C000_0001_EDX,
 		F(XSTORE, F_CPUID_DEFAULT),
@@ -1322,8 +1322,8 @@ void kvm_initialize_cpu_caps(void)
 	if (WARN_ON((kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
 		     kvm_cpu_cap_has(X86_FEATURE_RDPID)) &&
 		     !kvm_is_supported_user_return_msr(MSR_TSC_AUX))) {
-		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
-		kvm_cpu_cap_clear(X86_FEATURE_RDPID);
+		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_RDPID, F_CPUID_DEFAULT);
 	}
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_initialize_cpu_caps);
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index e87adecacd03..4b1274f055e5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -219,20 +219,26 @@ static inline bool cpuid_fault_enabled(struct kvm_vcpu *vcpu)
 		  MSR_MISC_FEATURES_ENABLES_CPUID_FAULT;
 }
 
-static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
+static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature, u32 overlay_mask)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	WARN_ON_ONCE(!kvm_is_configuring_cpu_caps);
-	kvm_cpu_caps[CPUID_OL_DEFAULT][x86_leaf] &= ~__feature_bit(x86_feature);
+	for (int i = 0; i < NR_CPUID_OL; i++) {
+		if (overlay_mask & BIT(i))
+			kvm_cpu_caps[i][x86_leaf] &= ~__feature_bit(x86_feature);
+	}
 }
 
-static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
+static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature, u32 overlay_mask)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	WARN_ON_ONCE(!kvm_is_configuring_cpu_caps);
-	kvm_cpu_caps[CPUID_OL_DEFAULT][x86_leaf] |= __feature_bit(x86_feature);
+	for (int i = 0; i < NR_CPUID_OL; i++) {
+		if (overlay_mask & BIT(i))
+			kvm_cpu_caps[i][x86_leaf] |= __feature_bit(x86_feature);
+	}
 }
 
 static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
@@ -247,10 +253,10 @@ static __always_inline bool kvm_cpu_cap_has(unsigned int x86_feature)
 	return !!kvm_cpu_cap_get(x86_feature);
 }
 
-static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature)
+static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature, u32 overlay_mask)
 {
 	if (boot_cpu_has(x86_feature))
-		kvm_cpu_cap_set(x86_feature);
+		kvm_cpu_cap_set(x86_feature, overlay_mask);
 }
 
 static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c2126b3c3072..6ec9c806e1fb 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3014,15 +3014,15 @@ void sev_vm_destroy(struct kvm *kvm)
 void __init sev_set_cpu_caps(void)
 {
 	if (sev_enabled) {
-		kvm_cpu_cap_set(X86_FEATURE_SEV);
+		kvm_cpu_cap_set(X86_FEATURE_SEV, F_CPUID_DEFAULT);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_VM);
 	}
 	if (sev_es_enabled) {
-		kvm_cpu_cap_set(X86_FEATURE_SEV_ES);
+		kvm_cpu_cap_set(X86_FEATURE_SEV_ES, F_CPUID_DEFAULT);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
 	}
 	if (sev_snp_enabled) {
-		kvm_cpu_cap_set(X86_FEATURE_SEV_SNP);
+		kvm_cpu_cap_set(X86_FEATURE_SEV_SNP, F_CPUID_DEFAULT);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
 	}
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7fdd7a9c280..7d1289f34f9f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5445,48 +5445,48 @@ static __init void svm_set_cpu_caps(void)
 
 	kvm_caps.supported_perf_cap = 0;
 
-	kvm_cpu_cap_clear(X86_FEATURE_IBT);
+	kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT);
 
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
-		kvm_cpu_cap_set(X86_FEATURE_SVM);
-		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN);
+		kvm_cpu_cap_set(X86_FEATURE_SVM, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN, F_CPUID_DEFAULT);
 
 		/*
 		 * KVM currently flushes TLBs on *every* nested SVM transition,
 		 * and so for all intents and purposes KVM supports flushing by
 		 * ASID, i.e. KVM is guaranteed to honor every L1 ASID flush.
 		 */
-		kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID);
+		kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID, F_CPUID_DEFAULT);
 
 		if (nrips)
-			kvm_cpu_cap_set(X86_FEATURE_NRIPS);
+			kvm_cpu_cap_set(X86_FEATURE_NRIPS, F_CPUID_DEFAULT);
 
 		if (npt_enabled)
-			kvm_cpu_cap_set(X86_FEATURE_NPT);
+			kvm_cpu_cap_set(X86_FEATURE_NPT, F_CPUID_DEFAULT);
 
 		if (tsc_scaling)
-			kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR);
+			kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR, F_CPUID_DEFAULT);
 
 		if (vls)
-			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD);
+			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD, F_CPUID_DEFAULT);
 		if (lbrv)
-			kvm_cpu_cap_set(X86_FEATURE_LBRV);
+			kvm_cpu_cap_set(X86_FEATURE_LBRV, F_CPUID_DEFAULT);
 
 		if (boot_cpu_has(X86_FEATURE_PAUSEFILTER))
-			kvm_cpu_cap_set(X86_FEATURE_PAUSEFILTER);
+			kvm_cpu_cap_set(X86_FEATURE_PAUSEFILTER, F_CPUID_DEFAULT);
 
 		if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD))
-			kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD);
+			kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD, F_CPUID_DEFAULT);
 
 		if (vgif)
-			kvm_cpu_cap_set(X86_FEATURE_VGIF);
+			kvm_cpu_cap_set(X86_FEATURE_VGIF, F_CPUID_DEFAULT);
 
 		if (vnmi)
-			kvm_cpu_cap_set(X86_FEATURE_VNMI);
+			kvm_cpu_cap_set(X86_FEATURE_VNMI, F_CPUID_DEFAULT);
 
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
-		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
+		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK, F_CPUID_DEFAULT);
 	}
 
 	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD))
@@ -5495,7 +5495,7 @@ static __init void svm_set_cpu_caps(void)
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
+		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD, F_CPUID_DEFAULT);
 
 	if (enable_pmu) {
 		/*
@@ -5507,11 +5507,11 @@ static __init void svm_set_cpu_caps(void)
 			kvm_pmu_cap.num_counters_gp = min(AMD64_NUM_COUNTERS,
 							  kvm_pmu_cap.num_counters_gp);
 		else
-			kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE);
+			kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE, F_CPUID_DEFAULT);
 
 		if (kvm_pmu_cap.version != 2 ||
 		    !kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE))
-			kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2);
+			kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2, F_CPUID_DEFAULT);
 	}
 
 	/* CPUID 0x8000001F (SME/SEV features) */
@@ -5521,8 +5521,8 @@ static __init void svm_set_cpu_caps(void)
 	 * Clear capabilities that are automatically configured by common code,
 	 * but that require explicit SVM support (that isn't yet implemented).
 	 */
-	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT);
-	kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM);
+	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT, F_CPUID_DEFAULT);
+	kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM, F_CPUID_DEFAULT);
 
 	kvm_setup_xss_caps();
 	kvm_finalize_cpu_caps();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..7879a8a532c4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8083,47 +8083,47 @@ static __init void vmx_set_cpu_caps(void)
 
 	/* CPUID 0x1 */
 	if (nested)
-		kvm_cpu_cap_set(X86_FEATURE_VMX);
+		kvm_cpu_cap_set(X86_FEATURE_VMX, F_CPUID_DEFAULT);
 
 	/* CPUID 0x7 */
 	if (kvm_mpx_supported())
-		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX, F_CPUID_DEFAULT);
 	if (!cpu_has_vmx_invpcid())
-		kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
+		kvm_cpu_cap_clear(X86_FEATURE_INVPCID, F_CPUID_DEFAULT);
 	if (vmx_pt_mode_is_host_guest())
-		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT, F_CPUID_DEFAULT);
 	if (vmx_pebs_supported()) {
-		kvm_cpu_cap_check_and_set(X86_FEATURE_DS);
-		kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_DS, F_CPUID_DEFAULT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64, F_CPUID_DEFAULT);
 	}
 
 	if (!enable_pmu)
-		kvm_cpu_cap_clear(X86_FEATURE_PDCM);
+		kvm_cpu_cap_clear(X86_FEATURE_PDCM, F_CPUID_DEFAULT);
 	kvm_caps.supported_perf_cap = vmx_get_perf_capabilities();
 
 	if (!enable_sgx) {
-		kvm_cpu_cap_clear(X86_FEATURE_SGX);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX_LC);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX1);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX2);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX_EDECCSSA);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX_LC, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX1, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX2, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX_EDECCSSA, F_CPUID_DEFAULT);
 	}
 
 	if (vmx_umip_emulated())
-		kvm_cpu_cap_set(X86_FEATURE_UMIP);
+		kvm_cpu_cap_set(X86_FEATURE_UMIP, F_CPUID_DEFAULT);
 
 	/* CPUID 0xD.1 */
 	if (!cpu_has_vmx_xsaves())
-		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
+		kvm_cpu_cap_clear(X86_FEATURE_XSAVES, F_CPUID_DEFAULT);
 
 	/* CPUID 0x80000001 and 0x7 (RDPID) */
 	if (!cpu_has_vmx_rdtscp()) {
-		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP);
-		kvm_cpu_cap_clear(X86_FEATURE_RDPID);
+		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_RDPID, F_CPUID_DEFAULT);
 	}
 
 	if (cpu_has_vmx_waitpkg())
-		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG, F_CPUID_DEFAULT);
 
 	/*
 	 * Disable CET if unrestricted_guest is unsupported as KVM doesn't
@@ -8133,8 +8133,8 @@ static __init void vmx_set_cpu_caps(void)
 	 */
 	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
 	    !cpu_has_vmx_basic_no_hw_errcode_cc()) {
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
-		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT);
 	}
 
 	kvm_setup_xss_caps();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a..5b830997e693 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10024,8 +10024,8 @@ void kvm_setup_xss_caps(void)
 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
 
 	if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK);
-		kvm_cpu_cap_clear(X86_FEATURE_IBT);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT);
 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
 	}
 }
-- 
2.46.0



* [RFC PATCH 06/27] KVM: x86: Populate TDX CPUID overlay with supported feature bits
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (4 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 05/27] KVM: x86: Extend kvm_cpu_cap_{set/clear}() to configure overlays Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 07/27] KVM: x86: Support KVM_GET_{SUPPORTED,EMULATED}_CPUID as VM scope ioctls Binbin Wu
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Tag CPUID feature bits with F_CPUID_TDX in kvm_initialize_cpu_caps() and
vmx_set_cpu_caps() to populate the TDX overlay, so that KVM can
advertise/check a TDX-specific set of CPUID capabilities to/from
userspace for TDX guests.

Features that are deliberately *not* tagged with F_CPUID_TDX fall into
the following categories:
- Fixed-0 or reserved in TDX.
- Not yet supported by KVM for TDX, e.g., HLE, RTM, and WAITPKG.
- AMD-only features.

Note that fixed-1 bits, which are initialized via kvm_cpu_cap_init() or
kvm_cpu_cap_check_and_set(), could be affected by boot_cpu_has() if the
related feature is disabled in the host kernel.  Since these features
are normally not disabled, reuse them for the TDX overlay for
simplicity.

For CET, TDX follows the support for normal VMX VMs, e.g., if KVM is
loaded with unrestricted guest disabled or allow_smaller_maxphyaddr
enabled, which should be rare, KVM clears CET support for TDX guests
as well for simplicity.  Note that allow_smaller_maxphyaddr doesn't
apply to TDX, so SHSTK and IBT are not cleared from the TDX overlay
in kvm_initialize_cpu_caps() when allow_smaller_maxphyaddr is true.
However, without SHSTK and IBT, XFEATURE_MASK_CET_ALL will be cleared
in kvm_caps.supported_xss, and thus SHSTK and IBT are eventually
cleared from the TDX overlay as well in kvm_setup_xss_caps().

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c   | 320 +++++++++++++++++++++--------------------
 arch/x86/kvm/vmx/vmx.c |  22 ++-
 arch/x86/kvm/x86.c     |   4 +-
 3 files changed, 184 insertions(+), 162 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 767c007ab5f0..938b19767feb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -854,8 +854,8 @@ void kvm_initialize_cpu_caps(void)
 		     sizeof(boot_cpu_data.x86_capability));
 
 	kvm_cpu_cap_init(CPUID_1_ECX,
-		F(XMM3, F_CPUID_DEFAULT),
-		F(PCLMULQDQ, F_CPUID_DEFAULT),
+		F(XMM3, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PCLMULQDQ, F_CPUID_DEFAULT | F_CPUID_TDX),
 		VENDOR_F(DTES64),
 		/*
 		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
@@ -864,124 +864,131 @@ void kvm_initialize_cpu_caps(void)
 		 * that KVM is aware that it's a known, unadvertised flag.
 		 */
 		RUNTIME_F(MWAIT),
-		/* DS-CPL */
+		/* DSCPL is fixed-1 in TDX */
+		F(DSCPL, F_CPUID_TDX),
 		VENDOR_F(VMX),
 		/* SMX, EST */
 		/* TM2 */
-		F(SSSE3, F_CPUID_DEFAULT),
+		F(SSSE3, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* CNXT-ID */
 		/* Reserved */
-		F(FMA, F_CPUID_DEFAULT),
-		F(CX16, F_CPUID_DEFAULT),
+		F(FMA, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(CX16, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* xTPR Update */
-		F(PDCM, F_CPUID_DEFAULT),
-		F(PCID, F_CPUID_DEFAULT),
+		F(PDCM, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PCID, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved, DCA */
-		F(XMM4_1, F_CPUID_DEFAULT),
-		F(XMM4_2, F_CPUID_DEFAULT),
-		EMULATED_F(X2APIC, F_CPUID_DEFAULT),
-		F(MOVBE, F_CPUID_DEFAULT),
-		F(POPCNT, F_CPUID_DEFAULT),
-		EMULATED_F(TSC_DEADLINE_TIMER, F_CPUID_DEFAULT),
-		F(AES, F_CPUID_DEFAULT),
-		F(XSAVE, F_CPUID_DEFAULT),
+		F(XMM4_1, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(XMM4_2, F_CPUID_DEFAULT | F_CPUID_TDX),
+		EMULATED_F(X2APIC, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MOVBE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(POPCNT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		EMULATED_F(TSC_DEADLINE_TIMER, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AES, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(XSAVE, F_CPUID_DEFAULT | F_CPUID_TDX),
 		RUNTIME_F(OSXSAVE),
-		F(AVX, F_CPUID_DEFAULT),
-		F(F16C, F_CPUID_DEFAULT),
-		F(RDRAND, F_CPUID_DEFAULT),
-		EMULATED_F(HYPERVISOR, F_CPUID_DEFAULT),
+		F(AVX, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(F16C, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(RDRAND, F_CPUID_DEFAULT | F_CPUID_TDX),
+		EMULATED_F(HYPERVISOR, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_1_EDX,
-		F(FPU, F_CPUID_DEFAULT),
-		F(VME, F_CPUID_DEFAULT),
-		F(DE, F_CPUID_DEFAULT),
-		F(PSE, F_CPUID_DEFAULT),
-		F(TSC, F_CPUID_DEFAULT),
-		F(MSR, F_CPUID_DEFAULT),
-		F(PAE, F_CPUID_DEFAULT),
-		F(MCE, F_CPUID_DEFAULT),
-		F(CX8, F_CPUID_DEFAULT),
-		F(APIC, F_CPUID_DEFAULT),
+		F(FPU, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(VME, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(DE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PSE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(TSC, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MSR, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PAE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MCE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(CX8, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(APIC, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
-		F(SEP, F_CPUID_DEFAULT),
-		F(MTRR, F_CPUID_DEFAULT),
-		F(PGE, F_CPUID_DEFAULT),
-		F(MCA, F_CPUID_DEFAULT),
-		F(CMOV, F_CPUID_DEFAULT),
-		F(PAT, F_CPUID_DEFAULT),
+		F(SEP, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MTRR, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PGE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MCA, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(CMOV, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PAT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		/* PSE36 is fixed-0 in TDX */
 		F(PSE36, F_CPUID_DEFAULT),
 		/* PSN */
-		F(CLFLUSH, F_CPUID_DEFAULT),
+		F(CLFLUSH, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
 		VENDOR_F(DS),
 		/* ACPI */
-		F(MMX, F_CPUID_DEFAULT),
-		F(FXSR, F_CPUID_DEFAULT),
-		F(XMM, F_CPUID_DEFAULT),
-		F(XMM2, F_CPUID_DEFAULT),
-		F(SELFSNOOP, F_CPUID_DEFAULT),
+		F(MMX, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(FXSR, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(XMM, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(XMM2, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(SELFSNOOP, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* HTT, TM, Reserved, PBE */
 	);
 
 	kvm_cpu_cap_init(CPUID_7_0_EBX,
-		F(FSGSBASE, F_CPUID_DEFAULT),
-		EMULATED_F(TSC_ADJUST, F_CPUID_DEFAULT),
+		F(FSGSBASE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		EMULATED_F(TSC_ADJUST, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(SGX, F_CPUID_DEFAULT),
-		F(BMI1, F_CPUID_DEFAULT),
+		F(BMI1, F_CPUID_DEFAULT | F_CPUID_TDX),
+		/* KVM doesn't support HLE for TDX yet */
 		F(HLE, F_CPUID_DEFAULT),
-		F(AVX2, F_CPUID_DEFAULT),
-		F(FDP_EXCPTN_ONLY, F_CPUID_DEFAULT),
-		F(SMEP, F_CPUID_DEFAULT),
-		F(BMI2, F_CPUID_DEFAULT),
-		F(ERMS, F_CPUID_DEFAULT),
-		F(INVPCID, F_CPUID_DEFAULT),
+		F(AVX2, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(FDP_EXCPTN_ONLY, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(SMEP, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(BMI2, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(ERMS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(INVPCID, F_CPUID_DEFAULT | F_CPUID_TDX),
+		/* KVM doesn't support RTM for TDX yet */
 		F(RTM, F_CPUID_DEFAULT),
-		F(ZERO_FCS_FDS, F_CPUID_DEFAULT),
+		/* CQM */
+		F(ZERO_FCS_FDS, F_CPUID_DEFAULT | F_CPUID_TDX),
 		VENDOR_F(MPX),
-		F(AVX512F, F_CPUID_DEFAULT),
-		F(AVX512DQ, F_CPUID_DEFAULT),
-		F(RDSEED, F_CPUID_DEFAULT),
-		F(ADX, F_CPUID_DEFAULT),
-		F(SMAP, F_CPUID_DEFAULT),
-		F(AVX512IFMA, F_CPUID_DEFAULT),
-		F(CLFLUSHOPT, F_CPUID_DEFAULT),
-		F(CLWB, F_CPUID_DEFAULT),
+		/* RDT_A */
+		F(AVX512F, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512DQ, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(RDSEED, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(ADX, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(SMAP, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512IFMA, F_CPUID_DEFAULT | F_CPUID_TDX),
+		/* Reserved */
+		F(CLFLUSHOPT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(CLWB, F_CPUID_DEFAULT | F_CPUID_TDX),
 		VENDOR_F(INTEL_PT),
-		F(AVX512PF, F_CPUID_DEFAULT),
-		F(AVX512ER, F_CPUID_DEFAULT),
-		F(AVX512CD, F_CPUID_DEFAULT),
-		F(SHA_NI, F_CPUID_DEFAULT),
-		F(AVX512BW, F_CPUID_DEFAULT),
-		F(AVX512VL, F_CPUID_DEFAULT),
+		F(AVX512PF, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512ER, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512CD, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(SHA_NI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512BW, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512VL, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_ECX,
 		/* PREFETCHWT1 */
-		F(AVX512VBMI, F_CPUID_DEFAULT),
-		F(UMIP, F_CPUID_DEFAULT),
-		F(PKU, F_CPUID_DEFAULT),
+		F(AVX512VBMI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(UMIP, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PKU, F_CPUID_DEFAULT | F_CPUID_TDX),
 		RUNTIME_F(OSPKE),
 		VENDOR_F(WAITPKG),
-		F(AVX512_VBMI2, F_CPUID_DEFAULT),
-		X86_64_F(SHSTK, F_CPUID_DEFAULT),
-		F(GFNI, F_CPUID_DEFAULT),
-		F(VAES, F_CPUID_DEFAULT),
-		F(VPCLMULQDQ, F_CPUID_DEFAULT),
-		F(AVX512_VNNI, F_CPUID_DEFAULT),
-		F(AVX512_BITALG, F_CPUID_DEFAULT),
+		F(AVX512_VBMI2, F_CPUID_DEFAULT | F_CPUID_TDX),
+		X86_64_F(SHSTK, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(GFNI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(VAES, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(VPCLMULQDQ, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512_VNNI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512_BITALG, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* TME */
-		F(AVX512_VPOPCNTDQ, F_CPUID_DEFAULT),
+		F(AVX512_VPOPCNTDQ, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
-		PASSTHROUGH_F(LA57, F_CPUID_DEFAULT),
+		PASSTHROUGH_F(LA57, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* MPX_MAWAU */
-		F(RDPID, F_CPUID_DEFAULT),
+		F(RDPID, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* KEY_LOCKER */
-		F(BUS_LOCK_DETECT, F_CPUID_DEFAULT),
-		F(CLDEMOTE, F_CPUID_DEFAULT),
+		F(BUS_LOCK_DETECT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(CLDEMOTE, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
-		F(MOVDIRI, F_CPUID_DEFAULT),
-		F(MOVDIR64B, F_CPUID_DEFAULT),
+		F(MOVDIRI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MOVDIR64B, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* ENQCMD */
 		F(SGX_LC, F_CPUID_DEFAULT),
 		/* PKS */
@@ -1000,34 +1007,34 @@ void kvm_initialize_cpu_caps(void)
 	 * doesn't know how to emulate or map.
 	 */
 	if (!tdp_enabled)
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT | F_CPUID_TDX);
 
 	kvm_cpu_cap_init(CPUID_7_EDX,
 		/* Reserved, SGX_KEYS */
-		F(AVX512_4VNNIW, F_CPUID_DEFAULT),
-		F(AVX512_4FMAPS, F_CPUID_DEFAULT),
-		F(FSRM, F_CPUID_DEFAULT),
+		F(AVX512_4VNNIW, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512_4FMAPS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(FSRM, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* UINT, Reserved, Reserved */
-		F(AVX512_VP2INTERSECT, F_CPUID_DEFAULT),
+		F(AVX512_VP2INTERSECT, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* SRBDS_CTRL */
-		F(MD_CLEAR, F_CPUID_DEFAULT),
+		F(MD_CLEAR, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* RTM_ALWAYS_ABORT, Reserved, TSX_FORCE_ABORT */
-		F(SERIALIZE, F_CPUID_DEFAULT),
+		F(SERIALIZE, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* HYBRID_CPU */
-		F(TSXLDTRK, F_CPUID_DEFAULT),
+		F(TSXLDTRK, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved, PCONFIG, ARCH_LBR */
-		F(IBT, F_CPUID_DEFAULT),
+		F(IBT, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
-		F(AMX_BF16, F_CPUID_DEFAULT),
-		F(AVX512_FP16, F_CPUID_DEFAULT),
-		F(AMX_TILE, F_CPUID_DEFAULT),
-		F(AMX_INT8, F_CPUID_DEFAULT),
-		F(SPEC_CTRL, F_CPUID_DEFAULT),
-		F(INTEL_STIBP, F_CPUID_DEFAULT),
-		F(FLUSH_L1D, F_CPUID_DEFAULT),
-		EMULATED_F(ARCH_CAPABILITIES, F_CPUID_DEFAULT),
+		F(AMX_BF16, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512_FP16, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_TILE, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_INT8, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(SPEC_CTRL, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(INTEL_STIBP, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(FLUSH_L1D, F_CPUID_DEFAULT | F_CPUID_TDX),
+		EMULATED_F(ARCH_CAPABILITIES, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* CORE_CAPABILITIES */
-		F(SPEC_CTRL_SSBD, F_CPUID_DEFAULT),
+		F(SPEC_CTRL_SSBD, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	/*
@@ -1050,53 +1057,55 @@ void kvm_initialize_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD, F_CPUID_DEFAULT);
 
 	kvm_cpu_cap_init(CPUID_7_1_EAX,
-		F(SHA512, F_CPUID_DEFAULT),
-		F(SM3, F_CPUID_DEFAULT),
-		F(SM4, F_CPUID_DEFAULT),
-		F(AVX_VNNI, F_CPUID_DEFAULT),
-		F(AVX512_BF16, F_CPUID_DEFAULT),
-		F(CMPCCXADD, F_CPUID_DEFAULT),
-		F(FZRM, F_CPUID_DEFAULT),
-		F(FSRS, F_CPUID_DEFAULT),
-		F(FSRC, F_CPUID_DEFAULT),
-		X86_64_F(LKGS, F_CPUID_DEFAULT),
-		F(WRMSRNS, F_CPUID_DEFAULT),
-		F(AMX_FP16, F_CPUID_DEFAULT),
-		F(AVX_IFMA, F_CPUID_DEFAULT),
-		F(LAM, F_CPUID_DEFAULT),
-		F(MOVRS, F_CPUID_DEFAULT),
+		F(SHA512, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(SM3, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(SM4, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX_VNNI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX512_BF16, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(CMPCCXADD, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(FZRM, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(FSRS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(FSRC, F_CPUID_DEFAULT | F_CPUID_TDX),
+		X86_64_F(LKGS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(WRMSRNS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_FP16, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX_IFMA, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(LAM, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MOVRS, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_1_ECX,
+		/* MSR_IMM is reserved in TDX spec */
 		SCATTERED_F(MSR_IMM, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_1_EDX,
-		F(AVX_VNNI_INT8, F_CPUID_DEFAULT),
-		F(AVX_NE_CONVERT, F_CPUID_DEFAULT),
-		F(AMX_COMPLEX, F_CPUID_DEFAULT),
-		F(AVX_VNNI_INT16, F_CPUID_DEFAULT),
-		F(PREFETCHITI, F_CPUID_DEFAULT),
-		F(AVX10, F_CPUID_DEFAULT),
+		F(AVX_VNNI_INT8, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX_NE_CONVERT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_COMPLEX, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX_VNNI_INT16, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(PREFETCHITI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX10, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_2_EDX,
-		F(INTEL_PSFD, F_CPUID_DEFAULT),
-		F(IPRED_CTRL, F_CPUID_DEFAULT),
-		F(RRSBA_CTRL, F_CPUID_DEFAULT),
-		F(DDPD_U, F_CPUID_DEFAULT),
-		F(BHI_CTRL, F_CPUID_DEFAULT),
-		F(MCDT_NO, F_CPUID_DEFAULT),
+		F(INTEL_PSFD, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(IPRED_CTRL, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(RRSBA_CTRL, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(DDPD_U, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(BHI_CTRL, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(MCDT_NO, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_D_1_EAX,
-		F(XSAVEOPT, F_CPUID_DEFAULT),
-		F(XSAVEC, F_CPUID_DEFAULT),
-		F(XGETBV1, F_CPUID_DEFAULT),
-		F(XSAVES, F_CPUID_DEFAULT),
-		X86_64_F(XFD, F_CPUID_DEFAULT),
+		F(XSAVEOPT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(XSAVEC, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(XGETBV1, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(XSAVES, F_CPUID_DEFAULT | F_CPUID_TDX),
+		X86_64_F(XFD, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
+	/* SGX related features are fixed-0 for TDX */
 	kvm_cpu_cap_init(CPUID_12_EAX,
 		SCATTERED_F(SGX1, F_CPUID_DEFAULT),
 		SCATTERED_F(SGX2, F_CPUID_DEFAULT),
@@ -1104,36 +1113,37 @@ void kvm_initialize_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init(CPUID_1E_1_EAX,
-		F(AMX_INT8_ALIAS, F_CPUID_DEFAULT),
-		F(AMX_BF16_ALIAS, F_CPUID_DEFAULT),
-		F(AMX_COMPLEX_ALIAS, F_CPUID_DEFAULT),
-		F(AMX_FP16_ALIAS, F_CPUID_DEFAULT),
-		F(AMX_FP8, F_CPUID_DEFAULT),
-		F(AMX_TF32, F_CPUID_DEFAULT),
-		F(AMX_AVX512, F_CPUID_DEFAULT),
-		F(AMX_MOVRS, F_CPUID_DEFAULT),
+		F(AMX_INT8_ALIAS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_BF16_ALIAS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_COMPLEX_ALIAS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_FP16_ALIAS, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_FP8, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_TF32, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_AVX512, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AMX_MOVRS, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_24_0_EBX,
-		F(AVX10_128, F_CPUID_DEFAULT),
-		F(AVX10_256, F_CPUID_DEFAULT),
-		F(AVX10_512, F_CPUID_DEFAULT),
+		F(AVX10_128, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX10_256, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(AVX10_512, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_24_1_ECX,
+		/* AVX10_VNNI_INT is reserved in TDX spec */
 		F(AVX10_VNNI_INT, F_CPUID_DEFAULT),
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0001_ECX,
-		F(LAHF_LM, F_CPUID_DEFAULT),
+		F(LAHF_LM, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(CMP_LEGACY, F_CPUID_DEFAULT),
 		VENDOR_F(SVM),
 		/* ExtApicSpace */
 		F(CR8_LEGACY, F_CPUID_DEFAULT),
-		F(ABM, F_CPUID_DEFAULT),
+		F(ABM, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(SSE4A, F_CPUID_DEFAULT),
 		F(MISALIGNSSE, F_CPUID_DEFAULT),
-		F(3DNOWPREFETCH, F_CPUID_DEFAULT),
+		F(3DNOWPREFETCH, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(OSVW, F_CPUID_DEFAULT),
 		/* IBS */
 		F(XOP, F_CPUID_DEFAULT),
@@ -1156,7 +1166,7 @@ void kvm_initialize_cpu_caps(void)
 		ALIASED_1_EDX_F(CX8),
 		ALIASED_1_EDX_F(APIC),
 		/* Reserved */
-		F(SYSCALL, F_CPUID_DEFAULT),
+		F(SYSCALL, F_CPUID_DEFAULT | F_CPUID_TDX),
 		ALIASED_1_EDX_F(MTRR),
 		ALIASED_1_EDX_F(PGE),
 		ALIASED_1_EDX_F(MCA),
@@ -1164,16 +1174,16 @@ void kvm_initialize_cpu_caps(void)
 		ALIASED_1_EDX_F(PAT),
 		ALIASED_1_EDX_F(PSE36),
 		/* Reserved */
-		F(NX, F_CPUID_DEFAULT),
+		F(NX, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
 		F(MMXEXT, F_CPUID_DEFAULT),
 		ALIASED_1_EDX_F(MMX),
 		ALIASED_1_EDX_F(FXSR),
 		F(FXSR_OPT, F_CPUID_DEFAULT),
-		X86_64_F(GBPAGES, F_CPUID_DEFAULT),
-		F(RDTSCP, F_CPUID_DEFAULT),
+		X86_64_F(GBPAGES, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(RDTSCP, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
-		X86_64_F(LM, F_CPUID_DEFAULT),
+		X86_64_F(LM, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(3DNOWEXT, F_CPUID_DEFAULT),
 		F(3DNOW, F_CPUID_DEFAULT),
 	);
@@ -1182,13 +1192,13 @@ void kvm_initialize_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_GBPAGES, F_CPUID_DEFAULT);
 
 	kvm_cpu_cap_init(CPUID_8000_0007_EDX,
-		SCATTERED_F(CONSTANT_TSC, F_CPUID_DEFAULT),
+		SCATTERED_F(CONSTANT_TSC, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0008_EBX,
 		F(CLZERO, F_CPUID_DEFAULT),
 		F(XSAVEERPTR, F_CPUID_DEFAULT),
-		F(WBNOINVD, F_CPUID_DEFAULT),
+		F(WBNOINVD, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(AMD_IBPB, F_CPUID_DEFAULT),
 		F(AMD_IBRS, F_CPUID_DEFAULT),
 		F(AMD_SSBD, F_CPUID_DEFAULT),
@@ -1318,6 +1328,8 @@ void kvm_initialize_cpu_caps(void)
 	 * RDPID is misreported, and KVM has botched MSR_TSC_AUX emulation in
 	 * the past.  For example, the sanity check may fire if this instance of
 	 * KVM is running as L1 on top of an older, broken KVM.
+	 *
+	 * If MSR_TSC_AUX probing failed, TDX will be disabled.
 	 */
 	if (WARN_ON((kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
 		     kvm_cpu_cap_has(X86_FEATURE_RDPID)) &&
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7879a8a532c4..fae6b33949f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8079,6 +8079,8 @@ static __init u64 vmx_get_perf_capabilities(void)
 
 static __init void vmx_set_cpu_caps(void)
 {
+	u32 enable_mask;
+
 	kvm_initialize_cpu_caps();
 
 	/* CPUID 0x1 */
@@ -8086,21 +8088,27 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_VMX, F_CPUID_DEFAULT);
 
 	/* CPUID 0x7 */
+	/* MPX is fixed-0 for TDX */
 	if (kvm_mpx_supported())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX, F_CPUID_DEFAULT);
+	/* INVPCID is fixed-1 for TDX */
 	if (!cpu_has_vmx_invpcid())
 		kvm_cpu_cap_clear(X86_FEATURE_INVPCID, F_CPUID_DEFAULT);
+	/* KVM doesn't support PT for TDX yet */
 	if (vmx_pt_mode_is_host_guest())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT, F_CPUID_DEFAULT);
-	if (vmx_pebs_supported()) {
-		kvm_cpu_cap_check_and_set(X86_FEATURE_DS, F_CPUID_DEFAULT);
-		kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64, F_CPUID_DEFAULT);
-	}
 
+	/* DS and DTES64 are fixed-1 for TDX */
+	enable_mask = vmx_pebs_supported() ? F_CPUID_TDX | F_CPUID_DEFAULT : F_CPUID_TDX;
+	kvm_cpu_cap_check_and_set(X86_FEATURE_DS, enable_mask);
+	kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64, enable_mask);
+
+	/* PDCM is fixed-1 for TDX */
 	if (!enable_pmu)
 		kvm_cpu_cap_clear(X86_FEATURE_PDCM, F_CPUID_DEFAULT);
 	kvm_caps.supported_perf_cap = vmx_get_perf_capabilities();
 
+	/* SGX related features are fixed-0 for TDX */
 	if (!enable_sgx) {
 		kvm_cpu_cap_clear(X86_FEATURE_SGX, F_CPUID_DEFAULT);
 		kvm_cpu_cap_clear(X86_FEATURE_SGX_LC, F_CPUID_DEFAULT);
@@ -8113,6 +8121,7 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_UMIP, F_CPUID_DEFAULT);
 
 	/* CPUID 0xD.1 */
+	/* XSAVES is fixed-1 for TDX */
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES, F_CPUID_DEFAULT);
 
@@ -8122,6 +8131,7 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_clear(X86_FEATURE_RDPID, F_CPUID_DEFAULT);
 	}
 
+	/* KVM doesn't support WAITPKG for TDX yet */
 	if (cpu_has_vmx_waitpkg())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG, F_CPUID_DEFAULT);
 
@@ -8133,8 +8143,8 @@ static __init void vmx_set_cpu_caps(void)
 	 */
 	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
 	    !cpu_has_vmx_basic_no_hw_errcode_cc()) {
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT);
-		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT | F_CPUID_TDX);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT | F_CPUID_TDX);
 	}
 
 	kvm_setup_xss_caps();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5b830997e693..db8434f9a2ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10024,8 +10024,8 @@ void kvm_setup_xss_caps(void)
 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
 
 	if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT);
-		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT | F_CPUID_TDX);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT | F_CPUID_TDX);
 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
 	}
 }
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 07/27] KVM: x86: Support KVM_GET_{SUPPORTED,EMULATED}_CPUID as VM scope ioctls
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (5 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 06/27] KVM: x86: Populate TDX CPUID overlay with supported feature bits Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 08/27] KVM: x86: Thread @kvm to KVM CPU capability helpers Binbin Wu
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Handle KVM_GET_{SUPPORTED,EMULATED}_CPUID in kvm_arch_vm_ioctl()
so that userspace can query supported CPUID on a VM fd.

When issued on a VM fd, KVM can return CPUID data tailored to the VM's
type (e.g. using the TDX overlay for TDX VMs instead of the default
VMX overlay).  @kvm is not yet used to select the overlay; a follow-on
patch will wire that up.  The dev-ioctl path continues to work by
passing NULL for @kvm.

Extract the copy_from_user/copy_to_user boilerplate into a shared
helper, kvm_get_cpuid(), used by both the dev-ioctl and VM-ioctl
paths.

A new capability for CPUID paranoid mode will be added in a later patch;
reuse that capability to advertise to userspace that the VM-scoped
versions of KVM_GET_{SUPPORTED,EMULATED}_CPUID are supported.
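
As an illustration of how userspace is expected to drive the VM-scoped
ioctl, the sketch below shows the conventional grow-on-E2BIG retry loop
for a kvm_cpuid2 buffer.  The structs are minimal local mirrors of the
uapi layout, and mock_get_cpuid() is a hypothetical stand-in for
ioctl(vm_fd, KVM_GET_SUPPORTED_CPUID, ...) so the loop itself can be
exercised without /dev/kvm; MOCK_NEEDED is an assumed entry count, not
something the kernel defines.

```c
#include <errno.h>
#include <stdlib.h>

/* Minimal local mirrors of the uapi structs (illustration only). */
struct cpuid_entry2 {
	unsigned int function, index, flags;
	unsigned int eax, ebx, ecx, edx;
	unsigned int padding[3];
};

struct cpuid2 {
	unsigned int nent;
	unsigned int padding;
	struct cpuid_entry2 entries[];
};

/*
 * Hypothetical stand-in for the ioctl: like the kernel, it fails with
 * E2BIG when the caller's buffer is too small, and on success writes
 * back the number of entries actually filled.
 */
#define MOCK_NEEDED 24

static int mock_get_cpuid(struct cpuid2 *cpuid)
{
	if (cpuid->nent < MOCK_NEEDED) {
		errno = E2BIG;
		return -1;
	}
	cpuid->nent = MOCK_NEEDED;
	return 0;
}

/* Conventional userspace pattern: start small, grow on E2BIG, retry. */
static struct cpuid2 *get_supported_cpuid(void)
{
	unsigned int nent = 8;
	struct cpuid2 *cpuid;

	for (;;) {
		cpuid = calloc(1, sizeof(*cpuid) +
				  nent * sizeof(cpuid->entries[0]));
		if (!cpuid)
			return NULL;
		cpuid->nent = nent;
		if (mock_get_cpuid(cpuid) == 0)
			return cpuid;
		free(cpuid);
		if (errno != E2BIG)
			return NULL;
		nent *= 2;
	}
}
```

Against the mock, the buffer grows 8 -> 16 -> 32 entries and the call
then succeeds with nent trimmed to 24; with a real VM fd the same loop
applies unchanged, only the ioctl call differs.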

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c |  7 ++++---
 arch/x86/kvm/cpuid.h |  6 +++---
 arch/x86/kvm/x86.c   | 42 ++++++++++++++++++++++++++----------------
 3 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 938b19767feb..9634ea01d2a3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -2013,9 +2013,10 @@ static bool sanity_check_entries(struct kvm_cpuid_entry2 __user *entries,
 	return false;
 }
 
-int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
-			    struct kvm_cpuid_entry2 __user *entries,
-			    unsigned int type)
+/* The input @kvm may be NULL; check it before use. */
+int kvm_vm_ioctl_get_cpuid(struct kvm *kvm, struct kvm_cpuid2 *cpuid,
+			   struct kvm_cpuid_entry2 __user *entries,
+			   unsigned int type)
 {
 	static const u32 funcs[] = {
 		0, 0x80000000, CENTAUR_CPUID_SIGNATURE, KVM_CPUID_SIGNATURE,
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 4b1274f055e5..0afde541b036 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -74,9 +74,9 @@ static inline struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcp
 				     function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 }
 
-int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
-			    struct kvm_cpuid_entry2 __user *entries,
-			    unsigned int type);
+int kvm_vm_ioctl_get_cpuid(struct kvm *kvm, struct kvm_cpuid2 *cpuid,
+			   struct kvm_cpuid_entry2 __user *entries,
+			   unsigned int type);
 int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
 			     struct kvm_cpuid *cpuid,
 			     struct kvm_cpuid_entry __user *entries);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db8434f9a2ee..525fcb09a4c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5041,6 +5041,26 @@ static int kvm_x86_dev_has_attr(struct kvm_device_attr *attr)
 	return __kvm_x86_dev_get_attr(attr, &val);
 }
 
+static int kvm_x86_vm_get_cpuid(struct kvm *kvm, void __user *argp, unsigned int ioctl)
+{
+	struct kvm_cpuid2 __user *cpuid_arg = argp;
+	struct kvm_cpuid2 cpuid;
+	int r = -EFAULT;
+
+	if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
+		return r;
+
+	r = kvm_vm_ioctl_get_cpuid(kvm, &cpuid, cpuid_arg->entries, ioctl);
+	if (r)
+		return r;
+
+	r = -EFAULT;
+	if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+		return r;
+
+	return 0;
+}
+
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg)
 {
@@ -5076,22 +5096,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
 	}
 	case KVM_GET_SUPPORTED_CPUID:
 	case KVM_GET_EMULATED_CPUID: {
-		struct kvm_cpuid2 __user *cpuid_arg = argp;
-		struct kvm_cpuid2 cpuid;
-
-		r = -EFAULT;
-		if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
-			goto out;
-
-		r = kvm_dev_ioctl_get_cpuid(&cpuid, cpuid_arg->entries,
-					    ioctl);
-		if (r)
-			goto out;
-
-		r = -EFAULT;
-		if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
-			goto out;
-		r = 0;
+		r = kvm_x86_vm_get_cpuid(NULL, argp, ioctl);
 		break;
 	}
 	case KVM_X86_GET_MCE_CAP_SUPPORTED:
@@ -7628,6 +7633,11 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		r = kvm_vm_ioctl_set_msr_filter(kvm, &filter);
 		break;
 	}
+	case KVM_GET_SUPPORTED_CPUID:
+	case KVM_GET_EMULATED_CPUID: {
+		r = kvm_x86_vm_get_cpuid(kvm, argp, ioctl);
+		break;
+	}
 	default:
 		r = -ENOTTY;
 	}
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 08/27] KVM: x86: Thread @kvm to KVM CPU capability helpers
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (6 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 07/27] KVM: x86: Support KVM_GET_{SUPPORTED,EMULATED}_CPUID as VM scope ioctls Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 09/27] KVM: x86: Use overlays of KVM CPU capabilities Binbin Wu
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Thread @kvm through kvm_cpu_cap_has(), kvm_cpu_cap_get(),
cpuid_entry_override(), cpuid_func_emulated(), __do_cpuid_func(), and
their callers, to prepare for allowing KVM to select the appropriate
CPUID overlay based on the VM type and hardware vendor.

Remove the __kvm_cpu_cap_has() wrapper macro, as kvm_cpu_cap_has() now
takes @kvm directly.

No functional change intended.
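
A minimal sketch of the pattern this refactor enables, with simplified
stand-ins for the series' names (the enum values, struct kvm layout, and
cap_has() here are illustrative, not the actual KVM definitions): once
@kvm is threaded through, a helper can select the capability overlay
from the VM, with NULL (the dev-ioctl path) falling back to the default
overlay.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative overlay indices, mirroring the series' naming. */
enum { CPUID_OL_DEFAULT, CPUID_OL_TDX, NR_CPUID_OLS };

struct kvm {
	int vm_type;	/* 0 = default VM, 1 = TDX (simplified) */
};

/* One capability word per overlay (a single leaf, for brevity). */
static unsigned int cpu_caps[NR_CPUID_OLS] = {
	[CPUID_OL_DEFAULT] = 0x3,	/* bits 0 and 1 supported */
	[CPUID_OL_TDX]     = 0x1,	/* bit 1 stripped for TDX */
};

/*
 * With @kvm threaded through, the helper picks the overlay from the
 * VM type; a NULL @kvm uses the default overlay.
 */
static bool cap_has(struct kvm *kvm, unsigned int bit)
{
	int ol = (kvm && kvm->vm_type == 1) ? CPUID_OL_TDX
					    : CPUID_OL_DEFAULT;

	return cpu_caps[ol] & (1u << bit);
}
```

In this sketch cap_has(NULL, 1) reports the bit as supported while
cap_has(&tdx_vm, 1) does not, which is the behavior difference the
follow-on overlay-selection patch builds on.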

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c      | 112 +++++++++++++++++++-------------------
 arch/x86/kvm/cpuid.h      |   9 +--
 arch/x86/kvm/svm/nested.c |   4 +-
 arch/x86/kvm/svm/svm.c    |   8 +--
 arch/x86/kvm/vmx/hyperv.c |   2 +-
 arch/x86/kvm/vmx/nested.c |   8 +--
 arch/x86/kvm/vmx/vmx.c    |  13 +++--
 arch/x86/kvm/x86.c        |  38 ++++++-------
 arch/x86/kvm/x86.h        |   2 +-
 9 files changed, 98 insertions(+), 98 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9634ea01d2a3..20ea483ddc7a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -369,8 +369,8 @@ static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
 	}
 }
 
-static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
-			       bool include_partially_emulated);
+static int cpuid_func_emulated(struct kvm *kvm, struct kvm_cpuid_entry2 *entry,
+			       u32 func, bool include_partially_emulated);
 
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
@@ -406,7 +406,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		 */
 		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[CPUID_OL_DEFAULT][i];
 		if (!cpuid.index) {
-			cpuid_func_emulated(&emulated, cpuid.function, true);
+			cpuid_func_emulated(vcpu->kvm, &emulated, cpuid.function, true);
 			vcpu->arch.cpu_caps[i] |= cpuid_get_reg_unsafe(&emulated, cpuid.reg);
 		}
 		vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);
@@ -450,10 +450,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	kvm_pmu_refresh(vcpu);
 
-#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
-	vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) |
+	vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(kvm_cpu_cap_has, vcpu->kvm) |
 					 __cr4_reserved_bits(guest_cpu_cap_has, vcpu);
-#undef __kvm_cpu_cap_has
 
 	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu));
 
@@ -1331,8 +1329,8 @@ void kvm_initialize_cpu_caps(void)
 	 *
 	 * If MSR_TSC_AUX probing failed, TDX will be disabled.
 	 */
-	if (WARN_ON((kvm_cpu_cap_has(X86_FEATURE_RDTSCP) ||
-		     kvm_cpu_cap_has(X86_FEATURE_RDPID)) &&
+	if (WARN_ON((kvm_cpu_cap_has(NULL, X86_FEATURE_RDTSCP) ||
+		     kvm_cpu_cap_has(NULL, X86_FEATURE_RDPID)) &&
 		     !kvm_is_supported_user_return_msr(MSR_TSC_AUX))) {
 		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP, F_CPUID_DEFAULT);
 		kvm_cpu_cap_clear(X86_FEATURE_RDPID, F_CPUID_DEFAULT);
@@ -1407,8 +1405,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
 	return entry;
 }
 
-static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
-			       bool include_partially_emulated)
+static int cpuid_func_emulated(struct kvm *kvm, struct kvm_cpuid_entry2 *entry,
+			       u32 func, bool include_partially_emulated)
 {
 	memset(entry, 0, sizeof(*entry));
 
@@ -1436,7 +1434,7 @@ static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
 	case 7:
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		entry->eax = 0;
-		if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
+		if (kvm_cpu_cap_has(kvm, X86_FEATURE_RDTSCP))
 			entry->ecx = feature_bit(RDPID);
 		return 1;
 	default:
@@ -1444,16 +1442,16 @@ static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
 	}
 }
 
-static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
+static int __do_cpuid_func_emulated(struct kvm *kvm, struct kvm_cpuid_array *array, u32 func)
 {
 	if (array->nent >= array->maxnent)
 		return -E2BIG;
 
-	array->nent += cpuid_func_emulated(&array->entries[array->nent], func, false);
+	array->nent += cpuid_func_emulated(kvm, &array->entries[array->nent], func, false);
 	return 0;
 }
 
-static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
+static inline int __do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array, u32 function)
 {
 	struct kvm_cpuid_entry2 *entry;
 	int r, i, max_idx;
@@ -1473,8 +1471,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = min(entry->eax, 0x24U);
 		break;
 	case 1:
-		cpuid_entry_override(entry, CPUID_1_EDX);
-		cpuid_entry_override(entry, CPUID_1_ECX);
+		cpuid_entry_override(kvm, entry, CPUID_1_EDX);
+		cpuid_entry_override(kvm, entry, CPUID_1_ECX);
 		break;
 	case 2:
 		/*
@@ -1516,9 +1514,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	/* function 7 has additional index. */
 	case 7:
 		max_idx = entry->eax = min(entry->eax, 2u);
-		cpuid_entry_override(entry, CPUID_7_0_EBX);
-		cpuid_entry_override(entry, CPUID_7_ECX);
-		cpuid_entry_override(entry, CPUID_7_EDX);
+		cpuid_entry_override(kvm, entry, CPUID_7_0_EBX);
+		cpuid_entry_override(kvm, entry, CPUID_7_ECX);
+		cpuid_entry_override(kvm, entry, CPUID_7_EDX);
 
 		/* KVM only supports up to 0x7.2, capped above via min(). */
 		if (max_idx >= 1) {
@@ -1526,9 +1524,9 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			if (!entry)
 				goto out;
 
-			cpuid_entry_override(entry, CPUID_7_1_EAX);
-			cpuid_entry_override(entry, CPUID_7_1_ECX);
-			cpuid_entry_override(entry, CPUID_7_1_EDX);
+			cpuid_entry_override(kvm, entry, CPUID_7_1_EAX);
+			cpuid_entry_override(kvm, entry, CPUID_7_1_ECX);
+			cpuid_entry_override(kvm, entry, CPUID_7_1_EDX);
 			entry->ebx = 0;
 		}
 		if (max_idx >= 2) {
@@ -1536,7 +1534,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			if (!entry)
 				goto out;
 
-			cpuid_entry_override(entry, CPUID_7_2_EDX);
+			cpuid_entry_override(kvm, entry, CPUID_7_2_EDX);
 			entry->ecx = 0;
 			entry->ebx = 0;
 			entry->eax = 0;
@@ -1590,7 +1588,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		if (!entry)
 			goto out;
 
-		cpuid_entry_override(entry, CPUID_D_1_EAX);
+		cpuid_entry_override(kvm, entry, CPUID_D_1_EAX);
 		if (entry->eax & (feature_bit(XSAVES) | feature_bit(XSAVEC)))
 			entry->ebx = xstate_required_size(permitted_xcr0 | permitted_xss,
 							  true);
@@ -1627,7 +1625,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 				continue;
 			}
 
-			if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
+			if (!kvm_cpu_cap_has(kvm, X86_FEATURE_XFD))
 				entry->ecx &= ~BIT_ULL(2);
 			entry->edx = 0;
 		}
@@ -1635,7 +1633,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	}
 	case 0x12:
 		/* Intel SGX */
-		if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) {
+		if (!kvm_cpu_cap_has(kvm, X86_FEATURE_SGX)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 			break;
 		}
@@ -1646,7 +1644,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 * are restricted by kernel and KVM capabilities (like most
 		 * feature flags), while enclave size is unrestricted.
 		 */
-		cpuid_entry_override(entry, CPUID_12_EAX);
+		cpuid_entry_override(kvm, entry, CPUID_12_EAX);
 		entry->ebx &= SGX_MISC_EXINFO;
 
 		entry = do_host_cpuid(array, function, 1);
@@ -1665,7 +1663,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		break;
 	/* Intel PT */
 	case 0x14:
-		if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT)) {
+		if (!kvm_cpu_cap_has(kvm, X86_FEATURE_INTEL_PT)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 			break;
 		}
@@ -1677,7 +1675,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		break;
 	/* Intel AMX TILE */
 	case 0x1d:
-		if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) {
+		if (!kvm_cpu_cap_has(kvm, X86_FEATURE_AMX_TILE)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 			break;
 		}
@@ -1688,7 +1686,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		}
 		break;
 	case 0x1e: /* TMUL information */
-		if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) {
+		if (!kvm_cpu_cap_has(kvm, X86_FEATURE_AMX_TILE)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 			break;
 		}
@@ -1701,7 +1699,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			if (!entry)
 				goto out;
 
-			cpuid_entry_override(entry, CPUID_1E_1_EAX);
+			cpuid_entry_override(kvm, entry, CPUID_1E_1_EAX);
 			entry->ebx = 0;
 			entry->ecx = 0;
 			entry->edx = 0;
@@ -1710,7 +1708,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	case 0x24: {
 		u8 avx10_version;
 
-		if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
+		if (!kvm_cpu_cap_has(kvm, X86_FEATURE_AVX10)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 			break;
 		}
@@ -1722,7 +1720,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		 * version needs to be captured before overriding EBX features!
 		 */
 		avx10_version = min_t(u8, entry->ebx & 0xff, 2);
-		cpuid_entry_override(entry, CPUID_24_0_EBX);
+		cpuid_entry_override(kvm, entry, CPUID_24_0_EBX);
 		entry->ebx |= avx10_version;
 
 		entry->ecx = 0;
@@ -1734,7 +1732,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			if (!entry)
 				goto out;
 
-			cpuid_entry_override(entry, CPUID_24_1_ECX);
+			cpuid_entry_override(kvm, entry, CPUID_24_1_ECX);
 			entry->eax = 0;
 			entry->ebx = 0;
 			entry->edx = 0;
@@ -1793,8 +1791,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		break;
 	case 0x80000001:
 		entry->ebx &= ~GENMASK(27, 16);
-		cpuid_entry_override(entry, CPUID_8000_0001_EDX);
-		cpuid_entry_override(entry, CPUID_8000_0001_ECX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_0001_EDX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_0001_ECX);
 		break;
 	case 0x80000005:
 		/*  Pass host L1 cache and TLB info. */
@@ -1804,7 +1802,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->edx &= ~GENMASK(17, 16);
 		break;
 	case 0x80000007: /* Advanced power management */
-		cpuid_entry_override(entry, CPUID_8000_0007_EDX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_0007_EDX);
 
 		/* mask against host */
 		entry->edx &= boot_cpu_data.x86_power;
@@ -1854,11 +1852,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = phys_as | (virt_as << 8) | (g_phys_as << 16);
 		entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
 		entry->edx = 0;
-		cpuid_entry_override(entry, CPUID_8000_0008_EBX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_0008_EBX);
 		break;
 	}
 	case 0x8000000A:
-		if (!kvm_cpu_cap_has(X86_FEATURE_SVM)) {
+		if (!kvm_cpu_cap_has(kvm, X86_FEATURE_SVM)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 			break;
 		}
@@ -1866,7 +1864,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->ebx = 8; /* Lets support 8 ASIDs in case we add proper
 				   ASID emulation to nested SVM */
 		entry->ecx = 0; /* Reserved */
-		cpuid_entry_override(entry, CPUID_8000_000A_EDX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_000A_EDX);
 		break;
 	case 0x80000019:
 		entry->ecx = entry->edx = 0;
@@ -1881,10 +1879,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->edx = 0; /* reserved */
 		break;
 	case 0x8000001F:
-		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
+		if (!kvm_cpu_cap_has(kvm, X86_FEATURE_SEV)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 		} else {
-			cpuid_entry_override(entry, CPUID_8000_001F_EAX);
+			cpuid_entry_override(kvm, entry, CPUID_8000_001F_EAX);
 			/* Clear NumVMPL since KVM does not support VMPL.  */
 			entry->ebx &= ~GENMASK(31, 12);
 			/*
@@ -1899,26 +1897,26 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		break;
 	case 0x80000021:
 		entry->edx = 0;
-		cpuid_entry_override(entry, CPUID_8000_0021_EAX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_0021_EAX);
 
-		if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
+		if (kvm_cpu_cap_has(kvm, X86_FEATURE_ERAPS))
 			entry->ebx &= GENMASK(23, 16);
 		else
 			entry->ebx = 0;
 
-		cpuid_entry_override(entry, CPUID_8000_0021_ECX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_0021_ECX);
 		break;
 	/* AMD Extended Performance Monitoring and Debug */
 	case 0x80000022: {
 		union cpuid_0x80000022_ebx ebx = { };
 
 		entry->ecx = entry->edx = 0;
-		if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
+		if (!enable_pmu || !kvm_cpu_cap_has(kvm, X86_FEATURE_PERFMON_V2)) {
 			entry->eax = entry->ebx = 0;
 			break;
 		}
 
-		cpuid_entry_override(entry, CPUID_8000_0022_EAX);
+		cpuid_entry_override(kvm, entry, CPUID_8000_0022_EAX);
 
 		ebx.split.num_core_pmc = kvm_pmu_cap.num_counters_gp;
 		entry->ebx = ebx.full;
@@ -1930,7 +1928,7 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = min(entry->eax, 0xC0000004);
 		break;
 	case 0xC0000001:
-		cpuid_entry_override(entry, CPUID_C000_0001_EDX);
+		cpuid_entry_override(kvm, entry, CPUID_C000_0001_EDX);
 		break;
 	case 3: /* Processor serial number */
 	case 5: /* MONITOR/MWAIT */
@@ -1950,19 +1948,19 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	return r;
 }
 
-static int do_cpuid_func(struct kvm_cpuid_array *array, u32 func,
-			 unsigned int type)
+static int do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array,
+			 u32 func, unsigned int type)
 {
 	if (type == KVM_GET_EMULATED_CPUID)
-		return __do_cpuid_func_emulated(array, func);
+		return __do_cpuid_func_emulated(kvm, array, func);
 
-	return __do_cpuid_func(array, func);
+	return __do_cpuid_func(kvm, array, func);
 }
 
 #define CENTAUR_CPUID_SIGNATURE 0xC0000000
 
-static int get_cpuid_func(struct kvm_cpuid_array *array, u32 func,
-			  unsigned int type)
+static int get_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array,
+			  u32 func, unsigned int type)
 {
 	u32 limit;
 	int r;
@@ -1972,13 +1970,13 @@ static int get_cpuid_func(struct kvm_cpuid_array *array, u32 func,
 	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
 		return 0;
 
-	r = do_cpuid_func(array, func, type);
+	r = do_cpuid_func(kvm, array, func, type);
 	if (r)
 		return r;
 
 	limit = array->entries[array->nent - 1].eax;
 	for (func = func + 1; func <= limit; ++func) {
-		r = do_cpuid_func(array, func, type);
+		r = do_cpuid_func(kvm, array, func, type);
 		if (r)
 			break;
 	}
@@ -2042,7 +2040,7 @@ int kvm_vm_ioctl_get_cpuid(struct kvm *kvm, struct kvm_cpuid2 *cpuid,
 	array.maxnent = cpuid->nent;
 
 	for (i = 0; i < ARRAY_SIZE(funcs); i++) {
-		r = get_cpuid_func(&array, funcs[i], type);
+		r = get_cpuid_func(kvm, &array, funcs[i], type);
 		if (r)
 			goto out_free;
 	}
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 0afde541b036..eae46f37d30f 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -117,7 +117,8 @@ static inline bool page_address_valid(struct kvm_vcpu *vcpu, gpa_t gpa)
 	return kvm_vcpu_is_legal_aligned_gpa(vcpu, gpa, PAGE_SIZE);
 }
 
-static __always_inline void cpuid_entry_override(struct kvm_cpuid_entry2 *entry,
+static __always_inline void cpuid_entry_override(struct kvm *kvm,
+						 struct kvm_cpuid_entry2 *entry,
 						 unsigned int leaf)
 {
 	u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);
@@ -241,16 +242,16 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature, u32 overla
 	}
 }
 
-static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
+static __always_inline u32 kvm_cpu_cap_get(struct kvm *kvm, unsigned int x86_feature)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
 	return kvm_cpu_caps[CPUID_OL_DEFAULT][x86_leaf] & __feature_bit(x86_feature);
 }
 
-static __always_inline bool kvm_cpu_cap_has(unsigned int x86_feature)
+static __always_inline bool kvm_cpu_cap_has(struct kvm *kvm, unsigned int x86_feature)
 {
-	return !!kvm_cpu_cap_get(x86_feature);
+	return !!kvm_cpu_cap_get(kvm, x86_feature);
 }
 
 static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature, u32 overlay_mask)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 961804df5f45..4b8eb1ff3c1d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1182,13 +1182,13 @@ void svm_copy_vmrun_state(struct vmcb_save_area *to_save,
 	to_save->rip = from_save->rip;
 	to_save->cpl = 0;
 
-	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_SHSTK)) {
 		to_save->s_cet  = from_save->s_cet;
 		to_save->isst_addr = from_save->isst_addr;
 		to_save->ssp = from_save->ssp;
 	}
 
-	if (kvm_cpu_cap_has(X86_FEATURE_LBRV)) {
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_LBRV)) {
 		svm_copy_lbrs(to_save, from_save);
 		to_save->dbgctl &= ~DEBUGCTL_RESERVED_BITS;
 	}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7d1289f34f9f..2b4a17536580 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -833,7 +833,7 @@ static void svm_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		svm_disable_intercept_for_msr(vcpu, MSR_IA32_MPERF, MSR_TYPE_R);
 	}
 
-	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_SHSTK)) {
 		bool shstk_enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
 
 		svm_set_intercept_for_msr(vcpu, MSR_IA32_U_CET, MSR_TYPE_RW, !shstk_enabled);
@@ -1029,7 +1029,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 	 * Intercept INVPCID if shadow paging is enabled to sync/free shadow
 	 * roots, or if INVPCID is disabled in the guest to inject #UD.
 	 */
-	if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_INVPCID)) {
 		if (!npt_enabled ||
 		    !guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_INVPCID))
 			svm_set_intercept(svm, INTERCEPT_INVPCID);
@@ -1037,7 +1037,7 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu)
 			svm_clr_intercept(svm, INTERCEPT_INVPCID);
 	}
 
-	if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_RDTSCP)) {
 		if (guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP))
 			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
 		else
@@ -5510,7 +5510,7 @@ static __init void svm_set_cpu_caps(void)
 			kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE, F_CPUID_DEFAULT);
 
 		if (kvm_pmu_cap.version != 2 ||
-		    !kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE))
+		    !kvm_cpu_cap_has(NULL, X86_FEATURE_PERFCTR_CORE))
 			kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2, F_CPUID_DEFAULT);
 	}
 
diff --git a/arch/x86/kvm/vmx/hyperv.c b/arch/x86/kvm/vmx/hyperv.c
index fa41d036acd4..302f7953b939 100644
--- a/arch/x86/kvm/vmx/hyperv.c
+++ b/arch/x86/kvm/vmx/hyperv.c
@@ -38,7 +38,7 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
 	 * Note, do not check the Hyper-V is fully enabled in guest CPUID, this
 	 * helper is used to _get_ the vCPU's supported CPUID.
 	 */
-	if (kvm_cpu_cap_get(X86_FEATURE_VMX) &&
+	if (kvm_cpu_cap_get(NULL, X86_FEATURE_VMX) &&
 	    (!vcpu || to_vmx(vcpu)->nested.enlightened_vmcs_enabled))
 		return (KVM_EVMCS_VERSION << 8) | 1;
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3fe88f29be7a..d7841038edfc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -7132,8 +7132,8 @@ static void nested_vmx_setup_exit_ctls(struct vmcs_config *vmcs_conf,
 		VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT |
 		VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
 
-	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
-	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+	if (!kvm_cpu_cap_has(NULL, X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(NULL, X86_FEATURE_IBT))
 		msrs->exit_ctls_high &= ~VM_EXIT_LOAD_CET_STATE;
 
 	/* We support free control of debug control saving. */
@@ -7157,8 +7157,8 @@ static void nested_vmx_setup_entry_ctls(struct vmcs_config *vmcs_conf,
 		(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER |
 		 VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL);
 
-	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
-	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+	if (!kvm_cpu_cap_has(NULL, X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(NULL, X86_FEATURE_IBT))
 		msrs->entry_ctls_high &= ~VM_ENTRY_LOAD_CET_STATE;
 
 	/* We support free control of debug control loading. */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index fae6b33949f5..d6d32f3d162b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4268,7 +4268,7 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 	vmx_set_intercept_for_msr(vcpu, MSR_IA32_SPEC_CTRL, MSR_TYPE_RW,
 				  !to_vmx(vcpu)->spec_ctrl);
 
-	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_XFD))
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
 					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
 
@@ -4280,7 +4280,7 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
 					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
-	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_SHSTK)) {
 		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
 
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL0_SSP, MSR_TYPE_RW, intercept);
@@ -4289,7 +4289,8 @@ static void vmx_recalc_msr_intercepts(struct kvm_vcpu *vcpu)
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PL3_SSP, MSR_TYPE_RW, intercept);
 	}
 
-	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK) || kvm_cpu_cap_has(X86_FEATURE_IBT)) {
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_SHSTK) ||
+	    kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_IBT)) {
 		intercept = !guest_cpu_cap_has(vcpu, X86_FEATURE_IBT) &&
 			    !guest_cpu_cap_has(vcpu, X86_FEATURE_SHSTK);
 
@@ -5031,12 +5032,12 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);  /* 22.2.1 */
 
-	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK)) {
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_SHSTK)) {
 		vmcs_writel(GUEST_SSP, 0);
 		vmcs_writel(GUEST_INTR_SSP_TABLE, 0);
 	}
-	if (kvm_cpu_cap_has(X86_FEATURE_IBT) ||
-	    kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+	if (kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_IBT) ||
+	    kvm_cpu_cap_has(vcpu->kvm, X86_FEATURE_SHSTK))
 		vmcs_writel(GUEST_S_CET, 0);
 
 	kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 525fcb09a4c0..4f713afd909a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7672,33 +7672,33 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 			return;
 		break;
 	case MSR_TSC_AUX:
-		if (!kvm_cpu_cap_has(X86_FEATURE_RDTSCP) &&
-		    !kvm_cpu_cap_has(X86_FEATURE_RDPID))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_RDTSCP) &&
+		    !kvm_cpu_cap_has(NULL, X86_FEATURE_RDPID))
 			return;
 		break;
 	case MSR_IA32_UMWAIT_CONTROL:
-		if (!kvm_cpu_cap_has(X86_FEATURE_WAITPKG))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_WAITPKG))
 			return;
 		break;
 	case MSR_IA32_RTIT_CTL:
 	case MSR_IA32_RTIT_STATUS:
-		if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_INTEL_PT))
 			return;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
-		if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_INTEL_PT) ||
 		    !intel_pt_validate_hw_cap(PT_CAP_cr3_filtering))
 			return;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
 	case MSR_IA32_RTIT_OUTPUT_MASK:
-		if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_INTEL_PT) ||
 		    (!intel_pt_validate_hw_cap(PT_CAP_topa_output) &&
 		     !intel_pt_validate_hw_cap(PT_CAP_single_range_output)))
 			return;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
-		if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT) ||
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_INTEL_PT) ||
 		    (msr_index - MSR_IA32_RTIT_ADDR0_A >=
 		     intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2))
 			return;
@@ -7725,12 +7725,12 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_SET:
-		if (!kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_PERFMON_V2))
 			return;
 		break;
 	case MSR_IA32_XFD:
 	case MSR_IA32_XFD_ERR:
-		if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_XFD))
 			return;
 		break;
 	case MSR_IA32_TSX_CTRL:
@@ -7743,16 +7743,16 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 		break;
 	case MSR_IA32_U_CET:
 	case MSR_IA32_S_CET:
-		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
-		    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_SHSTK) &&
+		    !kvm_cpu_cap_has(NULL, X86_FEATURE_IBT))
 			return;
 		break;
 	case MSR_IA32_INT_SSP_TAB:
-		if (!kvm_cpu_cap_has(X86_FEATURE_LM))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_LM))
 			return;
 		fallthrough;
 	case MSR_IA32_PL0_SSP ... MSR_IA32_PL3_SSP:
-		if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK))
+		if (!kvm_cpu_cap_has(NULL, X86_FEATURE_SHSTK))
 			return;
 		break;
 	default:
@@ -10026,11 +10026,11 @@ static struct notifier_block pvclock_gtod_notifier = {
 
 void kvm_setup_xss_caps(void)
 {
-	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
+	if (!kvm_cpu_cap_has(NULL, X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
-	if (!kvm_cpu_cap_has(X86_FEATURE_SHSTK) &&
-	    !kvm_cpu_cap_has(X86_FEATURE_IBT))
+	if (!kvm_cpu_cap_has(NULL, X86_FEATURE_SHSTK) &&
+	    !kvm_cpu_cap_has(NULL, X86_FEATURE_IBT))
 		kvm_caps.supported_xss &= ~XFEATURE_MASK_CET_ALL;
 
 	if ((kvm_caps.supported_xss & XFEATURE_MASK_CET_ALL) != XFEATURE_MASK_CET_ALL) {
@@ -10043,13 +10043,13 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_setup_xss_caps);
 
 static void kvm_setup_efer_caps(void)
 {
-	if (kvm_cpu_cap_has(X86_FEATURE_NX))
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_NX))
 		kvm_enable_efer_bits(EFER_NX);
 
-	if (kvm_cpu_cap_has(X86_FEATURE_FXSR_OPT))
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_FXSR_OPT))
 		kvm_enable_efer_bits(EFER_FFXSR);
 
-	if (kvm_cpu_cap_has(X86_FEATURE_AUTOIBRS))
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_AUTOIBRS))
 		kvm_enable_efer_bits(EFER_AUTOIBRS);
 }
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 38a905fa86de..45534d863bbe 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -320,7 +320,7 @@ static inline u8 vcpu_virt_addr_bits(struct kvm_vcpu *vcpu)
 
 static inline u8 max_host_virt_addr_bits(void)
 {
-	return kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48;
+	return kvm_cpu_cap_has(NULL, X86_FEATURE_LA57) ? 57 : 48;
 }
 
 /*
-- 
2.46.0



* [RFC PATCH 09/27] KVM: x86: Use overlays of KVM CPU capabilities
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (7 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 08/27] KVM: x86: Thread @kvm to KVM CPU capability helpers Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 10/27] KVM: x86: Use vendor-specific overlay flags instead of F_CPUID_DEFAULT Binbin Wu
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Select the appropriate CPUID overlay based on the VM type and/or hardware
vendor rather than using CPUID_OL_DEFAULT.

When the KVM CPU capabilities are queried or modified, the CPUID overlay
is chosen according to the VM type and/or the hardware platform.

For ALIASED_1_EDX_F(), use CPUID_OL_SVM instead of CPUID_OL_DEFAULT, since
the aliased 0x8000_0001.EDX features are AMD-defined duplicates of 0x1.EDX
and are only meaningful for SVM guests.

Drop the now-unnecessary CPUID_OL_DEFAULT alias.

Return 0 for emulated CPUID leaves when the overlay is TDX, as KVM cannot
emulate the related instructions for TDX guests.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 19 ++++++++++++-------
 arch/x86/kvm/cpuid.h |  9 ++++-----
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 20ea483ddc7a..2c4e64aa14c4 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -374,6 +374,7 @@ static int cpuid_func_emulated(struct kvm *kvm, struct kvm_cpuid_entry2 *entry,
 
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
+	u8 cpuid_overlay = get_cpuid_overlay(vcpu->kvm);
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	struct kvm_cpuid_entry2 *best;
 	struct kvm_cpuid_entry2 *entry;
@@ -404,7 +405,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		 * in guest CPUID.  Note, this includes features that are
 		 * supported by KVM but aren't advertised to userspace!
 		 */
-		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[CPUID_OL_DEFAULT][i];
+		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[cpuid_overlay][i];
 		if (!cpuid.index) {
 			cpuid_func_emulated(vcpu->kvm, &emulated, cpuid.function, true);
 			vcpu->arch.cpu_caps[i] |= cpuid_get_reg_unsafe(&emulated, cpuid.reg);
@@ -806,12 +807,12 @@ do {									\
  * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
  * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
  */
-#define ALIASED_1_EDX_F(name)								\
-({											\
-	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);		\
-	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);		\
-	kvm_cpu_cap_features |= feature_bit(name);					\
-	kvm_cpu_caps[CPUID_OL_DEFAULT][CPUID_8000_0001_EDX] |= feature_bit(name);	\
+#define ALIASED_1_EDX_F(name)							\
+({										\
+	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
+	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
+	kvm_cpu_cap_features |= feature_bit(name);				\
+	kvm_cpu_caps[CPUID_OL_SVM][CPUID_8000_0001_EDX] |= feature_bit(name);	\
 })
 
 /*
@@ -1414,6 +1415,10 @@ static int cpuid_func_emulated(struct kvm *kvm, struct kvm_cpuid_entry2 *entry,
 	entry->index = 0;
 	entry->flags = 0;
 
+	/* KVM can't do the following emulations for TDX guests. */
+	if (get_cpuid_overlay(kvm) == CPUID_OL_TDX)
+		return 0;
+
 	switch (func) {
 	case 0:
 		entry->eax = 7;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index eae46f37d30f..c3f2417c7980 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -14,9 +14,6 @@ enum kvm_cpuid_overlay {
 	NR_CPUID_OL
 };
 
-/* Temporarily use VMX overlay as the default one */
-#define CPUID_OL_DEFAULT	CPUID_OL_VMX
-
 #define F_CPUID_VMX		BIT(CPUID_OL_VMX)
 #define F_CPUID_SVM		BIT(CPUID_OL_SVM)
 #define F_CPUID_TDX		BIT(CPUID_OL_TDX)
@@ -122,9 +119,10 @@ static __always_inline void cpuid_entry_override(struct kvm *kvm,
 						 unsigned int leaf)
 {
 	u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);
+	u8 cpuid_overlay = get_cpuid_overlay(kvm);
 
 	BUILD_BUG_ON(leaf >= ARRAY_SIZE(kvm_cpu_caps[0]));
-	*reg = kvm_cpu_caps[CPUID_OL_DEFAULT][leaf];
+	*reg = kvm_cpu_caps[cpuid_overlay][leaf];
 }
 
 static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
@@ -245,8 +243,9 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature, u32 overla
 static __always_inline u32 kvm_cpu_cap_get(struct kvm *kvm, unsigned int x86_feature)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
+	u8 cpuid_overlay = get_cpuid_overlay(kvm);
 
-	return kvm_cpu_caps[CPUID_OL_DEFAULT][x86_leaf] & __feature_bit(x86_feature);
+	return kvm_cpu_caps[cpuid_overlay][x86_leaf] & __feature_bit(x86_feature);
 }
 
 static __always_inline bool kvm_cpu_cap_has(struct kvm *kvm, unsigned int x86_feature)
-- 
2.46.0



* [RFC PATCH 10/27] KVM: x86: Use vendor-specific overlay flags instead of F_CPUID_DEFAULT
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (8 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 09/27] KVM: x86: Use overlays of KVM CPU capabilities Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 11/27] KVM: SVM: Drop unnecessary clears of unsupported common x86 features Binbin Wu
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Use F_CPUID_VMX or F_CPUID_SVM instead of F_CPUID_DEFAULT when a feature
is vendor specific and the underlying hardware capability is not checked.
Also, use the respective vendor flags in the vendor modules.

A feature initialized via F() and its variants in kvm_cpu_cap_init()
checks the host CPU capability or the raw CPUID, so the feature can only
be set in the related overlay when it is supported by the underlying
hardware. Using F_CPUID_VMX or F_CPUID_SVM for vendor-specific features
makes the code more readable; however, it could introduce regressions if
a common feature were set for only one vendor. For simplicity, keep using
F_CPUID_DEFAULT when the underlying hardware capability is checked.

Features initialized via kvm_cpu_cap_set() or EMULATED_F() don't check
the host CPU capability or the raw CPUID; use F_CPUID_VMX or F_CPUID_SVM
for them when the feature is vendor specific.

In the vendor modules, use the respective vendor flags.

There are a few exceptions, i.e. IBT, BUS_LOCK_DETECT, and MSR_IMM.
They are common features for both vendors, but are not yet supported by
SVM. Use F_CPUID_VMX instead of F_CPUID_DEFAULT for them.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c   | 30 +++++++++++++++---------------
 arch/x86/kvm/svm/sev.c |  6 +++---
 arch/x86/kvm/svm/svm.c | 38 +++++++++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.c | 36 ++++++++++++++++++------------------
 4 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2c4e64aa14c4..71959f4918e7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -983,7 +983,7 @@ void kvm_initialize_cpu_caps(void)
 		/* MPX_MAWAU */
 		F(RDPID, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* KEY_LOCKER */
-		F(BUS_LOCK_DETECT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(BUS_LOCK_DETECT, F_CPUID_VMX | F_CPUID_TDX),
 		F(CLDEMOTE, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved */
 		F(MOVDIRI, F_CPUID_DEFAULT | F_CPUID_TDX),
@@ -1022,7 +1022,7 @@ void kvm_initialize_cpu_caps(void)
 		/* HYBRID_CPU */
 		F(TSXLDTRK, F_CPUID_DEFAULT | F_CPUID_TDX),
 		/* Reserved, PCONFIG, ARCH_LBR */
-		F(IBT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		F(IBT, F_CPUID_VMX | F_CPUID_TDX),
 		/* Reserved */
 		F(AMX_BF16, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(AVX512_FP16, F_CPUID_DEFAULT | F_CPUID_TDX),
@@ -1049,11 +1049,11 @@ void kvm_initialize_cpu_caps(void)
 	if (boot_cpu_has(X86_FEATURE_AMD_IBPB_RET) &&
 	    boot_cpu_has(X86_FEATURE_AMD_IBPB) &&
 	    boot_cpu_has(X86_FEATURE_AMD_IBRS))
-		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL, F_CPUID_SVM);
 	if (boot_cpu_has(X86_FEATURE_STIBP))
-		kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP, F_CPUID_VMX);
 	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD, F_CPUID_SVM);
 
 	kvm_cpu_cap_init(CPUID_7_1_EAX,
 		F(SHA512, F_CPUID_DEFAULT | F_CPUID_TDX),
@@ -1075,7 +1075,7 @@ void kvm_initialize_cpu_caps(void)
 
 	kvm_cpu_cap_init(CPUID_7_1_ECX,
 		/* MSR_IMM is reserved in TDX spec */
-		SCATTERED_F(MSR_IMM, F_CPUID_DEFAULT),
+		SCATTERED_F(MSR_IMM, F_CPUID_VMX),
 	);
 
 	kvm_cpu_cap_init(CPUID_7_1_EDX,
@@ -1217,26 +1217,26 @@ void kvm_initialize_cpu_caps(void)
 	 * record that in cpufeatures so use them.
 	 */
 	if (boot_cpu_has(X86_FEATURE_IBPB)) {
-		kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB, F_CPUID_SVM);
 		if (boot_cpu_has(X86_FEATURE_SPEC_CTRL) &&
 		    !boot_cpu_has_bug(X86_BUG_EIBRS_PBRSB))
-			kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB_RET, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_AMD_IBPB_RET, F_CPUID_SVM);
 	}
 	if (boot_cpu_has(X86_FEATURE_IBRS))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_IBRS, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_IBRS, F_CPUID_SVM);
 	if (boot_cpu_has(X86_FEATURE_STIBP))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_STIBP, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_STIBP, F_CPUID_SVM);
 	if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_SSBD, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_SSBD, F_CPUID_SVM);
 	if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
-		kvm_cpu_cap_set(X86_FEATURE_AMD_SSB_NO, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_AMD_SSB_NO, F_CPUID_SVM);
 	/*
 	 * The preference is to use SPEC CTRL MSR instead of the
 	 * VIRT_SPEC MSR.
 	 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) &&
 	    !boot_cpu_has(X86_FEATURE_AMD_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD, F_CPUID_SVM);
 
 	/* All SVM features required additional vendor module enabling. */
 	kvm_cpu_cap_init(CPUID_8000_000A_EDX,
@@ -1282,7 +1282,7 @@ void kvm_initialize_cpu_caps(void)
 		F(NULL_SEL_CLR_BASE, F_CPUID_DEFAULT),
 		/* UpperAddressIgnore */
 		F(AUTOIBRS, F_CPUID_DEFAULT),
-		EMULATED_F(NO_SMM_CTL_MSR, F_CPUID_DEFAULT),
+		EMULATED_F(NO_SMM_CTL_MSR, F_CPUID_SVM),
 		/* PrefetchCtlMsr */
 		/* GpOnUserCpuid */
 		/* EPSF */
@@ -1305,7 +1305,7 @@ void kvm_initialize_cpu_caps(void)
 	);
 
 	if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
-		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE, F_CPUID_SVM);
 
 	kvm_cpu_cap_init(CPUID_C000_0001_EDX,
 		F(XSTORE, F_CPUID_DEFAULT),
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6ec9c806e1fb..4b10d63a095a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3014,15 +3014,15 @@ void sev_vm_destroy(struct kvm *kvm)
 void __init sev_set_cpu_caps(void)
 {
 	if (sev_enabled) {
-		kvm_cpu_cap_set(X86_FEATURE_SEV, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_SEV, F_CPUID_SVM);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_VM);
 	}
 	if (sev_es_enabled) {
-		kvm_cpu_cap_set(X86_FEATURE_SEV_ES, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_SEV_ES, F_CPUID_SVM);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SEV_ES_VM);
 	}
 	if (sev_snp_enabled) {
-		kvm_cpu_cap_set(X86_FEATURE_SEV_SNP, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_SEV_SNP, F_CPUID_SVM);
 		kvm_caps.supported_vm_types |= BIT(KVM_X86_SNP_VM);
 	}
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2b4a17536580..a21c500e1a91 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5445,48 +5445,48 @@ static __init void svm_set_cpu_caps(void)
 
 	kvm_caps.supported_perf_cap = 0;
 
-	kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT);
+	kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_SVM);
 
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
-		kvm_cpu_cap_set(X86_FEATURE_SVM, F_CPUID_DEFAULT);
-		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_SVM, F_CPUID_SVM);
+		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN, F_CPUID_SVM);
 
 		/*
 		 * KVM currently flushes TLBs on *every* nested SVM transition,
 		 * and so for all intents and purposes KVM supports flushing by
 		 * ASID, i.e. KVM is guaranteed to honor every L1 ASID flush.
 		 */
-		kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID, F_CPUID_SVM);
 
 		if (nrips)
-			kvm_cpu_cap_set(X86_FEATURE_NRIPS, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_NRIPS, F_CPUID_SVM);
 
 		if (npt_enabled)
-			kvm_cpu_cap_set(X86_FEATURE_NPT, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_NPT, F_CPUID_SVM);
 
 		if (tsc_scaling)
-			kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR, F_CPUID_SVM);
 
 		if (vls)
-			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_V_VMSAVE_VMLOAD, F_CPUID_SVM);
 		if (lbrv)
-			kvm_cpu_cap_set(X86_FEATURE_LBRV, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_LBRV, F_CPUID_SVM);
 
 		if (boot_cpu_has(X86_FEATURE_PAUSEFILTER))
-			kvm_cpu_cap_set(X86_FEATURE_PAUSEFILTER, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_PAUSEFILTER, F_CPUID_SVM);
 
 		if (boot_cpu_has(X86_FEATURE_PFTHRESHOLD))
-			kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_PFTHRESHOLD, F_CPUID_SVM);
 
 		if (vgif)
-			kvm_cpu_cap_set(X86_FEATURE_VGIF, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_VGIF, F_CPUID_SVM);
 
 		if (vnmi)
-			kvm_cpu_cap_set(X86_FEATURE_VNMI, F_CPUID_DEFAULT);
+			kvm_cpu_cap_set(X86_FEATURE_VNMI, F_CPUID_SVM);
 
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
-		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK, F_CPUID_SVM);
 	}
 
 	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD))
@@ -5495,7 +5495,7 @@ static __init void svm_set_cpu_caps(void)
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD, F_CPUID_SVM);
 
 	if (enable_pmu) {
 		/*
@@ -5507,11 +5507,11 @@ static __init void svm_set_cpu_caps(void)
 			kvm_pmu_cap.num_counters_gp = min(AMD64_NUM_COUNTERS,
 							  kvm_pmu_cap.num_counters_gp);
 		else
-			kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE, F_CPUID_DEFAULT);
+			kvm_cpu_cap_check_and_set(X86_FEATURE_PERFCTR_CORE, F_CPUID_SVM);
 
 		if (kvm_pmu_cap.version != 2 ||
 		    !kvm_cpu_cap_has(NULL, X86_FEATURE_PERFCTR_CORE))
-			kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2, F_CPUID_DEFAULT);
+			kvm_cpu_cap_clear(X86_FEATURE_PERFMON_V2, F_CPUID_SVM);
 	}
 
 	/* CPUID 0x8000001F (SME/SEV features) */
@@ -5521,8 +5521,8 @@ static __init void svm_set_cpu_caps(void)
 	 * Clear capabilities that are automatically configured by common code,
 	 * but that require explicit SVM support (that isn't yet implemented).
 	 */
-	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT, F_CPUID_DEFAULT);
-	kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM, F_CPUID_DEFAULT);
+	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT, F_CPUID_SVM);
+	kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM, F_CPUID_SVM);
 
 	kvm_setup_xss_caps();
 	kvm_finalize_cpu_caps();
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d6d32f3d162b..f772558758f7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8086,55 +8086,55 @@ static __init void vmx_set_cpu_caps(void)
 
 	/* CPUID 0x1 */
 	if (nested)
-		kvm_cpu_cap_set(X86_FEATURE_VMX, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_VMX, F_CPUID_VMX);
 
 	/* CPUID 0x7 */
 	/* MPX is fixed-0 for TDX */
 	if (kvm_mpx_supported())
-		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX, F_CPUID_DEFAULT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_MPX, F_CPUID_VMX);
 	/* INVPCID is fixed-1 for TDX */
 	if (!cpu_has_vmx_invpcid())
-		kvm_cpu_cap_clear(X86_FEATURE_INVPCID, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_INVPCID, F_CPUID_VMX);
 	/* KVM doesn't support PT for TDX yet */
 	if (vmx_pt_mode_is_host_guest())
-		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT, F_CPUID_DEFAULT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT, F_CPUID_VMX);
 
 	/* DS and DTES64 are fixed-1 for TDX */
-	enable_mask = vmx_pebs_supported() ? F_CPUID_TDX | F_CPUID_DEFAULT : F_CPUID_TDX;
+	enable_mask = vmx_pebs_supported() ? F_CPUID_TDX | F_CPUID_VMX : F_CPUID_TDX;
 	kvm_cpu_cap_check_and_set(X86_FEATURE_DS, enable_mask);
 	kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64, enable_mask);
 
 	/* PDCM is fixed-1 for TDX */
 	if (!enable_pmu)
-		kvm_cpu_cap_clear(X86_FEATURE_PDCM, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_PDCM, F_CPUID_VMX);
 	kvm_caps.supported_perf_cap = vmx_get_perf_capabilities();
 
 	/* SGX related features are fixed-0 for TDX */
 	if (!enable_sgx) {
-		kvm_cpu_cap_clear(X86_FEATURE_SGX, F_CPUID_DEFAULT);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX_LC, F_CPUID_DEFAULT);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX1, F_CPUID_DEFAULT);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX2, F_CPUID_DEFAULT);
-		kvm_cpu_cap_clear(X86_FEATURE_SGX_EDECCSSA, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX, F_CPUID_VMX);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX_LC, F_CPUID_VMX);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX1, F_CPUID_VMX);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX2, F_CPUID_VMX);
+		kvm_cpu_cap_clear(X86_FEATURE_SGX_EDECCSSA, F_CPUID_VMX);
 	}
 
 	if (vmx_umip_emulated())
-		kvm_cpu_cap_set(X86_FEATURE_UMIP, F_CPUID_DEFAULT);
+		kvm_cpu_cap_set(X86_FEATURE_UMIP, F_CPUID_VMX);
 
 	/* CPUID 0xD.1 */
 	/* XSAVES is fixed-1 for TDX */
 	if (!cpu_has_vmx_xsaves())
-		kvm_cpu_cap_clear(X86_FEATURE_XSAVES, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_XSAVES, F_CPUID_VMX);
 
 	/* CPUID 0x80000001 and 0x7 (RDPID) */
 	if (!cpu_has_vmx_rdtscp()) {
-		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP, F_CPUID_DEFAULT);
-		kvm_cpu_cap_clear(X86_FEATURE_RDPID, F_CPUID_DEFAULT);
+		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP, F_CPUID_VMX);
+		kvm_cpu_cap_clear(X86_FEATURE_RDPID, F_CPUID_VMX);
 	}
 
 	/* KVM doesn't support WAITPKG for TDX yet */
 	if (cpu_has_vmx_waitpkg())
-		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG, F_CPUID_DEFAULT);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_WAITPKG, F_CPUID_VMX);
 
 	/*
 	 * Disable CET if unrestricted_guest is unsupported as KVM doesn't
@@ -8144,8 +8144,8 @@ static __init void vmx_set_cpu_caps(void)
 	 */
 	if (!cpu_has_load_cet_ctrl() || !enable_unrestricted_guest ||
 	    !cpu_has_vmx_basic_no_hw_errcode_cc()) {
-		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_DEFAULT | F_CPUID_TDX);
-		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_DEFAULT | F_CPUID_TDX);
+		kvm_cpu_cap_clear(X86_FEATURE_SHSTK, F_CPUID_VMX | F_CPUID_TDX);
+		kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_VMX | F_CPUID_TDX);
 	}
 
 	kvm_setup_xss_caps();
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 11/27] KVM: SVM: Drop unnecessary clears of unsupported common x86 features
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (9 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 10/27] KVM: x86: Use vendor-specific overlay flags instead of F_CPUID_DEFAULT Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 12/27] KVM: x86: Split KVM CPU cap leafs into two parts Binbin Wu
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Remove the kvm_cpu_cap_clear() calls for IBT, BUS_LOCK_DETECT, and
MSR_IMM in svm_set_cpu_caps(), since common code no longer sets them.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/svm/svm.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a21c500e1a91..ab3405640764 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5445,8 +5445,6 @@ static __init void svm_set_cpu_caps(void)
 
 	kvm_caps.supported_perf_cap = 0;
 
-	kvm_cpu_cap_clear(X86_FEATURE_IBT, F_CPUID_SVM);
-
 	/* CPUID 0x80000001 and 0x8000000A (SVM features) */
 	if (nested) {
 		kvm_cpu_cap_set(X86_FEATURE_SVM, F_CPUID_SVM);
@@ -5517,13 +5515,6 @@ static __init void svm_set_cpu_caps(void)
 	/* CPUID 0x8000001F (SME/SEV features) */
 	sev_set_cpu_caps();
 
-	/*
-	 * Clear capabilities that are automatically configured by common code,
-	 * but that require explicit SVM support (that isn't yet implemented).
-	 */
-	kvm_cpu_cap_clear(X86_FEATURE_BUS_LOCK_DETECT, F_CPUID_SVM);
-	kvm_cpu_cap_clear(X86_FEATURE_MSR_IMM, F_CPUID_SVM);
-
 	kvm_setup_xss_caps();
 	kvm_finalize_cpu_caps();
 }
-- 
2.46.0



* [RFC PATCH 12/27] KVM: x86: Split KVM CPU cap leafs into two parts
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (10 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 11/27] KVM: SVM: Drop unnecessary clears of unsupported common x86 features Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 13/27] KVM: x86: Add a helper to initialize CPUID multi-bit fields Binbin Wu
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Introduce NR_KVM_CPU_CAPS_PARANOID as the total number of KVM CPUID
leafs, distinct from NR_KVM_CPU_CAPS, which denotes only the leafs
tracked in the per-vCPU cpu_caps[] array.

The number of per-overlay leafs in the global kvm_cpu_caps[][] array is
extended to NR_KVM_CPU_CAPS_PARANOID so that it can hold both CPUID
leafs queried by KVM during vCPU runtime and additional leafs used
exclusively for CPUID paranoid mode validation.  The per-vCPU
cpu_caps[] array in kvm_vcpu_arch remains sized to NR_KVM_CPU_CAPS,
since KVM only consults those leafs while a vCPU is running; the array
should not grow when paranoid-mode-only leafs are added.

Add BUILD_BUG_ON() for guest_cpu_cap_{set, clear, has}() to prevent
accidental out-of-bounds access to the per-vCPU array with leaves that
are only present in the global array.

No functional change, as NR_KVM_CPU_CAPS_PARANOID == NR_KVM_CPU_CAPS
until paranoid-only leaves are introduced.
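
The two-constant sizing scheme above can be sketched in a standalone
snippet; the enum and array names below mirror the patch but are
illustrative stand-ins, not the kernel's real definitions, and the
_Static_assert plays the role of the kernel's BUILD_BUG_ON():

```c
#include <assert.h>
#include <stdint.h>

enum caps {
	CAP_LEAF_A,                        /* tracked per-vCPU and globally */
	CAP_LEAF_B,
	/* End of the leafs tracked by the per-vCPU array. */
	NR_CPU_CAPS,
	CAP_PARANOID_ONLY = NR_CPU_CAPS,   /* paranoid-mode-only leaf */
	NR_CPU_CAPS_PARANOID,
};

#define NR_OVERLAYS 3

/* The global array holds every leaf, including paranoid-only ones. */
static uint32_t global_caps[NR_OVERLAYS][NR_CPU_CAPS_PARANOID];

/* The per-vCPU array stays sized to the runtime-tracked leafs only. */
struct vcpu {
	uint32_t cpu_caps[NR_CPU_CAPS];
};

/* Compile-time analogue of the BUILD_BUG_ON() range guards. */
_Static_assert(NR_CPU_CAPS <= NR_CPU_CAPS_PARANOID,
	       "paranoid range must contain the per-vCPU range");
```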

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 13 +++++++++----
 arch/x86/kvm/cpuid.c            |  4 ++--
 arch/x86/kvm/cpuid.h            |  5 ++++-
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa..75895ab569fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -774,9 +774,12 @@ struct kvm_queued_exception {
 };
 
 /*
- * Hardware-defined CPUID leafs that are either scattered by the kernel or are
- * unknown to the kernel, but need to be directly used by KVM.  Note, these
- * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
+ * The leafs before NR_KVM_CPU_CAPS are hardware-defined CPUID leafs that are
+ * either scattered by the kernel or are unknown to the kernel, but need to be
+ * directly used by KVM during vCPU running.  Note, these word values conflict
+ * with the kernel's "bug" caps, but KVM doesn't use those.
+ * The leafs from NR_KVM_CPU_CAPS and above are only used for validation of
+ * CPUID inputs from userspace in CPUID paranoid mode.
  */
 enum kvm_only_cpuid_leafs {
 	CPUID_12_EAX	 = NCAPINTS,
@@ -789,9 +792,11 @@ enum kvm_only_cpuid_leafs {
 	CPUID_7_1_ECX,
 	CPUID_1E_1_EAX,
 	CPUID_24_1_ECX,
+	/* End of the leafs tracked by per-vcpu caps. */
 	NR_KVM_CPU_CAPS,
+	NR_KVM_CPU_CAPS_PARANOID = NR_KVM_CPU_CAPS,
 
-	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+	NKVMCAPINTS = NR_KVM_CPU_CAPS_PARANOID - NCAPINTS,
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 71959f4918e7..78d8f89d6079 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -33,7 +33,7 @@
  * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
  * aligned to sizeof(unsigned long) because it's not accessed via bitops.
  */
-u32 kvm_cpu_caps[NR_CPUID_OL][NR_KVM_CPU_CAPS] __read_mostly;
+u32 kvm_cpu_caps[NR_CPUID_OL][NR_KVM_CPU_CAPS_PARANOID] __read_mostly;
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpu_caps);
 
 bool kvm_is_configuring_cpu_caps __read_mostly;
@@ -382,7 +382,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	int i;
 
 	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
-	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS);
+	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS_PARANOID);
 
 	/*
 	 * Reset guest capabilities to userspace's guest CPUID definition, i.e.
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c3f2417c7980..bdfaedb1cfcc 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -31,7 +31,7 @@ static inline u8 get_cpuid_overlay(struct kvm *kvm)
 	return CPUID_OL_VMX;
 }
 
-extern u32 kvm_cpu_caps[NR_CPUID_OL][NR_KVM_CPU_CAPS] __read_mostly;
+extern u32 kvm_cpu_caps[NR_CPUID_OL][NR_KVM_CPU_CAPS_PARANOID] __read_mostly;
 extern bool kvm_is_configuring_cpu_caps __read_mostly;
 
 void kvm_initialize_cpu_caps(void);
@@ -273,6 +273,7 @@ static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
+	BUILD_BUG_ON(x86_leaf >= NR_KVM_CPU_CAPS);
 	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
 }
 
@@ -281,6 +282,7 @@ static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
+	BUILD_BUG_ON(x86_leaf >= NR_KVM_CPU_CAPS);
 	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
 }
 
@@ -299,6 +301,7 @@ static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
+	BUILD_BUG_ON(x86_leaf >= NR_KVM_CPU_CAPS);
 	/*
 	 * Except for MWAIT, querying dynamic feature bits is disallowed, so
 	 * that KVM can defer runtime updates until the next CPUID emulation.
-- 
2.46.0



* [RFC PATCH 13/27] KVM: x86: Add a helper to initialize CPUID multi-bit fields
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (11 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 12/27] KVM: x86: Split KVM CPU cap leafs into two parts Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 14/27] KVM: x86: Add a helper to init multiple feature bits based on raw CPUID Binbin Wu
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Add kvm_cpu_cap_init_mf() to initialize CPUID leaves that encode
multi-bit value fields for specified overlays.

Unlike kvm_cpu_cap_init(), this helper directly assigns the provided
value without intersecting it with native CPUID or boot_cpu_data. This
is necessary because multi-bit fields encode numeric values, e.g.
address widths, field sizes, etc., where the value userspace wants to
expose to guests may differ from the native hardware value.
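
A minimal sketch of the direct-assignment semantics described above:
for every overlay selected in overlay_mask, the value is stored as-is,
with no intersection against host CPUID. The names and dimensions here
are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

#define NR_OVERLAYS 3
#define NR_LEAFS    4

static uint32_t caps[NR_OVERLAYS][NR_LEAFS];

/* Assign a multi-bit field value verbatim to each selected overlay. */
static void cap_init_mf(uint32_t leaf, uint32_t value, uint32_t overlay_mask)
{
	for (int i = 0; i < NR_OVERLAYS; i++) {
		if (overlay_mask & (1u << i))
			caps[i][leaf] = value;
	}
}
```

Overlays not named in the mask keep their previous value, which is what
lets different VM types expose different encodings of the same field.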

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index bdfaedb1cfcc..ea8ff5210e4a 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -259,6 +259,17 @@ static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature,
 		kvm_cpu_cap_set(x86_feature, overlay_mask);
 }
 
+static __always_inline void kvm_cpu_cap_init_mf(u32 leaf, u32 features, u32 overlay_mask)
+{
+	WARN_ON_ONCE(!kvm_is_configuring_cpu_caps);
+	BUILD_BUG_ON(leaf >= NR_KVM_CPU_CAPS_PARANOID);
+
+	for (int i = 0; i < NR_CPUID_OL; i++) {
+		if (overlay_mask & BIT(i))
+			kvm_cpu_caps[i][leaf] = features;
+	}
+}
+
 static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 					 unsigned int kvm_feature)
 {
-- 
2.46.0



* [RFC PATCH 14/27] KVM: x86: Add a helper to init multiple feature bits based on raw CPUID
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (12 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 13/27] KVM: x86: Add a helper to initialize CPUID multi-bit fields Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 15/27] KVM: x86: Add infrastructure to track CPUID entries ignored in paranoid mode Binbin Wu
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Add kvm_cpu_cap_check_and_init_mf() to initialize KVM-only CPUID leafs
whose allowed feature bits need to be intersected with the raw host
CPUID value for CPUID paranoid verification.

Use it instead of kvm_cpu_cap_init() to avoid adding new X86 feature bit
definitions in the common x86 header.

Move raw_cpuid_get() to cpuid.h since kvm_cpu_cap_check_and_init_mf()
will be called from vendor modules.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 23 -----------------------
 arch/x86/kvm/cpuid.h | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 78d8f89d6079..3bd9608770a9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -672,29 +672,6 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
-{
-	struct kvm_cpuid_entry2 entry;
-	u32 base;
-
-	/*
-	 * KVM only supports features defined by Intel (0x0), AMD (0x80000000),
-	 * and Centaur (0xc0000000).  WARN if a feature for new vendor base is
-	 * defined, as this and other code would need to be updated.
-	 */
-	base = cpuid.function & 0xffff0000;
-	if (WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000))
-		return 0;
-
-	if (cpuid_eax(base) < cpuid.function)
-		return 0;
-
-	cpuid_count(cpuid.function, cpuid.index,
-		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
-
-	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
-}
-
 /*
  * For kernel-defined leafs, mask KVM's supported feature set with the kernel's
  * capabilities as well as raw CPUID.  For KVM-defined leafs, consult only raw
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index ea8ff5210e4a..0b90344a8b98 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -259,6 +259,29 @@ static __always_inline void kvm_cpu_cap_check_and_set(unsigned int x86_feature,
 		kvm_cpu_cap_set(x86_feature, overlay_mask);
 }
 
+static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
+{
+	struct kvm_cpuid_entry2 entry;
+	u32 base;
+
+	/*
+	 * KVM only supports features defined by Intel (0x0), AMD (0x80000000),
+	 * and Centaur (0xc0000000).  WARN if a feature for new vendor base is
+	 * defined, as this and other code would need to be updated.
+	 */
+	base = cpuid.function & 0xffff0000;
+	if (WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000))
+		return 0;
+
+	if (cpuid_eax(base) < cpuid.function)
+		return 0;
+
+	cpuid_count(cpuid.function, cpuid.index,
+		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
+
+	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
+}
+
 static __always_inline void kvm_cpu_cap_init_mf(u32 leaf, u32 features, u32 overlay_mask)
 {
 	WARN_ON_ONCE(!kvm_is_configuring_cpu_caps);
@@ -270,6 +293,16 @@ static __always_inline void kvm_cpu_cap_init_mf(u32 leaf, u32 features, u32 over
 	}
 }
 
+static __always_inline void kvm_cpu_cap_check_and_init_mf(u32 leaf, u32 features, u32 overlay_mask)
+{
+	reverse_cpuid_check(leaf);
+	/* This function is used for kvm only cpuid leafs. */
+	BUILD_BUG_ON(leaf < NCAPINTS);
+	features &= raw_cpuid_get(reverse_cpuid[leaf]);
+
+	kvm_cpu_cap_init_mf(leaf, features, overlay_mask);
+}
+
 static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 					 unsigned int kvm_feature)
 {
-- 
2.46.0



* [RFC PATCH 15/27] KVM: x86: Add infrastructure to track CPUID entries ignored in paranoid mode
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (13 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 14/27] KVM: x86: Add a helper to init multiple feature bits based on raw CPUID Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:35 ` [RFC PATCH 16/27] KVM: x86: Init allowed masks for basic CPUID range " Binbin Wu
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Add a structure and helpers to register and query CPUID leafs/registers
that should be excluded from validation in KVM's CPUID paranoid mode.

CPUID paranoid mode will cross-check CPUID values exposed to guests
against KVM's expected values to detect inconsistencies. Some CPUID
leafs/registers should be exempted from the paranoid checks, i.e.,
whatever userspace provides for them is allowed.

Alternatively, kvm_cpu_cap_init_mf() could be used with a 0xFFFFFFFF
mask to allow all bits of a 32-bit CPUID output register, but that
would require adding an entry to enum kvm_only_cpuid_leafs and
reverse_cpuid[] for each such register, bloating those tables.

Each ignored entry specifies a CPUID function, an inclusive index range
(with index_end=-1 meaning "all sub-leaves starting from index_start"),
a bitmask of registers (EAX/EBX/ECX/EDX), and an overlay mask to scope
the exemption to specific VM types.

KVM_MAX_CPUID_ENTRIES may be a bit oversized, but since it's global, the
waste of memory should be acceptable.
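
The matching rules above can be modeled in a self-contained sketch.
Field and function names follow the patch, but this is a standalone
illustration, not the kernel code; note how index_end = (u32)-1 makes
the unsigned range check naturally open-ended:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ignored_entry {
	uint32_t func;
	uint32_t index_start;
	uint32_t index_end;     /* inclusive; (uint32_t)-1 = open-ended */
	uint32_t reg_mask;      /* one bit per EAX/EBX/ECX/EDX (0..3) */
	uint32_t overlay_mask;
};

static struct ignored_entry entries[8];
static uint32_t nr_entries;

static void cap_ignore(uint32_t func, uint32_t start, uint32_t end,
		       uint32_t reg_mask, uint32_t overlay_mask)
{
	entries[nr_entries++] = (struct ignored_entry){
		func, start, end, reg_mask, overlay_mask };
}

static bool is_ignored(uint32_t func, uint32_t index, int reg, int overlay)
{
	for (uint32_t i = 0; i < nr_entries; i++) {
		struct ignored_entry *e = &entries[i];

		if (e->func == func && (e->reg_mask & (1u << reg)) &&
		    (e->overlay_mask & (1u << overlay)) &&
		    index >= e->index_start && index <= e->index_end)
			return true;
	}
	return false;
}
```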

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 3bd9608770a9..e633707277f9 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -45,7 +45,21 @@ struct cpuid_xstate_sizes {
 	u32 ecx;
 };
 
+struct ignored_entry {
+	u32 func;
+	u32 index_start;
+	u32 index_end;
+	u32 reg_mask;
+	u32 overlay_mask;
+};
+
+struct cpuid_paranoid_ignored_set {
+	u32 nr;
+	struct ignored_entry entries[KVM_MAX_CPUID_ENTRIES];
+};
+
 static struct cpuid_xstate_sizes xstate_sizes[XFEATURE_MAX] __ro_after_init;
+static struct cpuid_paranoid_ignored_set ignored_set __read_mostly;
 
 void __init kvm_init_xstate_sizes(void)
 {
@@ -372,6 +386,39 @@ static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
 static int cpuid_func_emulated(struct kvm *kvm, struct kvm_cpuid_entry2 *entry,
 			       u32 func, bool include_partially_emulated);
 
+/*
+ * index_start and index_end are inclusive:
+ * - Use 0 for both index_start and index_end if the function is not indexed.
+ * - Use -1 as index_end to indicate open-ended index ranges starting from
+ *   index_start.
+ */
+static void __maybe_unused kvm_cpu_cap_ignore(u32 func, u32 index_start, u32 index_end,
+					      u32 reg_mask, u32 overlay_mask)
+{
+	if (WARN_ON_ONCE(ignored_set.nr >= KVM_MAX_CPUID_ENTRIES))
+		return;
+
+	ignored_set.entries[ignored_set.nr].func = func;
+	ignored_set.entries[ignored_set.nr].index_start = index_start;
+	ignored_set.entries[ignored_set.nr].index_end = index_end;
+	ignored_set.entries[ignored_set.nr].reg_mask = reg_mask;
+	ignored_set.entries[ignored_set.nr].overlay_mask = overlay_mask;
+	ignored_set.nr++;
+}
+
+static bool __maybe_unused is_cpuid_paranoid_ignored(u32 func, u32 index, int reg, u8 overlay)
+{
+	for (int i = 0; i < ignored_set.nr; i++) {
+		struct ignored_entry *e = &ignored_set.entries[i];
+
+		if ((e->func == func) && (e->reg_mask & BIT(reg)) &&
+		    (e->overlay_mask & BIT(overlay)) &&
+		    (index >= e->index_start && index <= e->index_end))
+			return true;
+	}
+	return false;
+}
+
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	u8 cpuid_overlay = get_cpuid_overlay(vcpu->kvm);
-- 
2.46.0



* [RFC PATCH 16/27] KVM: x86: Init allowed masks for basic CPUID range in paranoid mode
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (14 preceding siblings ...)
  2026-04-17  7:35 ` [RFC PATCH 15/27] KVM: x86: Add infrastructure to track CPUID entries ignored in paranoid mode Binbin Wu
@ 2026-04-17  7:35 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 17/27] KVM: x86: Init allowed masks for extended " Binbin Wu
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:35 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Populate the CPUID paranoid mode validation data for the basic CPUID
range (0x0 through 0x24).

For each CPUID output register, the validation follows one of three
rules:
  1. Ignored: the register is added to the ignored set and KVM skips
     validation of the userspace-provided value.
  2. Mask/value check: a new KVM-only CPUID leaf enum is defined with a
     corresponding reverse_cpuid[] entry, and an allowed mask or fixed
     value is initialized per-overlay.
  3. Zero check: for reserved registers or registers where no bits are
     supported, userspace input is checked against zero.

Add is_cpuid_subleaf_common_pattern() to map higher sub-leaf indices to
a representative sub-leaf for validation, avoiding duplicate mask
definitions for CPUID functions 4, 0xB, 0xD, 0x12, and 0x1F.

Add is_cpuid_reg_check_value() to flag registers where userspace input
must exactly match fixed values (CPUID 0x1D, 0x1E.0.EBX) rather than
being validated against a bitmask.

Notable leaf-specific handling:
 - CPUID 0x1.EDX: HT is emulated to allow userspace to set it, but
   masked when reporting supported CPUID to userspace.
 - CPUID 0x6.EAX: ARAT initialized as emulated, replacing the hardcoded
   value in __do_cpuid_func().
 - CPUID 0x6.ECX: APERFMPERF allowed for VMX/SVM (userspace may enable
   KVM_X86_DISABLE_EXITS_APERFMPERF and set it), fixed-0 for TDX.
 - CPUID 0x7.0.EDX: CORE_CAPABILITIES set for TDX to accommodate old
   TDX modules that report bit 30 as fixed-1; MSR_IA32_CORE_CAPS reads
   return 0 inside a TD as a workaround.
 - CPUID 0xD: XCR0-based masks for subleaf 0, XSS-based for subleaf 1;
   size/offset fields are ignored.
 - CPUID 0x12: SGX sub-leaf masks initialized when SGX is supported,
   replacing hardcoded masks in __do_cpuid_func().
 - CPUID 0x14: PT masks initialized for VMX from Intel PT pt_caps[].
   Override CPUID.0x14.0.{EBX, ECX} when reporting capabilities to
   userspace.
 - CPUID 0x1D: fixed values from Intel SDM, exact-match required.
 - CPUID 0x1E.0: EAX capped at 1 (max supported sub-leaf), EBX is a
   fixed value from Intel SDM.
 - CPUID 0x24: AVX10 version capped at 2, merged with vector-width bits.
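
The sub-leaf folding described above can be shown as a compilable
sketch. The helper below mirrors the shape of the patch's
is_cpuid_subleaf_common_pattern(), remapping a higher sub-leaf index to
the representative sub-leaf whose mask is reused:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fold higher sub-leaves onto the representative sub-leaf, if any. */
static bool subleaf_common_pattern(uint32_t func, uint32_t *index)
{
	switch (func) {
	case 0x4:
	case 0xB:
	case 0x1F:
		/* All sub-leaves of these functions share sub-leaf 0's mask. */
		*index = 0;
		return true;
	case 0xD:
	case 0x12:
		/* Sub-leaves 2 and above share sub-leaf 2's mask. */
		if (*index >= 2) {
			*index = 2;
			return true;
		}
		return false;
	default:
		return false;
	}
}
```

This keeps one allowed-mask definition per pattern instead of one per
sub-leaf, which matters for open-ended ranges like CPUID 0xD.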

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  45 +++++++-
 arch/x86/kvm/cpuid.c            | 186 +++++++++++++++++++++++++++++---
 arch/x86/kvm/reverse_cpuid.h    |  47 ++++++++
 arch/x86/kvm/vmx/tdx.c          |  10 +-
 arch/x86/kvm/vmx/vmx.c          |  14 +++
 5 files changed, 288 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 75895ab569fb..90514791f0fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -794,7 +794,50 @@ enum kvm_only_cpuid_leafs {
 	CPUID_24_1_ECX,
 	/* End of the leafs tracked by per-vcpu caps. */
 	NR_KVM_CPU_CAPS,
-	NR_KVM_CPU_CAPS_PARANOID = NR_KVM_CPU_CAPS,
+	CPUID_1_EAX = NR_KVM_CPU_CAPS,
+	CPUID_2_EAX,
+	CPUID_4_0_EAX,
+	CPUID_4_0_EDX,
+	CPUID_5_EAX,
+	CPUID_5_EBX,
+	CPUID_5_ECX,
+	CPUID_6_ECX,
+	CPUID_A_EAX,
+	CPUID_A_EBX,
+	CPUID_A_ECX,
+	CPUID_A_EDX,
+	CPUID_B_0_EAX,
+	CPUID_B_0_EBX,
+	CPUID_B_0_ECX,
+	CPUID_D_0_EAX,
+	CPUID_D_0_EDX,
+	CPUID_D_1_ECX,
+	CPUID_D_2_ECX,
+	CPUID_12_0_EBX,
+	CPUID_12_0_EDX,
+	CPUID_12_1_EAX,
+	CPUID_12_1_ECX,
+	CPUID_12_1_EDX,
+	CPUID_12_2_EAX,
+	CPUID_12_2_EBX,
+	CPUID_12_2_ECX,
+	CPUID_12_2_EDX,
+	CPUID_14_0_EAX,
+	CPUID_14_0_EBX,
+	CPUID_14_0_ECX,
+	CPUID_14_1_EAX,
+	CPUID_14_1_EBX,
+	CPUID_1D_0_EAX,
+	CPUID_1D_1_EAX,
+	CPUID_1D_1_EBX,
+	CPUID_1D_1_ECX,
+	CPUID_1E_0_EAX,
+	CPUID_1E_0_EBX,
+	CPUID_1F_0_EAX,
+	CPUID_1F_0_EBX,
+	CPUID_1F_0_ECX,
+	CPUID_24_0_EAX,
+	NR_KVM_CPU_CAPS_PARANOID,
 
 	NKVMCAPINTS = NR_KVM_CPU_CAPS_PARANOID - NCAPINTS,
 };
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e633707277f9..59f0b3166eaa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -392,8 +392,8 @@ static int cpuid_func_emulated(struct kvm *kvm, struct kvm_cpuid_entry2 *entry,
  * - Use -1 as index_end to indicate open-ended index ranges starting from
  *   index_start.
  */
-static void __maybe_unused kvm_cpu_cap_ignore(u32 func, u32 index_start, u32 index_end,
-					      u32 reg_mask, u32 overlay_mask)
+static void kvm_cpu_cap_ignore(u32 func, u32 index_start, u32 index_end,
+			       u32 reg_mask, u32 overlay_mask)
 {
 	if (WARN_ON_ONCE(ignored_set.nr >= KVM_MAX_CPUID_ENTRIES))
 		return;
@@ -419,6 +419,35 @@ static bool __maybe_unused is_cpuid_paranoid_ignored(u32 func, u32 index, int re
 	return false;
 }
 
+static bool __maybe_unused is_cpuid_reg_check_value(u32 func, u32 index, int reg)
+{
+	switch (func) {
+	case 0x1D: return true;
+	case 0x1E: return index == 0 && reg == CPUID_EBX;
+	default: return false;
+	}
+}
+
+static bool __maybe_unused is_cpuid_subleaf_common_pattern(u32 func, u32 *index)
+{
+	switch (func) {
+	case 4:
+	case 0xB:
+	case 0x1F:
+		*index = 0;
+		return true;
+	case 0xD:
+	case 0x12:
+		if (*index >= 2) {
+			*index = 2;
+			return true;
+		}
+		return false;
+	default:
+		return false;
+	}
+}
+
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	u8 cpuid_overlay = get_cpuid_overlay(vcpu->kvm);
@@ -876,6 +905,14 @@ void kvm_initialize_cpu_caps(void)
 	BUILD_BUG_ON(sizeof(kvm_cpu_caps)/NR_CPUID_OL - (NKVMCAPINTS * sizeof(**kvm_cpu_caps)) >
 		     sizeof(boot_cpu_data.x86_capability));
 
+	kvm_cpu_cap_ignore(0, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_DEFAULT | F_CPUID_TDX);
+
+	kvm_cpu_cap_init_mf(CPUID_1_EAX, GENMASK_U32(27, 16) | GENMASK_U32(13, 0),
+			    F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(1, 0, 0, BIT(CPUID_EBX), F_CPUID_DEFAULT | F_CPUID_TDX);
+
 	kvm_cpu_cap_init(CPUID_1_ECX,
 		F(XMM3, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(PCLMULQDQ, F_CPUID_DEFAULT | F_CPUID_TDX),
@@ -946,9 +983,40 @@ void kvm_initialize_cpu_caps(void)
 		F(XMM, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(XMM2, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(SELFSNOOP, F_CPUID_DEFAULT | F_CPUID_TDX),
-		/* HTT, TM, Reserved, PBE */
+		/* Allow userspace to set HT regardless of underlying hardware. */
+		EMULATED_F(HT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		/* TM, Reserved, PBE */
+	);
+
+	/* EAX[7:0] are reserved with value 1. */
+	kvm_cpu_cap_init_mf(CPUID_2_EAX, GENMASK_U32(31, 8) | 0x01, F_CPUID_VMX | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(2, 0, 0, BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_VMX | F_CPUID_TDX);
+
+	kvm_cpu_cap_init_mf(CPUID_4_0_EAX, ~GENMASK_U32(13, 10), F_CPUID_VMX | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_4_0_EDX, GENMASK_U32(2, 0), F_CPUID_VMX | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(4, 0, -1, BIT(CPUID_EBX) | BIT(CPUID_ECX), F_CPUID_VMX | F_CPUID_TDX);
+
+	kvm_cpu_cap_init_mf(CPUID_5_EAX, GENMASK_U32(15, 0), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_5_EBX, GENMASK_U32(15, 0), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_5_ECX, GENMASK_U32(1, 0), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(5, 0, 0, BIT(CPUID_EDX), F_CPUID_VMX | F_CPUID_TDX);
+
+	kvm_cpu_cap_init(CPUID_6_EAX,
+		EMULATED_F(ARAT, F_CPUID_DEFAULT | F_CPUID_TDX),
+	);
+
+	/*
+	 * KVM allows userspace to set APERFMPERF after enabling
+	 * KVM_X86_DISABLE_EXITS_APERFMPERF.
+	 * Fixed-0 for TDX.
+	 */
+	kvm_cpu_cap_init(CPUID_6_ECX,
+		F(APERFMPERF, F_CPUID_DEFAULT),
 	);
 
+	kvm_cpu_cap_ignore(7, 0, 0, BIT(CPUID_EAX), F_CPUID_DEFAULT | F_CPUID_TDX);
+
 	kvm_cpu_cap_init(CPUID_7_0_EBX,
 		F(FSGSBASE, F_CPUID_DEFAULT | F_CPUID_TDX),
 		EMULATED_F(TSC_ADJUST, F_CPUID_DEFAULT | F_CPUID_TDX),
@@ -1056,7 +1124,7 @@ void kvm_initialize_cpu_caps(void)
 		F(INTEL_STIBP, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(FLUSH_L1D, F_CPUID_DEFAULT | F_CPUID_TDX),
 		EMULATED_F(ARCH_CAPABILITIES, F_CPUID_DEFAULT | F_CPUID_TDX),
-		/* CORE_CAPABILITIES */
+		F(CORE_CAPABILITIES, F_CPUID_TDX),
 		F(SPEC_CTRL_SSBD, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
@@ -1120,6 +1188,30 @@ void kvm_initialize_cpu_caps(void)
 		F(MCDT_NO, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
+	if (enable_pmu) {
+		/* KVM doesn't support PERFMON for TDX yet. */
+		kvm_cpu_cap_init_mf(CPUID_A_EAX, GENMASK_U32(31, 0), F_CPUID_VMX);
+		kvm_cpu_cap_init_mf(CPUID_A_EBX, GENMASK_U32(12, 10) | GENMASK_U32(7, 0),
+				    F_CPUID_VMX);
+		kvm_cpu_cap_init_mf(CPUID_A_ECX, GENMASK_U32(31, 0), F_CPUID_VMX);
+		kvm_cpu_cap_init_mf(CPUID_A_EDX, GENMASK_U32(19, 15) | GENMASK_U32(12, 0),
+				    F_CPUID_VMX);
+	}
+
+	/* CPUID 0xB is derived from CPUID.0x1F for TDX, but allow userspace to set it. */
+	kvm_cpu_cap_init_mf(CPUID_B_0_EAX, GENMASK_U32(4, 0), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_B_0_EBX, GENMASK_U32(15, 0), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_B_0_ECX, GENMASK_U32(15, 0), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(0xB, 0, -1, BIT(CPUID_EDX), F_CPUID_DEFAULT | F_CPUID_TDX);
+
+
+	kvm_cpu_cap_init_mf(CPUID_D_0_EAX, (u32)kvm_caps.supported_xcr0,
+			    F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(0xD, 0, 0, BIT(CPUID_EBX) | BIT(CPUID_ECX),
+			   F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_D_0_EDX, (u32)(kvm_caps.supported_xcr0 >> 32),
+			    F_CPUID_DEFAULT | F_CPUID_TDX);
+
 	kvm_cpu_cap_init(CPUID_D_1_EAX,
 		F(XSAVEOPT, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(XSAVEC, F_CPUID_DEFAULT | F_CPUID_TDX),
@@ -1128,6 +1220,19 @@ void kvm_initialize_cpu_caps(void)
 		X86_64_F(XFD, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
+	kvm_cpu_cap_ignore(0xD, 1, 1, BIT(CPUID_EBX), F_CPUID_DEFAULT | F_CPUID_TDX);
+
+	/* No bits are defined in CPUID.D.1.EDX (i.e., the upper 32 bits of XSS) yet. */
+	kvm_cpu_cap_init_mf(CPUID_D_1_ECX, (u32)kvm_caps.supported_xss,
+			    F_CPUID_DEFAULT | F_CPUID_TDX);
+
+	if ((kvm_caps.supported_xss | kvm_caps.supported_xcr0) & GENMASK_U64(62, 2)) {
+		kvm_cpu_cap_ignore(0xD, 2, 62, BIT(CPUID_EAX) | BIT(CPUID_EBX),
+				   F_CPUID_DEFAULT | F_CPUID_TDX);
+		kvm_cpu_cap_init_mf(CPUID_D_2_ECX, GENMASK_U32(2, 0),
+				    F_CPUID_DEFAULT | F_CPUID_TDX);
+	}
+
 	/* SGX related features are fixed-0 for TDX */
 	kvm_cpu_cap_init(CPUID_12_EAX,
 		SCATTERED_F(SGX1, F_CPUID_DEFAULT),
@@ -1135,6 +1240,40 @@ void kvm_initialize_cpu_caps(void)
 		SCATTERED_F(SGX_EDECCSSA, F_CPUID_DEFAULT),
 	);
 
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_SGX)) {
+		kvm_cpu_cap_check_and_init_mf(CPUID_12_0_EBX, SGX_MISC_EXINFO, F_CPUID_DEFAULT);
+		kvm_cpu_cap_init_mf(CPUID_12_0_EDX, GENMASK_U32(15, 0), F_CPUID_DEFAULT);
+
+		kvm_cpu_cap_check_and_init_mf(CPUID_12_1_EAX,
+					      SGX_ATTR_PRIV_MASK | SGX_ATTR_UNPRIV_MASK,
+					      F_CPUID_DEFAULT);
+		kvm_cpu_cap_init_mf(CPUID_12_1_ECX, (u32)kvm_caps.supported_xcr0, F_CPUID_DEFAULT);
+		kvm_cpu_cap_init_mf(CPUID_12_1_EDX, (u32)(kvm_caps.supported_xcr0 >> 32),
+				    F_CPUID_DEFAULT);
+
+		/*
+		 * SUB_LEAF_TYPE (EAX[3:0]) is valid only when it is 1. The
+		 * masks are initialized according to type 1.
+		 */
+		kvm_cpu_cap_init_mf(CPUID_12_2_EAX, GENMASK_U32(31, 12) | 0x1, F_CPUID_DEFAULT);
+		kvm_cpu_cap_init_mf(CPUID_12_2_EBX, GENMASK_U32(19, 0), F_CPUID_DEFAULT);
+		kvm_cpu_cap_init_mf(CPUID_12_2_ECX, ~GENMASK_U32(11, 4), F_CPUID_DEFAULT);
+		kvm_cpu_cap_init_mf(CPUID_12_2_EDX, GENMASK_U32(19, 0), F_CPUID_DEFAULT);
+	}
+
+	/* Hardcoded with fixed values in Intel SDM. */
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_AMX_TILE)) {
+		kvm_cpu_cap_init_mf(CPUID_1D_0_EAX, 0x00000001, F_CPUID_DEFAULT | F_CPUID_TDX);
+
+		kvm_cpu_cap_init_mf(CPUID_1D_1_EAX, 0x04002000, F_CPUID_DEFAULT | F_CPUID_TDX);
+		kvm_cpu_cap_init_mf(CPUID_1D_1_EBX, 0x00080040, F_CPUID_DEFAULT | F_CPUID_TDX);
+		kvm_cpu_cap_init_mf(CPUID_1D_1_ECX, 0x00000010, F_CPUID_DEFAULT | F_CPUID_TDX);
+
+		/* KVM limits the subleaf up to 1. */
+		kvm_cpu_cap_init_mf(CPUID_1E_0_EAX, 0x00000001, F_CPUID_DEFAULT | F_CPUID_TDX);
+		kvm_cpu_cap_init_mf(CPUID_1E_0_EBX, 0x00004010, F_CPUID_DEFAULT | F_CPUID_TDX);
+	}
+
 	kvm_cpu_cap_init(CPUID_1E_1_EAX,
 		F(AMX_INT8_ALIAS, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(AMX_BF16_ALIAS, F_CPUID_DEFAULT | F_CPUID_TDX),
@@ -1146,11 +1285,25 @@ void kvm_initialize_cpu_caps(void)
 		F(AMX_MOVRS, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
-	kvm_cpu_cap_init(CPUID_24_0_EBX,
-		F(AVX10_128, F_CPUID_DEFAULT | F_CPUID_TDX),
-		F(AVX10_256, F_CPUID_DEFAULT | F_CPUID_TDX),
-		F(AVX10_512, F_CPUID_DEFAULT | F_CPUID_TDX),
-	);
+	kvm_cpu_cap_init_mf(CPUID_1F_0_EAX, GENMASK_U32(4, 0), F_CPUID_VMX | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_1F_0_EBX, GENMASK_U32(15, 0), F_CPUID_VMX | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_1F_0_ECX, GENMASK_U32(15, 0), F_CPUID_VMX | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(0x1F, 0, -1, BIT(CPUID_EDX), F_CPUID_VMX | F_CPUID_TDX);
+
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_AVX10)) {
+		/* KVM supports up to subleaf 1 */
+		kvm_cpu_cap_init_mf(CPUID_24_0_EAX, 0x00000001, F_CPUID_DEFAULT | F_CPUID_TDX);
+		/*
+		 * The allowed value for AVX10 version is 1 or 2. The version is
+		 * guaranteed to be >=1 if AVX10 is supported, and KVM supports
+		 * up to version 2. For simplicity, just allow lower 2 bits to
+		 * be set by userspace.
+		 * EBX[18:16] is reserved at 111b for all vector widths, i.e.,
+		 * AVX10_128, AVX10_256, and AVX10_512.
+		 */
+		kvm_cpu_cap_init_mf(CPUID_24_0_EBX, GENMASK_U32(18, 16) | GENMASK_U32(1, 0),
+				    F_CPUID_DEFAULT | F_CPUID_TDX);
+	}
 
 	kvm_cpu_cap_init(CPUID_24_1_ECX,
 		/* AVX10_VNNI_INT is reserved in TDX spec */
@@ -1501,6 +1654,11 @@ static inline int __do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array
 		break;
 	case 1:
 		cpuid_entry_override(kvm, entry, CPUID_1_EDX);
+		/*
+		 * Clear HT when reporting to userspace since it's not emulated
+		 * by KVM.
+		 */
+		entry->edx &= ~feature_bit(HT);
 		cpuid_entry_override(kvm, entry, CPUID_1_ECX);
 		break;
 	case 2:
@@ -1535,7 +1693,7 @@ static inline int __do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array
 		}
 		break;
 	case 6: /* Thermal management */
-		entry->eax = 0x4; /* allow ARAT */
+		cpuid_entry_override(kvm, entry, CPUID_6_EAX);
 		entry->ebx = 0;
 		entry->ecx = 0;
 		entry->edx = 0;
@@ -1674,7 +1832,7 @@ static inline int __do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array
 		 * feature flags), while enclave size is unrestricted.
 		 */
 		cpuid_entry_override(kvm, entry, CPUID_12_EAX);
-		entry->ebx &= SGX_MISC_EXINFO;
+		cpuid_entry_override(kvm, entry, CPUID_12_0_EBX);
 
 		entry = do_host_cpuid(array, function, 1);
 		if (!entry)
@@ -1687,7 +1845,7 @@ static inline int __do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array
 		 * userspace.  ATTRIBUTES.XFRM is not adjusted as userspace is
 		 * expected to derive it from supported XCR0.
 		 */
-		entry->eax &= SGX_ATTR_PRIV_MASK | SGX_ATTR_UNPRIV_MASK;
+		cpuid_entry_override(kvm, entry, CPUID_12_1_EAX);
 		entry->ebx &= 0;
 		break;
 	/* Intel PT */
@@ -1697,6 +1855,9 @@ static inline int __do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array
 			break;
 		}
 
+		cpuid_entry_override(kvm, entry, CPUID_14_0_EBX);
+		cpuid_entry_override(kvm, entry, CPUID_14_0_ECX);
+
 		for (i = 1, max_idx = entry->eax; i <= max_idx; ++i) {
 			if (!do_host_cpuid(array, function, i))
 				goto out;
@@ -1750,6 +1911,7 @@ static inline int __do_cpuid_func(struct kvm *kvm, struct kvm_cpuid_array *array
 		 */
 		avx10_version = min_t(u8, entry->ebx & 0xff, 2);
 		cpuid_entry_override(kvm, entry, CPUID_24_0_EBX);
+		entry->ebx &= ~GENMASK_U32(7, 0);
 		entry->ebx |= avx10_version;
 
 		entry->ecx = 0;
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 657f5f743ed9..5c7c0fbb0fec 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -76,6 +76,9 @@
 #define KVM_X86_FEATURE_TSA_SQ_NO	KVM_X86_FEATURE(CPUID_8000_0021_ECX, 1)
 #define KVM_X86_FEATURE_TSA_L1_NO	KVM_X86_FEATURE(CPUID_8000_0021_ECX, 2)
 
+/* CPUID level 0x6 (ECX) */
+#define KVM_X86_FEATURE_APERFMPERF	KVM_X86_FEATURE(CPUID_6_ECX, 0)
+
 struct cpuid_reg {
 	u32 function;
 	u32 index;
@@ -109,6 +112,49 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_7_1_ECX]       = {         7, 1, CPUID_ECX},
 	[CPUID_1E_1_EAX]      = {      0x1e, 1, CPUID_EAX},
 	[CPUID_24_1_ECX]      = {      0x24, 1, CPUID_ECX},
+	[CPUID_1_EAX]         = {         1, 0, CPUID_EAX},
+	[CPUID_2_EAX]         = {         2, 0, CPUID_EAX},
+	[CPUID_4_0_EAX]       = {         4, 0, CPUID_EAX},
+	[CPUID_4_0_EDX]       = {         4, 0, CPUID_EDX},
+	[CPUID_5_EAX]         = {         5, 0, CPUID_EAX},
+	[CPUID_5_EBX]         = {         5, 0, CPUID_EBX},
+	[CPUID_5_ECX]         = {         5, 0, CPUID_ECX},
+	[CPUID_6_ECX]         = {         6, 0, CPUID_ECX},
+	[CPUID_A_EAX]         = {       0xa, 0, CPUID_EAX},
+	[CPUID_A_EBX]         = {       0xa, 0, CPUID_EBX},
+	[CPUID_A_ECX]         = {       0xa, 0, CPUID_ECX},
+	[CPUID_A_EDX]         = {       0xa, 0, CPUID_EDX},
+	[CPUID_B_0_EAX]       = {       0xb, 0, CPUID_EAX},
+	[CPUID_B_0_EBX]       = {       0xb, 0, CPUID_EBX},
+	[CPUID_B_0_ECX]       = {       0xb, 0, CPUID_ECX},
+	[CPUID_D_0_EAX]       = {       0xd, 0, CPUID_EAX},
+	[CPUID_D_0_EDX]       = {       0xd, 0, CPUID_EDX},
+	[CPUID_D_1_ECX]       = {       0xd, 1, CPUID_ECX},
+	[CPUID_D_2_ECX]       = {       0xd, 2, CPUID_ECX},
+	[CPUID_12_0_EBX]      = {      0x12, 0, CPUID_EBX},
+	[CPUID_12_0_EDX]      = {      0x12, 0, CPUID_EDX},
+	[CPUID_12_1_EAX]      = {      0x12, 1, CPUID_EAX},
+	[CPUID_12_1_ECX]      = {      0x12, 1, CPUID_ECX},
+	[CPUID_12_1_EDX]      = {      0x12, 1, CPUID_EDX},
+	[CPUID_12_2_EAX]      = {      0x12, 2, CPUID_EAX},
+	[CPUID_12_2_EBX]      = {      0x12, 2, CPUID_EBX},
+	[CPUID_12_2_ECX]      = {      0x12, 2, CPUID_ECX},
+	[CPUID_12_2_EDX]      = {      0x12, 2, CPUID_EDX},
+	[CPUID_14_0_EAX]      = {      0x14, 0, CPUID_EAX},
+	[CPUID_14_0_EBX]      = {      0x14, 0, CPUID_EBX},
+	[CPUID_14_0_ECX]      = {      0x14, 0, CPUID_ECX},
+	[CPUID_14_1_EAX]      = {      0x14, 1, CPUID_EAX},
+	[CPUID_14_1_EBX]      = {      0x14, 1, CPUID_EBX},
+	[CPUID_1D_0_EAX]      = {      0x1d, 0, CPUID_EAX},
+	[CPUID_1D_1_EAX]      = {      0x1d, 1, CPUID_EAX},
+	[CPUID_1D_1_EBX]      = {      0x1d, 1, CPUID_EBX},
+	[CPUID_1D_1_ECX]      = {      0x1d, 1, CPUID_ECX},
+	[CPUID_1E_0_EAX]      = {      0x1e, 0, CPUID_EAX},
+	[CPUID_1E_0_EBX]      = {      0x1e, 0, CPUID_EBX},
+	[CPUID_1F_0_EAX]      = {      0x1f, 0, CPUID_EAX},
+	[CPUID_1F_0_EBX]      = {      0x1f, 0, CPUID_EBX},
+	[CPUID_1F_0_ECX]      = {      0x1f, 0, CPUID_ECX},
+	[CPUID_24_0_EAX]      = {      0x24, 0, CPUID_EAX},
 };
 
 /*
@@ -151,6 +197,7 @@ static __always_inline u32 __feature_translate(int x86_feature)
 	KVM_X86_TRANSLATE_FEATURE(TSA_SQ_NO);
 	KVM_X86_TRANSLATE_FEATURE(TSA_L1_NO);
 	KVM_X86_TRANSLATE_FEATURE(MSR_IMM);
+	KVM_X86_TRANSLATE_FEATURE(APERFMPERF);
 	default:
 		return x86_feature;
 	}
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1e47c194af53..a1df89d66a84 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2141,7 +2141,7 @@ bool tdx_has_emulated_msr(u32 index)
 static bool tdx_is_read_only_msr(u32 index)
 {
 	return  index == MSR_IA32_APICBASE || index == MSR_EFER ||
-		index == MSR_IA32_FEAT_CTL;
+		index == MSR_IA32_FEAT_CTL || index == MSR_IA32_CORE_CAPS;
 }
 
 int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
@@ -2161,6 +2161,14 @@ int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 			return 1;
 		msr->data = vcpu->arch.mcg_ext_ctl;
 		return 0;
+	case MSR_IA32_CORE_CAPS:
+		/*
+		 * KVM doesn't support MSR_IA32_CORE_CAPS, however, in some old
+		 * TDX modules, CPUID.0x7.0.EDX[30] is fixed-1. As a workaround,
+		 * just return 0 for this MSR.
+		 */
+		msr->data = 0;
+		return 0;
 	default:
 		if (!tdx_has_emulated_msr(msr->index))
 			return 1;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f772558758f7..17c9048c87f3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8099,6 +8099,20 @@ static __init void vmx_set_cpu_caps(void)
 	if (vmx_pt_mode_is_host_guest())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT, F_CPUID_VMX);
 
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_INTEL_PT)) {
+		kvm_cpu_cap_init_mf(CPUID_14_0_EAX, GENMASK_U32(31, 0), F_CPUID_VMX);
+		/* Lower 9 bits are defined, however, bit 6 is not supported in intel pt_caps[] */
+		kvm_cpu_cap_check_and_init_mf(CPUID_14_0_EBX,
+					      GENMASK_U32(8, 7) | GENMASK_U32(5, 0),
+					      F_CPUID_VMX);
+		kvm_cpu_cap_check_and_init_mf(CPUID_14_0_ECX,
+					      BIT(31) | GENMASK_U32(3, 0),
+					      F_CPUID_VMX);
+
+		kvm_cpu_cap_init_mf(CPUID_14_1_EAX, ~GENMASK_U32(15, 3), F_CPUID_VMX);
+		kvm_cpu_cap_init_mf(CPUID_14_1_EBX, GENMASK_U32(31, 0), F_CPUID_VMX);
+	}
+
 	/* DS and DTES64 are fixed-1 for TDX */
 	enable_mask = vmx_pebs_supported() ? F_CPUID_TDX | F_CPUID_VMX : F_CPUID_TDX;
 	kvm_cpu_cap_check_and_set(X86_FEATURE_DS, enable_mask);
-- 
2.46.0



* [RFC PATCH 17/27] KVM: x86: Init allowed masks for extended CPUID range in paranoid mode
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Populate the CPUID paranoid mode validation data for the extended CPUID
range (0x80000000 through 0x80000022).

As with the basic range, each register follows one of three rules: ignored
(skipped during validation), mask/value checked, or zero checked (for
reserved or unsupported registers).
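The three rules can be sketched as follows; the enum and function names below are illustrative assumptions for this write-up, not KVM's actual interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Minimal sketch of the three per-register validation rules described
 * above.  Names are illustrative, not KVM's real API.
 */
enum reg_rule { REG_IGNORED, REG_MASK_CHECKED, REG_ZERO_CHECKED };

static bool reg_value_allowed(enum reg_rule rule, uint32_t val, uint32_t allowed)
{
	switch (rule) {
	case REG_IGNORED:
		return true;			/* skipped during validation */
	case REG_MASK_CHECKED:
		return !(val & ~allowed);	/* no bits outside the allowed mask */
	case REG_ZERO_CHECKED:
		return val == 0;		/* reserved/unsupported register */
	}
	return false;
}
```

A mask/value-checked register thus passes only when every set bit is within its allowed mask, while a zero-checked register must be exactly 0.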

Most added extended range leaves are AMD-specific and are initialized
for the SVM overlay only.  A few registers (max extended leaf, brand
string, cache line size) are relevant to all overlays but are ignored
since KVM doesn't meaningfully constrain them.

Notable leaf-specific handling:
 - 0x80000000: EAX (max extended leaf) ignored for all overlays (it
   could be checked in the future if needed).  EBX/ECX/EDX (vendor
   string) ignored for SVM only.
 - 0x80000001: EAX ignored for all overlays — reserved on Intel but
   userspace may set it.  EBX allowed mask initialized for SVM.
 - 0x80000002–0x80000004: brand string, all registers ignored.
 - 0x80000005: L1 cache/TLB info, ignored for SVM only.
 - 0x80000006: cache info — EAX/EBX ignored for SVM, ECX ignored for
   all overlays, EDX allowed mask for SVM.
 - 0x80000008: EAX allowed bits 23:0 (phys/virt address widths) for
   all overlays.  ECX (core count/APIC ID size) for SVM only.
 - 0x8000000A: SVM revision (EAX) allowed if SVM is supported.
   ASID count (EBX) is ignored.
 - 0x8000001A: performance optimization identifiers, intersected with
   raw host CPUID.
 - 0x8000001D: AMD cache topology (analogous to CPUID 4), with
   sub-leaf common pattern mapping added.
 - 0x8000001E: extended topology, initialized when TOPOEXT is
   supported.
 - 0x8000001F: EBX allows bits 11:0 only (excludes VMPL fields).
 - 0x80000021: EBX allows the ERAPS size field (bits 23:16) when
   ERAPS is supported.
 - 0x80000022: EBX allows core performance counter count (bits 3:0)
   when PERFMON_V2 is supported and PMU is enabled.
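Most of the allowed masks above are built with GENMASK_U32(); a small sketch of that macro's semantics (mirroring the kernel definition) and of the 0x80000022.EBX rule, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * GENMASK_U32(h, l): a 32-bit mask with bits h down to l set,
 * mirroring the kernel macro used throughout the patch.
 */
#define GENMASK_U32(h, l) \
	((uint32_t)((~0u << (l)) & (~0u >> (31 - (h)))))

/*
 * Hypothetical helper (not KVM code): for 0x80000022.EBX, only the
 * core performance counter count (bits 3:0) may be set by userspace.
 */
static bool ebx_8000_0022_allowed(uint32_t ebx)
{
	return !(ebx & ~GENMASK_U32(3, 0));
}
```

For example, GENMASK_U32(23, 16) yields 0x00ff0000, the ERAPS size field allowed in 0x80000021.EBX.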

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 13 +++++++
 arch/x86/kvm/cpuid.c            | 68 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/reverse_cpuid.h    | 13 +++++++
 3 files changed, 94 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 90514791f0fd..2ec4d92e3e79 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -837,6 +837,19 @@ enum kvm_only_cpuid_leafs {
 	CPUID_1F_0_EBX,
 	CPUID_1F_0_ECX,
 	CPUID_24_0_EAX,
+	CPUID_8000_0001_EBX,
+	CPUID_8000_0006_EDX,
+	CPUID_8000_0008_EAX,
+	CPUID_8000_0008_ECX,
+	CPUID_8000_000A_EAX,
+	CPUID_8000_001A_EAX,
+	CPUID_8000_001D_EAX,
+	CPUID_8000_001D_EDX,
+	CPUID_8000_001E_EBX,
+	CPUID_8000_001E_ECX,
+	CPUID_8000_001F_EBX,
+	CPUID_8000_0021_EBX,
+	CPUID_8000_0022_EBX,
 	NR_KVM_CPU_CAPS_PARANOID,
 
 	NKVMCAPINTS = NR_KVM_CPU_CAPS_PARANOID - NCAPINTS,
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 59f0b3166eaa..471733eb68d8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -434,6 +434,7 @@ static bool __maybe_unused is_cpuid_subleaf_common_pattern(u32 func, u32 *index)
 	case 4:
 	case 0xB:
 	case 0x1F:
+	case 0x8000001D:
 		*index = 0;
 		return true;
 	case 0xD:
@@ -1310,6 +1311,18 @@ void kvm_initialize_cpu_caps(void)
 		F(AVX10_VNNI_INT, F_CPUID_DEFAULT),
 	);
 
+	kvm_cpu_cap_ignore(0x80000000, 0, 0, BIT(CPUID_EAX),
+			   F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(0x80000000, 0, 0, BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_SVM);
+
+	/*
+	 * Although EAX is reserved for Intel platforms, userspace may set it,
+	 * to avoid breaking userspace, ignore it for VMX/TDX as well.
+	 */
+	kvm_cpu_cap_ignore(0x80000001, 0, 0, BIT(CPUID_EAX), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_8000_0001_EBX, ~GENMASK_U32(27, 16), F_CPUID_SVM);
+
 	kvm_cpu_cap_init(CPUID_8000_0001_ECX,
 		F(LAHF_LM, F_CPUID_DEFAULT | F_CPUID_TDX),
 		F(CMP_LEGACY, F_CPUID_DEFAULT),
@@ -1367,10 +1380,31 @@ void kvm_initialize_cpu_caps(void)
 	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
 		kvm_cpu_cap_set(X86_FEATURE_GBPAGES, F_CPUID_DEFAULT);
 
+	kvm_cpu_cap_ignore(0x80000002, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(0x80000003, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_ignore(0x80000004, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_DEFAULT | F_CPUID_TDX);
+
+	kvm_cpu_cap_ignore(0x80000005, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_SVM);
+
+	kvm_cpu_cap_ignore(0x80000006, 0, 0, BIT(CPUID_EAX) | BIT(CPUID_EBX), F_CPUID_SVM);
+	kvm_cpu_cap_ignore(0x80000006, 0, 0, BIT(CPUID_ECX), F_CPUID_DEFAULT | F_CPUID_TDX);
+	kvm_cpu_cap_init_mf(CPUID_8000_0006_EDX, ~GENMASK_U32(17, 16), F_CPUID_SVM);
+
 	kvm_cpu_cap_init(CPUID_8000_0007_EDX,
 		SCATTERED_F(CONSTANT_TSC, F_CPUID_DEFAULT | F_CPUID_TDX),
 	);
 
+	kvm_cpu_cap_init_mf(CPUID_8000_0008_EAX, GENMASK_U32(23, 0),
+			    F_CPUID_DEFAULT | F_CPUID_TDX);
+
 	kvm_cpu_cap_init(CPUID_8000_0008_EBX,
 		F(CLZERO, F_CPUID_DEFAULT),
 		F(XSAVEERPTR, F_CPUID_DEFAULT),
@@ -1388,6 +1422,10 @@ void kvm_initialize_cpu_caps(void)
 		F(AMD_IBPB_RET, F_CPUID_DEFAULT),
 	);
 
+	kvm_cpu_cap_init_mf(CPUID_8000_0008_ECX, GENMASK_U32(17, 12) | GENMASK_U32(7, 0),
+			    F_CPUID_SVM);
+	kvm_cpu_cap_ignore(0x80000008, 0, 0, BIT(CPUID_EDX), F_CPUID_SVM);
+
 	/*
 	 * AMD has separate bits for each SPEC_CTRL bit.
 	 * arch/x86/kernel/cpu/bugs.c is kind enough to
@@ -1415,6 +1453,11 @@ void kvm_initialize_cpu_caps(void)
 	    !boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD, F_CPUID_SVM);
 
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_SVM)) {
+		kvm_cpu_cap_init_mf(CPUID_8000_000A_EAX, GENMASK_U32(7, 0), F_CPUID_SVM);
+		kvm_cpu_cap_ignore(0x8000000A, 0, 0, BIT(CPUID_EBX), F_CPUID_SVM);
+	}
+
 	/* All SVM features required additional vendor module enabling. */
 	kvm_cpu_cap_init(CPUID_8000_000A_EDX,
 		VENDOR_F(NPT),
@@ -1431,6 +1474,21 @@ void kvm_initialize_cpu_caps(void)
 		VENDOR_F(SVME_ADDR_CHK),
 	);
 
+	kvm_cpu_cap_ignore(0x80000019, 0, 0, BIT(CPUID_EAX) | BIT(CPUID_EBX), F_CPUID_SVM);
+
+	kvm_cpu_cap_check_and_init_mf(CPUID_8000_001A_EAX, GENMASK_U32(2, 0), F_CPUID_SVM);
+
+	kvm_cpu_cap_init_mf(CPUID_8000_001D_EAX, GENMASK_U32(25, 14) | GENMASK_U32(9, 0),
+			    F_CPUID_SVM);
+	kvm_cpu_cap_ignore(0x8000001D, 0, -1, BIT(CPUID_EBX) | BIT(CPUID_ECX), F_CPUID_SVM);
+	kvm_cpu_cap_init_mf(CPUID_8000_001D_EDX, GENMASK_U32(1, 0), F_CPUID_SVM);
+
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_TOPOEXT)) {
+		kvm_cpu_cap_ignore(0x8000001E, 0, 0, BIT(CPUID_EAX), F_CPUID_SVM);
+		kvm_cpu_cap_init_mf(CPUID_8000_001E_EBX, GENMASK_U32(15, 0), F_CPUID_SVM);
+		kvm_cpu_cap_init_mf(CPUID_8000_001E_ECX, GENMASK_U32(10, 0), F_CPUID_SVM);
+	}
+
 	kvm_cpu_cap_init(CPUID_8000_001F_EAX,
 		VENDOR_F(SME),
 		VENDOR_F(SEV),
@@ -1439,6 +1497,9 @@ void kvm_initialize_cpu_caps(void)
 		F(SME_COHERENT, F_CPUID_DEFAULT),
 	);
 
+	/* KVM does not support VMPL */
+	kvm_cpu_cap_init_mf(CPUID_8000_001F_EBX, GENMASK_U32(11, 0), F_CPUID_SVM);
+
 	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
 		F(NO_NESTED_DATA_BP, F_CPUID_DEFAULT),
 		F(WRMSR_XX_BASE_NS, F_CPUID_DEFAULT),
@@ -1472,6 +1533,9 @@ void kvm_initialize_cpu_caps(void)
 		F(SRSO_USER_KERNEL_NO, F_CPUID_DEFAULT),
 	);
 
+	if (kvm_cpu_cap_has(NULL, X86_FEATURE_ERAPS))
+		kvm_cpu_cap_init_mf(CPUID_8000_0021_EBX, GENMASK_U32(23, 16), F_CPUID_SVM);
+
 	kvm_cpu_cap_init(CPUID_8000_0021_ECX,
 		SYNTHESIZED_F(TSA_SQ_NO, F_CPUID_DEFAULT),
 		SYNTHESIZED_F(TSA_L1_NO, F_CPUID_DEFAULT),
@@ -1481,6 +1545,10 @@ void kvm_initialize_cpu_caps(void)
 		F(PERFMON_V2, F_CPUID_DEFAULT),
 	);
 
+	/* Only expose number of core performance counters. */
+	if (enable_pmu && kvm_cpu_cap_has(NULL, X86_FEATURE_PERFMON_V2))
+		kvm_cpu_cap_init_mf(CPUID_8000_0022_EBX, GENMASK_U32(3, 0), F_CPUID_SVM);
+
 	if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
 		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE, F_CPUID_SVM);
 
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 5c7c0fbb0fec..1bdb05aaa852 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -155,6 +155,19 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_1F_0_EBX]      = {      0x1f, 0, CPUID_EBX},
 	[CPUID_1F_0_ECX]      = {      0x1f, 0, CPUID_ECX},
 	[CPUID_24_0_EAX]      = {      0x24, 0, CPUID_EAX},
+	[CPUID_8000_0001_EBX] = {0x80000001, 0, CPUID_EBX},
+	[CPUID_8000_0006_EDX] = {0x80000006, 0, CPUID_EDX},
+	[CPUID_8000_0008_EAX] = {0x80000008, 0, CPUID_EAX},
+	[CPUID_8000_0008_ECX] = {0x80000008, 0, CPUID_ECX},
+	[CPUID_8000_000A_EAX] = {0x8000000a, 0, CPUID_EAX},
+	[CPUID_8000_001A_EAX] = {0x8000001a, 0, CPUID_EAX},
+	[CPUID_8000_001D_EAX] = {0x8000001d, 0, CPUID_EAX},
+	[CPUID_8000_001D_EDX] = {0x8000001d, 0, CPUID_EDX},
+	[CPUID_8000_001E_EBX] = {0x8000001e, 0, CPUID_EBX},
+	[CPUID_8000_001E_ECX] = {0x8000001e, 0, CPUID_ECX},
+	[CPUID_8000_001F_EBX] = {0x8000001f, 0, CPUID_EBX},
+	[CPUID_8000_0021_EBX] = {0x80000021, 0, CPUID_EBX},
+	[CPUID_8000_0022_EBX] = {0x80000022, 0, CPUID_EBX},
 };
 
 /*
-- 
2.46.0



* [RFC PATCH 18/27] KVM: x86: Handle Centaur CPUID leafs in paranoid mode
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Register Centaur CPUID leaves 0xC0000000 and 0xC0000001.EAX to be
skipped during paranoid CPUID verification for the VMX overlay.

Ignore all registers of leaf 0xC0000000, which reports the max supported
CPUID leaf in EAX and vendor-string-like data in EBX/ECX/EDX.

For 0xC0000001, ignore EAX, which holds the CPUID version.  EDX is
already tracked as CPUID_C000_0001_EDX.  EBX and ECX are reserved.

Leaves 0xC0000002 through 0xC0000004 are reserved for future use.

Use F_CPUID_VMX for the overlay mask when adding to the ignored set,
since Centaur/VIA processors are VMX-based and these leaves are not
applicable to SVM or TDX guests.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 471733eb68d8..c75e7859cc2c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1552,6 +1552,11 @@ void kvm_initialize_cpu_caps(void)
 	if (!static_cpu_has_bug(X86_BUG_NULL_SEG))
 		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE, F_CPUID_SVM);
 
+	kvm_cpu_cap_ignore(0xC0000000, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_VMX);
+
+	kvm_cpu_cap_ignore(0xC0000001, 0, 0, BIT(CPUID_EAX), F_CPUID_VMX);
 	kvm_cpu_cap_init(CPUID_C000_0001_EDX,
 		F(XSTORE, F_CPUID_DEFAULT),
 		F(XSTORE_EN, F_CPUID_DEFAULT),
-- 
2.46.0



* [RFC PATCH 19/27] KVM: x86: Track KVM PV CPUID features for paranoid mode
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Track KVM's PV CPUID features in kvm_cpu_caps so that they can be
covered by paranoid CPUID verification.

Define KVM_PV_FEATURE_* constants in reverse_cpuid.h, mirroring the UAPI
KVM_FEATURE_* bit positions, and add a new PV_F() macro to initialize PV
features within kvm_cpu_cap_init() for the newly added CPUID_4000_0001_EAX.

PV_F() marks each feature as emulated, since PV features are entirely software
defined. Also, teach raw_cpuid_get() to return 0 for the
KVM_CPUID_SIGNATURE base without WARNing, as PV features have no hardware
backing and querying raw CPUID for them is expected to be a no-op.
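The bit-position mirroring works because of KVM's reverse-CPUID encoding, in which a feature number packs a word index and a register bit as word * 32 + bit.  A sketch of that arithmetic follows; the word index used for CPUID_4000_0001_EAX here (51) is a made-up value for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the encoding behind KVM_X86_FEATURE(): a feature number
 * is word_index * 32 + bit, so the word and the register bit can be
 * recovered arithmetically.  The word index 51 is hypothetical.
 */
#define KVM_X86_FEATURE(w, f)		((w) * 32 + (f))
#define CPUID_4000_0001_EAX_WORD	51	/* hypothetical word index */

#define KVM_PV_FEATURE_PV_UNHALT \
	KVM_X86_FEATURE(CPUID_4000_0001_EAX_WORD, 7)

static inline int feature_word(int f)
{
	return f / 32;		/* which kvm_cpu_caps[] word */
}

static inline uint32_t feature_mask(int f)
{
	return 1u << (f & 31);	/* bit within that word */
}
```

Because the bit positions mirror the UAPI KVM_FEATURE_* values, a PV feature bit set via PV_F() lands in the same EAX bit userspace reports.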

Note that the KVM PV CPUID base could be relocated to resolve conflicts
with other virtualization enhancements (e.g., when Hyper-V is also
enabled by userspace); in that case, a future patch will make paranoid
mode skip KVM PV CPUID verification.

Ignore EAX and EBX of leaf 0x40000010 (tsc_khz and apic_bus_freq),
which are multi-bit fields that userspace is allowed to set.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            | 41 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/cpuid.h            |  3 ++-
 arch/x86/kvm/reverse_cpuid.h    | 22 ++++++++++++++++++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2ec4d92e3e79..f6d79e8496c3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -850,6 +850,7 @@ enum kvm_only_cpuid_leafs {
 	CPUID_8000_001F_EBX,
 	CPUID_8000_0021_EBX,
 	CPUID_8000_0022_EBX,
+	CPUID_4000_0001_EAX,
 	NR_KVM_CPU_CAPS_PARANOID,
 
 	NKVMCAPINTS = NR_KVM_CPU_CAPS_PARANOID - NCAPINTS,
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c75e7859cc2c..789ec9eb7aaf 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -887,6 +887,18 @@ do {									\
 	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
 })
 
+#define PV_F(name, overlay_mask)							\
+({											\
+	BUILD_BUG_ON(__feature_leaf(KVM_PV_FEATURE_##name) != CPUID_4000_0001_EAX);	\
+	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_4000_0001_EAX);		\
+											\
+	kvm_cpu_cap_emulated |= pv_feature_bit(name);					\
+	for (int i = 0; i < NR_CPUID_OL; i++) {						\
+		if ((overlay_mask) & BIT(i))						\
+			kvm_cpu_caps[i][CPUID_4000_0001_EAX] |= pv_feature_bit(name);	\
+	}										\
+})
+
 /*
  * Undefine the MSR bit macro to avoid token concatenation issues when
  * processing X86_FEATURE_SPEC_CTRL_SSBD.
@@ -1586,6 +1598,35 @@ void kvm_initialize_cpu_caps(void)
 		kvm_cpu_cap_clear(X86_FEATURE_RDTSCP, F_CPUID_DEFAULT);
 		kvm_cpu_cap_clear(X86_FEATURE_RDPID, F_CPUID_DEFAULT);
 	}
+
+	kvm_cpu_cap_ignore(KVM_CPUID_SIGNATURE, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX) | BIT(CPUID_ECX) | BIT(CPUID_EDX),
+			   F_CPUID_DEFAULT | F_CPUID_TDX);
+
+	kvm_cpu_cap_init(CPUID_4000_0001_EAX,
+		PV_F(CLOCKSOURCE, F_CPUID_DEFAULT),
+		PV_F(NOP_IO_DELAY, F_CPUID_DEFAULT | F_CPUID_TDX),
+		PV_F(CLOCKSOURCE2, F_CPUID_DEFAULT),
+		PV_F(ASYNC_PF, F_CPUID_DEFAULT),
+		PV_F(PV_EOI, F_CPUID_DEFAULT),
+		PV_F(PV_UNHALT, F_CPUID_DEFAULT | F_CPUID_TDX),
+		PV_F(PV_TLB_FLUSH, F_CPUID_DEFAULT | F_CPUID_TDX),
+		PV_F(ASYNC_PF_VMEXIT, F_CPUID_DEFAULT),
+		PV_F(PV_SEND_IPI, F_CPUID_DEFAULT | F_CPUID_TDX),
+		PV_F(POLL_CONTROL, F_CPUID_DEFAULT | F_CPUID_TDX),
+		PV_F(PV_SCHED_YIELD, F_CPUID_DEFAULT | F_CPUID_TDX),
+		PV_F(ASYNC_PF_INT, F_CPUID_DEFAULT),
+		PV_F(MSI_EXT_DEST_ID, F_CPUID_DEFAULT | F_CPUID_TDX),
+		PV_F(HC_MAP_GPA_RANGE, F_CPUID_DEFAULT),
+		PV_F(MIGRATION_CONTROL, F_CPUID_DEFAULT),
+		PV_F(CLOCKSOURCE_STABLE_BIT, F_CPUID_DEFAULT),
+	);
+
+	if (sched_info_on())
+		kvm_cpu_cap_set(KVM_PV_FEATURE_STEAL_TIME, F_CPUID_DEFAULT);
+
+	kvm_cpu_cap_ignore(KVM_CPUID_SIGNATURE | 0x10, 0, 0,
+			   BIT(CPUID_EAX) | BIT(CPUID_EBX), F_CPUID_DEFAULT | F_CPUID_TDX);
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_initialize_cpu_caps);
 
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 0b90344a8b98..535377e519b5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -270,7 +270,8 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
 	 * defined, as this and other code would need to be updated.
 	 */
 	base = cpuid.function & 0xffff0000;
-	if (WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000))
+	if (base == KVM_CPUID_SIGNATURE ||
+	    WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000))
 		return 0;
 
 	if (cpuid_eax(base) < cpuid.function)
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 1bdb05aaa852..03a88ab3585d 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -79,6 +79,26 @@
 /* CPUID level 0x6 (ECX) */
 #define KVM_X86_FEATURE_APERFMPERF	KVM_X86_FEATURE(CPUID_6_ECX, 0)
 
+/* CPUID level 0x40000001 (EAX) */
+#define KVM_PV_FEATURE_CLOCKSOURCE		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 0)
+#define KVM_PV_FEATURE_NOP_IO_DELAY		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 1)
+#define KVM_PV_FEATURE_MMU_OP			KVM_X86_FEATURE(CPUID_4000_0001_EAX, 2)
+#define KVM_PV_FEATURE_CLOCKSOURCE2		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 3)
+#define KVM_PV_FEATURE_ASYNC_PF			KVM_X86_FEATURE(CPUID_4000_0001_EAX, 4)
+#define KVM_PV_FEATURE_STEAL_TIME		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 5)
+#define KVM_PV_FEATURE_PV_EOI			KVM_X86_FEATURE(CPUID_4000_0001_EAX, 6)
+#define KVM_PV_FEATURE_PV_UNHALT		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 7)
+#define KVM_PV_FEATURE_PV_TLB_FLUSH		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 9)
+#define KVM_PV_FEATURE_ASYNC_PF_VMEXIT		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 10)
+#define KVM_PV_FEATURE_PV_SEND_IPI		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 11)
+#define KVM_PV_FEATURE_POLL_CONTROL		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 12)
+#define KVM_PV_FEATURE_PV_SCHED_YIELD		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 13)
+#define KVM_PV_FEATURE_ASYNC_PF_INT		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 14)
+#define KVM_PV_FEATURE_MSI_EXT_DEST_ID		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 15)
+#define KVM_PV_FEATURE_HC_MAP_GPA_RANGE		KVM_X86_FEATURE(CPUID_4000_0001_EAX, 16)
+#define KVM_PV_FEATURE_MIGRATION_CONTROL	KVM_X86_FEATURE(CPUID_4000_0001_EAX, 17)
+#define KVM_PV_FEATURE_CLOCKSOURCE_STABLE_BIT	KVM_X86_FEATURE(CPUID_4000_0001_EAX, 24)
+
 struct cpuid_reg {
 	u32 function;
 	u32 index;
@@ -168,6 +188,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_001F_EBX] = {0x8000001f, 0, CPUID_EBX},
 	[CPUID_8000_0021_EBX] = {0x80000021, 0, CPUID_EBX},
 	[CPUID_8000_0022_EBX] = {0x80000022, 0, CPUID_EBX},
+	[CPUID_4000_0001_EAX] = {0x40000001, 0, CPUID_EAX},
 };
 
 /*
@@ -239,6 +260,7 @@ static __always_inline u32 __feature_bit(int x86_feature)
 }
 
 #define feature_bit(name)  __feature_bit(X86_FEATURE_##name)
+#define pv_feature_bit(name)  __feature_bit(KVM_PV_FEATURE_##name)
 
 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned int x86_feature)
 {
-- 
2.46.0



* [RFC PATCH 20/27] KVM: x86: Add per-VM flag to track CPUID paranoid mode
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (18 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 19/27] KVM: x86: Track KVM PV CPUID features for " Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 21/27] KVM: x86: Make kvm_vcpu_after_set_cpuid() return an error code Binbin Wu
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Add 'is_cpuid_paranoid_mode' to struct kvm_arch to indicate whether
CPUID paranoid verification is enabled for a VM.

When enabled, KVM will restrict the guest's CPUID configuration to only
known and supported bits, rejecting userspace-provided CPUID entries that
set unsupported or unrecognized features.  CPUID paranoid mode will be
unconditionally enforced for TDX guests, and optionally enabled for other
VM types via a new KVM capability that will be introduced in a subsequent
patch.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f6d79e8496c3..4f645f9dfb5a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1485,6 +1485,9 @@ struct kvm_arch {
 	bool has_protected_state;
 	bool has_protected_eoi;
 	bool pre_fault_allowed;
+
+	bool is_cpuid_paranoid_mode;
+
 	struct hlist_head *mmu_page_hash;
 	struct list_head active_mmu_pages;
 	struct kvm_possible_nx_huge_pages possible_nx_huge_pages[KVM_NR_MMU_TYPES];
-- 
2.46.0



* [RFC PATCH 21/27] KVM: x86: Make kvm_vcpu_after_set_cpuid() return an error code
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (19 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 20/27] KVM: x86: Add per-VM flag to track CPUID " Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 22/27] KVM: x86: Verify userspace CPUID inputs in paranoid mode Binbin Wu
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Change kvm_vcpu_after_set_cpuid() to return an error code, in
preparation for adding CPUID paranoid verification that will reject
invalid userspace CPUID configurations.

Have kvm_set_cpuid() check the return value and unwind on failure,
utilizing the existing error path that restores the vCPU's previous
CPUID entries and capabilities.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 8 ++++++--
 arch/x86/kvm/cpuid.h | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 789ec9eb7aaf..08f5bc1d26b1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -449,7 +449,7 @@ static bool __maybe_unused is_cpuid_subleaf_common_pattern(u32 func, u32 *index)
 	}
 }
 
-void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+int kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	u8 cpuid_overlay = get_cpuid_overlay(vcpu->kvm);
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -543,6 +543,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	kvm_mmu_after_set_cpuid(vcpu);
 
 	kvm_make_request(KVM_REQ_RECALC_INTERCEPTS, vcpu);
+
+	return 0;
 }
 
 int cpuid_query_maxphyaddr(struct kvm_vcpu *vcpu)
@@ -649,7 +651,9 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 #ifdef CONFIG_KVM_XEN
 	vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE);
 #endif
-	kvm_vcpu_after_set_cpuid(vcpu);
+	r = kvm_vcpu_after_set_cpuid(vcpu);
+	if (r)
+		goto err;
 
 success:
 	kvfree(e2);
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 535377e519b5..cff5e71579ce 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -42,7 +42,7 @@ static inline void kvm_finalize_cpu_caps(void)
 	kvm_is_configuring_cpu_caps = false;
 }
 
-void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
+int kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry2(struct kvm_cpuid_entry2 *entries,
 					       int nent, u32 function, u64 index);
 /*
-- 
2.46.0



* [RFC PATCH 22/27] KVM: x86: Verify userspace CPUID inputs in paranoid mode
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (20 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 21/27] KVM: x86: Make kvm_vcpu_after_set_cpuid() return an error code Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 23/27] KVM: x86: Account for runtime CPUID features " Binbin Wu
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Add CPUID paranoid verification to reject userspace CPUID configurations
that set unsupported or unknown bits when paranoid mode is enabled for a
VM.

When paranoid mode is enabled, iterate over every userspace-provided
CPUID entry and check all four registers (EAX-EDX) against KVM's
supported masks or values.

Introduce cpuid_reg_2_x86_leaf() to reverse-map a (function, index, reg)
tuple to a reverse_cpuid[] index, handling subleaf remapping for CPUID
leaves with a common pattern across sub-leaves.

Refactor the vCPU capability initialization to iterate over userspace
CPUID entries rather than reverse_cpuid[], combining the paranoid check
with capability setup in cpuid_check_and_set_vcpu_caps().  When paranoid
mode is disabled, entries without a reverse_cpuid[] mapping are simply
skipped, preserving existing behavior.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 142 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 113 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 08f5bc1d26b1..2027230a1f42 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -406,7 +406,7 @@ static void kvm_cpu_cap_ignore(u32 func, u32 index_start, u32 index_end,
 	ignored_set.nr++;
 }
 
-static bool __maybe_unused is_cpuid_paranoid_ignored(u32 func, u32 index, int reg, u8 overlay)
+static bool is_cpuid_paranoid_ignored(u32 func, u32 index, int reg, u8 overlay)
 {
 	for (int i = 0; i < ignored_set.nr; i++) {
 		struct ignored_entry *e = &ignored_set.entries[i];
@@ -419,7 +419,7 @@ static bool __maybe_unused is_cpuid_paranoid_ignored(u32 func, u32 index, int re
 	return false;
 }
 
-static bool __maybe_unused is_cpuid_reg_check_value(u32 func, u32 index, int reg)
+static bool is_cpuid_reg_check_value(u32 func, u32 index, int reg)
 {
 	switch (func) {
 	case 0x1D: return true;
@@ -428,7 +428,7 @@ static bool __maybe_unused is_cpuid_reg_check_value(u32 func, u32 index, int reg
 	}
 }
 
-static bool __maybe_unused is_cpuid_subleaf_common_pattern(u32 func, u32 *index)
+static bool is_cpuid_subleaf_common_pattern(u32 func, u32 *index)
 {
 	switch (func) {
 	case 4:
@@ -449,45 +449,129 @@ static bool __maybe_unused is_cpuid_subleaf_common_pattern(u32 func, u32 *index)
 	}
 }
 
-int kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+static u32 cpuid_reg_2_x86_leaf(u32 leaf, u32 index, int reg)
 {
-	u8 cpuid_overlay = get_cpuid_overlay(vcpu->kvm);
-	struct kvm_lapic *apic = vcpu->arch.apic;
-	struct kvm_cpuid_entry2 *best;
-	struct kvm_cpuid_entry2 *entry;
-	bool allow_gbpages;
-	int i;
+	bool remapped = false;
 
-	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
-	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS_PARANOID);
+	if (is_cpuid_subleaf_common_pattern(leaf, &index))
+		remapped = true;
 
-	/*
-	 * Reset guest capabilities to userspace's guest CPUID definition, i.e.
-	 * honor userspace's definition for features that don't require KVM or
-	 * hardware management/support (or that KVM simply doesn't care about).
-	 */
-	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
-		const struct cpuid_reg cpuid = reverse_cpuid[i];
-		struct kvm_cpuid_entry2 emulated;
+	for (int i = 0; i < ARRAY_SIZE(reverse_cpuid); i++) {
+		const struct cpuid_reg *cpuid = &reverse_cpuid[i];
+
+		if (cpuid->function == leaf && cpuid->index == index && cpuid->reg == reg) {
+			/*
+			 * Remapping the index is only intended for paranoid
+			 * CPUID checks; the result should not fall into the
+			 * range tracked by the vCPU's caps.
+			 */
+			WARN_ON_ONCE(remapped && i < NR_KVM_CPU_CAPS);
+			return i;
+		}
+	}
 
-		if (!cpuid.function)
+	return (u32)-1;
+}
+
+static int do_cpuid_reg_paranoid_check(struct kvm *kvm,
+				       struct kvm_cpuid_entry2 *entry, int reg,
+				       u32 input, u32 supported)
+{
+	bool check_value;
+
+	if (!kvm->arch.is_cpuid_paranoid_mode)
+		return 0;
+
+	check_value = is_cpuid_reg_check_value(entry->function, entry->index, reg);
+
+	if (check_value && (input == supported))
+		return 0;
+
+	if (!check_value && (input & supported) == input)
+		return 0;
+
+	pr_debug("CPUID func 0x%x index %d E%cX: 0x%08x %s 0x%08x\n",
+		entry->function, entry->index, 'A' + reg, input,
+		check_value ? "!=" : "has unsupported bits",
+		check_value ? supported : input & ~supported);
+
+	return -EINVAL;
+}
+
+static int cpuid_check_and_set_vcpu_caps(struct kvm_vcpu *vcpu,
+					 struct kvm_cpuid_entry2 *entry)
+{
+	u8 cpuid_overlay = get_cpuid_overlay(vcpu->kvm);
+	struct kvm_cpuid_entry2 emulated;
+	u32 input, supported;
+	u32 leaf;
+
+	if (!entry->index)
+		cpuid_func_emulated(vcpu->kvm, &emulated, entry->function, true);
+
+	for (int reg = CPUID_EAX; reg <= CPUID_EDX; reg++) {
+		if (vcpu->kvm->arch.is_cpuid_paranoid_mode &&
+		    is_cpuid_paranoid_ignored(entry->function, entry->index, reg, cpuid_overlay))
 			continue;
 
-		entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
-		if (!entry)
+		/*
+		 * A leaf remapped from a different index will not be
+		 * recorded in vcpu->arch.cpu_caps[] below.
+		 */
+		leaf = cpuid_reg_2_x86_leaf(entry->function, entry->index, reg);
+
+		if (!vcpu->kvm->arch.is_cpuid_paranoid_mode && leaf >= NR_KVM_CPU_CAPS)
 			continue;
 
+		input = cpuid_get_reg_unsafe(entry, reg);
+
+		supported = leaf != (u32)-1 ? kvm_cpu_caps[cpuid_overlay][leaf] : 0;
+		supported |= (!entry->index ? cpuid_get_reg_unsafe(&emulated, reg) : 0);
+
+		if (do_cpuid_reg_paranoid_check(vcpu->kvm, entry, reg, input, supported))
+			return -EINVAL;
+
+		if (leaf >= NR_KVM_CPU_CAPS)
+			continue;
 		/*
 		 * A vCPU has a feature if it's supported by KVM and is enabled
 		 * in guest CPUID.  Note, this includes features that are
 		 * supported by KVM but aren't advertised to userspace!
 		 */
-		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[cpuid_overlay][i];
-		if (!cpuid.index) {
-			cpuid_func_emulated(vcpu->kvm, &emulated, cpuid.function, true);
-			vcpu->arch.cpu_caps[i] |= cpuid_get_reg_unsafe(&emulated, cpuid.reg);
-		}
-		vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);
+		vcpu->arch.cpu_caps[leaf] = supported;
+		vcpu->arch.cpu_caps[leaf] &= cpuid_get_reg_unsafe(entry, reg);
+	}
+
+	return 0;
+}
+
+int kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	struct kvm_cpuid_entry2 *best;
+	bool allow_gbpages;
+	int r = 0;
+
+	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
+	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS_PARANOID);
+
+	/*
+	 * If CPUID paranoid mode is enabled, KVM rejects userspace's guest
+	 * CPUID definition if it contains any bits that are unsupported by
+	 * or unknown to KVM.  Otherwise, reset guest capabilities to userspace's
+	 * guest CPUID definition, i.e. honor userspace's definition for
+	 * features that don't require KVM or hardware management/support (or
+	 * that KVM simply doesn't care about).
+	 */
+	for (int i = 0; i < vcpu->arch.cpuid_nent; i++) {
+		r = cpuid_check_and_set_vcpu_caps(vcpu, &vcpu->arch.cpuid_entries[i]);
+		/*
+		 * No need to worry about partially applied changes if a
+		 * check fails; they are all reverted when an error is
+		 * returned on the set-CPUID path.
+		 */
+		if (r)
+			return r;
 	}
 
 	kvm_update_cpuid_runtime(vcpu);
-- 
2.46.0



* [RFC PATCH 23/27] KVM: x86: Account for runtime CPUID features in paranoid mode
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (21 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 22/27] KVM: x86: Verify userspace CPUID inputs in paranoid mode Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 24/27] KVM: x86: Skip paranoid CPUID check for KVM PV leafs when base is relocated Binbin Wu
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Include RUNTIME_F() features in the supported mask during paranoid CPUID
verification to avoid false rejections of legitimate userspace CPUID
configurations.

Add get_cpuid_reg_dynamic() to return OSXSAVE and OSPKE as supported
bits based on the vCPU's current CR4 state.  Both features are declared
with RUNTIME_F() and thus absent from kvm_cpu_caps[][], but userspace
may legitimately set them when CR4.OSXSAVE or CR4.PKE is enabled.  TDX
guests are unaffected as these bits are not configurable for TDs.

For non-TDX guests, MWAIT is already permitted via cpuid_func_emulated(),
but that function early-returns zero for TDX guests since KVM cannot
(even partially) emulate these features for TDs.  The TDX module does
support exposing MWAIT to guests when the host has MWAIT and userspace
configures it, Declare MWAIT with F(MWAIT, F_CPUID_TDX) alongside the
existing RUNTIME_F(MWAIT) to populate MWAIT in the TDX overlay.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2027230a1f42..af87b803572c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -498,6 +498,24 @@ static int do_cpuid_reg_paranoid_check(struct kvm *kvm,
 	return -EINVAL;
 }
 
+static u32 get_cpuid_reg_dynamic(struct kvm_vcpu *vcpu, u32 func, u32 index, int reg)
+{
+	switch (func) {
+	case 1:
+		if (reg == CPUID_ECX && kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE))
+			return feature_bit(OSXSAVE);
+		break;
+	case 7:
+		if (index == 0 && reg == CPUID_ECX && kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))
+			return feature_bit(OSPKE);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
 static int cpuid_check_and_set_vcpu_caps(struct kvm_vcpu *vcpu,
 					 struct kvm_cpuid_entry2 *entry)
 {
@@ -527,6 +545,7 @@ static int cpuid_check_and_set_vcpu_caps(struct kvm_vcpu *vcpu,
 
 		supported = leaf != (u32)-1 ? kvm_cpu_caps[cpuid_overlay][leaf] : 0;
 		supported |= (!entry->index ? cpuid_get_reg_unsafe(&emulated, reg) : 0);
+		supported |= get_cpuid_reg_dynamic(vcpu, entry->function, entry->index, reg);
 
 		if (do_cpuid_reg_paranoid_check(vcpu->kvm, entry, reg, input, supported))
 			return -EINVAL;
@@ -1025,6 +1044,11 @@ void kvm_initialize_cpu_caps(void)
 		 * that KVM is aware that it's a known, unadvertised flag.
 		 */
 		RUNTIME_F(MWAIT),
+		/*
+		 * For TDX, MWAIT could be advertised to guests if the host
+		 * supports it and userspace configures it.
+		 */
+		F(MWAIT, F_CPUID_TDX),
 		/* DSCPL is fixed-1 in TDX */
 		F(DSCPL, F_CPUID_TDX),
 		VENDOR_F(VMX),
-- 
2.46.0



* [RFC PATCH 24/27] KVM: x86: Skip paranoid CPUID check for KVM PV leafs when base is relocated
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (22 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 23/27] KVM: x86: Account for runtime CPUID features " Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 25/27] KVM: x86: Add new KVM_CAP_X86_CPUID_PARANOID Binbin Wu
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Handle the case where the KVM PV CPUID base is relocated from its
default location (0x40000000) in paranoid mode, e.g. when userspace
advertises support for both Hyper-V and KVM.

In CPUID paranoid mode, if the KVM CPUID base is relocated, skip the
normal per-register paranoid verification: reject the configuration
outright for TDs (as TDX does not support other PV enhancements) and
allow it for normal guests.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index af87b803572c..e6f0ecadc290 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -516,6 +516,16 @@ static u32 get_cpuid_reg_dynamic(struct kvm_vcpu *vcpu, u32 func, u32 index, int
 	return 0;
 }
 
+static int kvm_check_pv_cpuid_relocated(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entry)
+{
+	/* TDX doesn't support other virtualization enhancements. */
+	if (get_cpuid_overlay(vcpu->kvm) == CPUID_OL_TDX)
+		return -EINVAL;
+
+	/* TODO: Check PV CPUIDs when KVM PV CPUID base is relocated. */
+	return 0;
+}
+
 static int cpuid_check_and_set_vcpu_caps(struct kvm_vcpu *vcpu,
 					 struct kvm_cpuid_entry2 *entry)
 {
@@ -524,6 +534,11 @@ static int cpuid_check_and_set_vcpu_caps(struct kvm_vcpu *vcpu,
 	u32 input, supported;
 	u32 leaf;
 
+	if (vcpu->kvm->arch.is_cpuid_paranoid_mode &&
+	    (entry->function & 0xFFFF0000) == KVM_CPUID_SIGNATURE &&
+	    kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE).base != KVM_CPUID_SIGNATURE)
+		return kvm_check_pv_cpuid_relocated(vcpu, entry);
+
 	if (!entry->index)
 		cpuid_func_emulated(vcpu->kvm, &emulated, entry->function, true);
 
-- 
2.46.0



* [RFC PATCH 25/27] KVM: x86: Add new KVM_CAP_X86_CPUID_PARANOID
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (23 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 24/27] KVM: x86: Skip paranoid CPUID check for KVM PV leafs when base is relocated Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 26/27] KVM: x86: Add a helper to query the allowed CPUID mask Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 27/27] KVM: TDX: Replace hardcoded CPUID filtering with the allowed mask Binbin Wu
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Introduce a new VM-scoped capability, KVM_CAP_X86_CPUID_PARANOID, to
allow userspace to opt in to CPUID paranoid mode.

When CPUID paranoid mode is enabled, KVM rejects KVM_SET_CPUID2 if any
CPUID bits that are unknown to or unsupported by KVM are set.

Userspace should enable KVM_CAP_X86_CPUID_PARANOID before creating any
vCPUs.

Unconditionally enforce CPUID paranoid mode for TDs.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 Documentation/virt/kvm/api.rst | 18 ++++++++++++++++++
 arch/x86/kvm/vmx/tdx.c         |  8 ++++++++
 arch/x86/kvm/x86.c             | 13 +++++++++++++
 include/uapi/linux/kvm.h       |  1 +
 4 files changed, 40 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 52bbbb553ce1..81cb78ee9368 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8904,6 +8904,24 @@ helpful if user space wants to emulate instructions which are not
 This capability can be enabled dynamically even if VCPUs were already
 created and are running.
 
+7.47 KVM_CAP_X86_CPUID_PARANOID
+-------------------------------
+
+:Architectures: x86
+:Type: vm
+:Parameters: arg[0], a bitmask of flags, currently reserved and must be zero.
+:Returns: 0 on success, -EINVAL if arg[0] is not zero or vCPUs have been created
+          before enabling this capability.
+
+When this capability is supported, userspace can query supported CPUIDs per VM
+via KVM_GET_SUPPORTED_CPUID and KVM_GET_EMULATED_CPUID.
+
+When this capability is enabled, KVM will only allow the CPUID bits that are
+known and supported to be exposed to the guest.  KVM will reject KVM_SET_CPUID2
+if any unknown or unsupported bits are set.
+
+For TDX guests, this capability is enabled by default.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a1df89d66a84..a996e7f761ed 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -638,6 +638,14 @@ int tdx_vm_init(struct kvm *kvm)
 	kvm->arch.has_private_mem = true;
 	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
 
+	/*
+	 * KVM enforces CPUID paranoid mode for TDs to prevent userspace from
+	 * setting unknown or unsupported bits in CPUID, which could be host
+	 * state clobbering features requiring KVM to do additional host state
+	 * management.
+	 */
+	kvm->arch.is_cpuid_paranoid_mode = true;
+
 	/*
 	 * Because guest TD is protected, VMM can't parse the instruction in TD.
 	 * Instead, guest uses MMIO hypercall.  For unmodified device driver,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f713afd909a..ed2df450fd0b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4870,6 +4870,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_MEMORY_FAULT_INFO:
 	case KVM_CAP_X86_GUEST_MODE:
 	case KVM_CAP_ONE_REG:
+	case KVM_CAP_X86_CPUID_PARANOID:
 		r = 1;
 		break;
 	case KVM_CAP_PRE_FAULT_MEMORY:
@@ -7006,6 +7007,18 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		mutex_unlock(&kvm->lock);
 		break;
 	}
+	case KVM_CAP_X86_CPUID_PARANOID:
+		r = -EINVAL;
+		if (cap->args[0])
+			break;
+
+		mutex_lock(&kvm->lock);
+		if (!kvm->created_vcpus) {
+			kvm->arch.is_cpuid_paranoid_mode = true;
+			r = 0;
+		}
+		mutex_unlock(&kvm->lock);
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6c8afa2047bf..daf429cfc6eb 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -996,6 +996,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_S390_USER_OPEREXEC 246
 #define KVM_CAP_S390_KEYOP 247
 #define KVM_CAP_S390_VSIE_ESAMODE 248
+#define KVM_CAP_X86_CPUID_PARANOID 249
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
-- 
2.46.0



* [RFC PATCH 26/27] KVM: x86: Add a helper to query the allowed CPUID mask
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (24 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 25/27] KVM: x86: Add new KVM_CAP_X86_CPUID_PARANOID Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  2026-04-17  7:36 ` [RFC PATCH 27/27] KVM: TDX: Replace hardcoded CPUID filtering with the allowed mask Binbin Wu
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Add and export kvm_cpuid_get_allowed_mask() to let vendor modules (e.g.,
TDX) look up the allowed bitmask for a specific CPUID leaf/subleaf/
register under a given overlay.

TDX is probably the only user; emulated CPUID features are not included
in the returned mask.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 16 ++++++++++++++++
 arch/x86/kvm/cpuid.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e6f0ecadc290..30fb61c1430d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -526,6 +526,22 @@ static int kvm_check_pv_cpuid_relocated(struct kvm_vcpu *vcpu, struct kvm_cpuid_
 	return 0;
 }
 
+/* Emulated CPUID features are not included in the returned mask. */
+u32 kvm_cpuid_get_allowed_mask(u32 func, u32 index, int reg, u32 overlay)
+{
+	u32 leaf;
+
+	if (is_cpuid_paranoid_ignored(func, index, reg, overlay))
+		return (u32)-1;
+
+	leaf = cpuid_reg_2_x86_leaf(func, index, reg);
+	if (leaf == (u32)-1)
+		return 0;
+
+	return kvm_cpu_caps[overlay][leaf];
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_cpuid_get_allowed_mask);
+
 static int cpuid_check_and_set_vcpu_caps(struct kvm_vcpu *vcpu,
 					 struct kvm_cpuid_entry2 *entry)
 {
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index cff5e71579ce..251003d78990 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -85,6 +85,7 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 			      struct kvm_cpuid_entry2 __user *entries);
 bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 	       u32 *ecx, u32 *edx, bool exact_only);
+u32 kvm_cpuid_get_allowed_mask(u32 func, u32 index, int reg, u32 overlay);
 
 void __init kvm_init_xstate_sizes(void);
 u32 xstate_required_size(u64 xstate_bv, bool compacted);
-- 
2.46.0



* [RFC PATCH 27/27] KVM: TDX: Replace hardcoded CPUID filtering with the allowed mask
  2026-04-17  7:35 [RFC PATCH 00/27] KVM: x86: Add a paranoid mode for CPUID verification Binbin Wu
                   ` (25 preceding siblings ...)
  2026-04-17  7:36 ` [RFC PATCH 26/27] KVM: x86: Add a helper to query the allowed CPUID mask Binbin Wu
@ 2026-04-17  7:36 ` Binbin Wu
  26 siblings, 0 replies; 28+ messages in thread
From: Binbin Wu @ 2026-04-17  7:36 UTC (permalink / raw)
  To: kvm
  Cc: pbonzini, seanjc, rick.p.edgecombe, xiaoyao.li, chao.gao,
	kai.huang, binbin.wu

Replace TDX's ad-hoc CPUID filtering of TSX (HLE/RTM) and WAITPKG with
the generic kvm_cpuid_get_allowed_mask() helper, which returns the
allowed bitmask from the TDX CPUID overlay for any leaf/subleaf/register.

This makes the TDX CPUID filtering automatically cover all features
governed by the overlay infrastructure, eliminating the need to add new
per-feature helpers as more features are restricted for TDX.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
---
 arch/x86/kvm/vmx/tdx.c | 42 +++++++++++++-----------------------------
 1 file changed, 13 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a996e7f761ed..2b980335b667 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -120,42 +120,26 @@ static u32 tdx_set_guest_phys_addr_bits(const u32 eax, int addr_bits)
 	return (eax & ~GENMASK(23, 16)) | (addr_bits & 0xff) << 16;
 }
 
-#define TDX_FEATURE_TSX (__feature_bit(X86_FEATURE_HLE) | __feature_bit(X86_FEATURE_RTM))
-
-static bool has_tsx(const struct kvm_cpuid_entry2 *entry)
-{
-	return entry->function == 7 && entry->index == 0 &&
-	       (entry->ebx & TDX_FEATURE_TSX);
-}
-
-static void clear_tsx(struct kvm_cpuid_entry2 *entry)
-{
-	entry->ebx &= ~TDX_FEATURE_TSX;
-}
-
-static bool has_waitpkg(const struct kvm_cpuid_entry2 *entry)
-{
-	return entry->function == 7 && entry->index == 0 &&
-	       (entry->ecx & __feature_bit(X86_FEATURE_WAITPKG));
-}
-
-static void clear_waitpkg(struct kvm_cpuid_entry2 *entry)
-{
-	entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG);
-}
-
 static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
 {
-	if (has_tsx(entry))
-		clear_tsx(entry);
+	u32 *reg = &entry->eax;
 
-	if (has_waitpkg(entry))
-		clear_waitpkg(entry);
+	for (int i = CPUID_EAX; i <= CPUID_EDX; i++)
+		reg[i] &= kvm_cpuid_get_allowed_mask(entry->function, entry->index,
+						     i, CPUID_OL_TDX);
 }
 
 static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry)
 {
-	return has_tsx(entry) || has_waitpkg(entry);
+	const u32 *reg = &entry->eax;
+
+	for (int i = CPUID_EAX; i <= CPUID_EDX; i++) {
+		if (reg[i] & ~kvm_cpuid_get_allowed_mask(entry->function, entry->index,
+							 i, CPUID_OL_TDX))
+			return true;
+	}
+
+	return false;
 }
 
 #define KVM_TDX_CPUID_NO_SUBLEAF	((__u32)-1)
-- 
2.46.0

