kvm.vger.kernel.org archive mirror
* [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching
@ 2024-05-17 17:38 Sean Christopherson
  2024-05-17 17:38 ` [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation Sean Christopherson
                   ` (49 more replies)
  0 siblings, 50 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

This is technically v2 of "Replace governed features with guest cpu_caps",
but it obviously snowballed just a bit.  This series wanders all over the
place, and ideally would be 3-4 distinct series, but there are interactions
and dependencies between nearly all of the pieces.

The super short TL;DR: snapshot all X86_FEATURE_* flags that KVM cares
about so that all queries against guest capabilities are "fast", e.g. don't
require manual enabling or judgment calls as to where a feature needs to be
fast.
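
For readers new to the idea, a minimal sketch of what "snapshot the flags so
queries are fast" can look like, assuming a plain per-vCPU bitmap.  The names
(toy_vcpu_caps, toy_guest_cap_*), the word count, and the feature encoding are
all hypothetical and are NOT the helpers this series adds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size; the real number of capability words differs. */
#define TOY_NR_CAP_WORDS	4

/* Encode a feature as (word * 32 + bit), in the style of X86_FEATURE_*. */
#define TOY_FEATURE(word, bit)	((word) * 32u + (bit))

/* Per-vCPU snapshot, filled in once when guest CPUID is set. */
struct toy_vcpu_caps {
	uint32_t cpu_caps[TOY_NR_CAP_WORDS];
};

static inline void toy_guest_cap_set(struct toy_vcpu_caps *v,
				     unsigned int feature)
{
	assert(feature / 32 < TOY_NR_CAP_WORDS);
	v->cpu_caps[feature / 32] |= UINT32_C(1) << (feature % 32);
}

/* A query is a single load and mask, with no walk of the CPUID entries. */
static inline bool toy_guest_cap_has(const struct toy_vcpu_caps *v,
				     unsigned int feature)
{
	assert(feature / 32 < TOY_NR_CAP_WORDS);
	return v->cpu_caps[feature / 32] & (UINT32_C(1) << (feature % 32));
}
```

The point of the layout is that every feature costs the same to query, so
nobody has to decide up front which features deserve a fast path.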

The guest_cpu_cap_* nomenclature follows the existing kvm_cpu_cap_*
except for a few (maybe just one?) cases where guest cpu_caps need APIs
that kvm_cpu_caps don't.  In theory, the similar names will make this
approach more intuitive.

Maxim's suggestion to incorporate KVM's capabilities into the guest's cpu_caps
grew on me, to the point where I decided to just go for it.  Through macro
shenanigans (see the last DO NOT APPLY patch) and manually verifying that
vcpu->arch.cpu_caps is always a superset of guest CPUID, I was able to gain
sufficient confidence that KVM won't silently change guest behavior.  Many, but
not all, of the new patches are related in some way to that approach.
 
There are *multiple* potentially breaking changes in this series (in for a
penny, in for a pound).  However, I don't expect any fallout for real world
VMMs because the ABI changes either disallow things that couldn't possibly
have worked in the first place, or are following in the footsteps of other
behaviors, e.g. KVM advertises x2APIC, which is 100% dependent on an in-kernel
local APIC.

 * Disallow stuffing CPUID-dependent guest CR4 features before setting guest
   CPUID.
 * Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation.
 * Reject disabling of MWAIT/HLT interception when not allowed.
 * Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID.
 * Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID.

Lastly, regarding the PoC DO NOT APPLY patch, I hope to turn that into an actual
patch in the future.  E.g. I think we can shove feature usage information into
a .note or something, and then do post-processing a la objtool during the build.

v2:
 - Collect a few reviews (though I dropped several due to the patches changing
   significantly).
 - Incorporate KVM's support into the vCPU's cpu_caps. [Maxim]
 - A massive pile of new patches.

v1: https://lore.kernel.org/all/20231110235528.1561679-1-seanjc@google.com

Sean Christopherson (49):
  KVM: x86: Do all post-set CPUID processing during vCPU creation
  KVM: x86: Explicitly do runtime CPUID updates "after" initial setup
  KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4
    on VMX
  KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID
    enforcement
  KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is
    non-NULL
  KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry()
  KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes
  KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h
  KVM: x86/pmu: Drop now-redundant refresh() during init()
  KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU
    creation
  KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
  KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test
  KVM: selftests: Update x86's KVM PV test to match KVM's disabling
    exits behavior
  KVM: x86: Zero out PV features cache when the CPUID leaf is not
    present
  KVM: x86: Don't update PV features caches when enabling enforcement
    capability
  KVM: x86: Do reverse CPUID sanity checks in __feature_leaf()
  KVM: x86: Account for max supported CPUID leaf when getting raw host
    CPUID
  KVM: x86: Add a macro to init CPUID features that ignore host kernel
    support
  KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()
  KVM: x86: Add a macro to init CPUID features that are 64-bit only
  KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID
    features
  KVM: x86: Handle kernel- and KVM-defined CPUID words in a single
    helper
  KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  KVM: x86: Harden CPU capabilities processing against out-of-scope
    features
  KVM: x86: Add a macro to init CPUID features that KVM emulates in
    software
  KVM: x86: Swap incoming guest CPUID into vCPU before massaging in
    KVM_SET_CPUID2
  KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets
    CPUID
  KVM: x86: Remove unnecessary caching of KVM's PV CPUID base
  KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find()
  KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near
    cpuid_entry2_find()
  KVM: x86: Remove all direct usage of cpuid_entry2_find()
  KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
  KVM: x86: Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID
  KVM: x86: Add a macro to handle features that are fully VMM controlled
  KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"
  KVM: x86: Replace guts of "governed" features with comprehensive
    cpu_caps
  KVM: x86: Initialize guest cpu_caps based on guest CPUID
  KVM: x86: Extract code for generating per-entry emulated CPUID
    information
  KVM: x86: Initialize guest cpu_caps based on KVM support
  KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime
  KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns
    right leaf
  KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of
    host support
  KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based
    features
  KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has()
  KVM: x86: Replace (almost) all guest CPUID feature queries with
    cpu_caps
  KVM: x86: Drop superfluous host XSAVE check when adjusting guest
    XSAVES caps
  KVM: x86: Add a macro for features that are synthesized into
    boot_cpu_data
  *** DO NOT APPLY *** KVM: x86: Verify KVM initializes all consumed
    guest caps

 Documentation/virt/kvm/api.rst                |  10 +-
 arch/x86/include/asm/kvm_host.h               |  46 +-
 arch/x86/kvm/cpuid.c                          | 660 +++++++++++-------
 arch/x86/kvm/cpuid.h                          | 141 ++--
 arch/x86/kvm/governed_features.h              |  22 -
 arch/x86/kvm/hyperv.c                         |   2 +-
 arch/x86/kvm/lapic.c                          |   2 +-
 arch/x86/kvm/mmu.h                            |   2 +-
 arch/x86/kvm/mmu/mmu.c                        |   4 +-
 arch/x86/kvm/mtrr.c                           |   2 +-
 arch/x86/kvm/pmu.c                            |   1 -
 arch/x86/kvm/reverse_cpuid.h                  |  22 +-
 arch/x86/kvm/smm.c                            |  10 +-
 arch/x86/kvm/svm/nested.c                     |  22 +-
 arch/x86/kvm/svm/pmu.c                        |   8 +-
 arch/x86/kvm/svm/sev.c                        |  21 +-
 arch/x86/kvm/svm/svm.c                        |  46 +-
 arch/x86/kvm/svm/svm.h                        |   4 +-
 arch/x86/kvm/vmx/hyperv.h                     |   2 +-
 arch/x86/kvm/vmx/nested.c                     |  18 +-
 arch/x86/kvm/vmx/pmu_intel.c                  |   4 +-
 arch/x86/kvm/vmx/sgx.c                        |  14 +-
 arch/x86/kvm/vmx/vmx.c                        |  61 +-
 arch/x86/kvm/x86.c                            | 153 ++--
 arch/x86/kvm/x86.h                            |   6 +-
 include/asm-generic/vmlinux.lds.h             |   4 +
 .../selftests/kvm/include/x86_64/processor.h  |  11 +-
 .../selftests/kvm/lib/x86_64/processor.c      |   2 +
 .../selftests/kvm/x86_64/kvm_pv_test.c        |  38 +-
 .../selftests/kvm/x86_64/set_sregs_test.c     |  63 +-
 30 files changed, 791 insertions(+), 610 deletions(-)
 delete mode 100644 arch/x86/kvm/governed_features.h


base-commit: 4aad0b1893a141f114ba40ed509066f3c9bc24b0
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  0:48   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup Sean Christopherson
                   ` (48 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

During vCPU creation, process KVM's default, empty CPUID as if userspace
set an empty CPUID to ensure consistent and correct behavior with respect
to guest CPUID.  E.g. if userspace never sets guest CPUID, KVM will never
configure cr4_guest_rsvd_bits, and thus create divergent, incorrect, guest-
visible behavior due to letting the guest set any KVM-supported CR4 bits
despite the features not being allowed per guest CPUID.
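
A toy model of the ordering problem, assuming a single CPUID-dependent CR4 bit
(UMIP, bit 11) and a stand-in vCPU struct.  All toy_* names are hypothetical,
and the model deliberately omits KVM's own host-level reserved-bit check; it
only shows why an unprocessed, zeroed mask accepts bits the empty guest CPUID
doesn't allow:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CR4_ALWAYS_ALLOWED	UINT64_C(0x7ff)		/* CR4 bits 0-10 */
#define TOY_CR4_UMIP		(UINT64_C(1) << 11)

struct toy_vcpu {
	bool guest_has_umip;		/* stand-in for guest CPUID */
	uint64_t cr4_guest_rsvd_bits;
};

/* Derive the reserved mask from guest CPUID (empty CPUID => UMIP reserved). */
static void toy_after_set_cpuid(struct toy_vcpu *v)
{
	uint64_t rsvd = ~(TOY_CR4_ALWAYS_ALLOWED | TOY_CR4_UMIP);

	if (!v->guest_has_umip)
		rsvd |= TOY_CR4_UMIP;
	v->cr4_guest_rsvd_bits = rsvd;
}

static void toy_vcpu_create(struct toy_vcpu *v, bool process_default_cpuid)
{
	assert(v);
	memset(v, 0, sizeof(*v));

	/* The change in this patch, in miniature. */
	if (process_default_cpuid)
		toy_after_set_cpuid(v);
}

static bool toy_is_valid_cr4(const struct toy_vcpu *v, uint64_t cr4)
{
	return !(cr4 & v->cr4_guest_rsvd_bits);
}
```

With process_default_cpuid=false the mask stays zero and UMIP is accepted even
though the guest was never given the feature; with true, it is rejected.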

Note!  This changes KVM's ABI, as lack of full CPUID processing allowed
userspace to stuff garbage vCPU state, e.g. userspace could set CR4 to a
guest-unsupported value via KVM_SET_SREGS.  But it's extremely unlikely
that this is a breaking change, as KVM already has many flows that require
userspace to set guest CPUID before loading vCPU state.  E.g. multiple MSR
flows consult guest CPUID on host writes, and KVM_SET_SREGS itself already
relies on guest CPUID being up-to-date, as KVM's validity check on CR3
consumes CPUID.0x7.1 (for LAM) and CPUID.0x80000008 (for MAXPHYADDR).

Furthermore, the plan is to commit to enforcing guest CPUID for userspace
writes to MSRs, at which point bypassing sregs CPUID checks is even more
nonsensical.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 arch/x86/kvm/cpuid.h | 1 +
 arch/x86/kvm/x86.c   | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f2f2be5d1141..2b19ff991ceb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -335,7 +335,7 @@ static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
 #endif
 }
 
-static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
+void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 23dbb9eb277c..0a8b561b5434 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -11,6 +11,7 @@
 extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 void kvm_set_cpu_caps(void);
 
+void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
 void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d750546ec934..7adcf56bd45d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12234,6 +12234,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	kvm_xen_init_vcpu(vcpu);
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
+	kvm_vcpu_after_set_cpuid(vcpu);
 	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
 	kvm_vcpu_reset(vcpu, false);
 	kvm_init_mmu(vcpu);
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
  2024-05-17 17:38 ` [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  0:51   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX Sean Christopherson
                   ` (47 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Explicitly perform runtime CPUID adjustments as part of the "after set
CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID
state during kvm_update_cpuid_runtime().  E.g. see commit 4736d85f0d18
("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT").

Whacking each mole individually is not sustainable or robust, e.g. while
the aforementioned commit fixed KVM's PV features, the same issue lurks for
Xen and Hyper-V features; Xen and Hyper-V simply don't have any runtime
features (though spoiler alert, neither should KVM).

Updating runtime features in the "full" path will also simplify adding a
snapshot of the guest's capabilities, i.e. caching the intersection of
guest CPUID and kvm_cpu_caps (modulo a few edge cases).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 2b19ff991ceb..e60ffb421e4b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -345,6 +345,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	bitmap_zero(vcpu->arch.governed_features.enabled,
 		    KVM_MAX_NR_GOVERNED_FEATURES);
 
+	kvm_update_cpuid_runtime(vcpu);
+
 	/*
 	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
 	 * hardware.  The hardware page walker doesn't let KVM disable GBPAGES,
@@ -426,8 +428,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 {
 	int r;
 
-	__kvm_update_cpuid_runtime(vcpu, e2, nent);
-
 	/*
 	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
 	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	 * whether the supplied CPUID data is equal to what's already set.
 	 */
 	if (kvm_vcpu_has_run(vcpu)) {
+		/*
+		 * Note, runtime CPUID updates may consume other CPUID-driven
+		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
+		 * state before full CPUID processing is functionally correct
+		 * only because any change in CPUID is disallowed, i.e. using
+		 * stale data is ok because KVM will reject the change.
+		 */
+		__kvm_update_cpuid_runtime(vcpu, e2, nent);
+
 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
 		if (r)
 			return r;
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
  2024-05-17 17:38 ` [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation Sean Christopherson
  2024-05-17 17:38 ` [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  0:55   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 04/49] KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement Sean Christopherson
                   ` (46 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's
reserved bits into the guest's reserved bits.  This fixes a bug where VMX's
set_cr4_guest_host_mask() fails to account for KVM-reserved bits when
deciding which bits can be passed through to the guest.  In most cases,
letting the guest directly write reserved CR4 bits is ok, i.e. attempting
to set the bit(s) will still #GP, but not if a feature is available in
hardware but explicitly disabled by the host, e.g. if FSGSBASE support is
disabled via "nofsgsbase".

Note, the extra overhead of computing host reserved bits every time
userspace sets guest CPUID is negligible.  The feature bits that are
queried are packed nicely into a handful of words, and so checking and
setting each reserved bit costs roughly 5 cycles, i.e. the
total cost will be in the noise even if the number of checked CR4 bits
doubles over the next few years.  In other words, x86 will run out of CR4
bits long before the overhead becomes problematic.

Note #2, __cr4_reserved_bits() starts from CR4_RESERVED_BITS, which is
why the existing __kvm_cpu_cap_has() processing doesn't explicitly OR in
CR4_RESERVED_BITS (and why the new code doesn't do so either).
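
To illustrate the folding, a simplified sketch that models only two CR4 bits
(FSGSBASE, bit 16, and SMEP, bit 20).  The toy_* structs and helpers are
hypothetical stand-ins for kvm_cpu_caps and guest CPUID, not the real
__cr4_reserved_bits() macro:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions; everything else here is illustrative. */
#define TOY_CR4_FSGSBASE	(UINT64_C(1) << 16)
#define TOY_CR4_SMEP		(UINT64_C(1) << 20)
#define TOY_CR4_KNOWN		(TOY_CR4_FSGSBASE | TOY_CR4_SMEP)

struct toy_caps {
	bool fsgsbase;
	bool smep;
};

/* Start from "everything unknown is reserved", then add unsupported bits. */
static uint64_t toy_reserved_bits(const struct toy_caps *c)
{
	uint64_t rsvd = ~TOY_CR4_KNOWN;

	if (!c->fsgsbase)
		rsvd |= TOY_CR4_FSGSBASE;
	if (!c->smep)
		rsvd |= TOY_CR4_SMEP;
	return rsvd;
}

/* A bit is reserved for the guest if either KVM or guest CPUID lacks it. */
static uint64_t toy_guest_rsvd_bits(const struct toy_caps *kvm,
				    const struct toy_caps *guest)
{
	assert(kvm && guest);
	return toy_reserved_bits(kvm) | toy_reserved_bits(guest);
}

static bool toy_is_valid_cr4(uint64_t cr4, uint64_t guest_rsvd)
{
	return !(cr4 & guest_rsvd);
}
```

E.g. if the host booted with "nofsgsbase", the kvm side reports no FSGSBASE,
and the bit ends up in the combined mask even when guest CPUID advertises it,
so a single mask is enough for both the validity check and the passthrough
decision.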

Fixes: 2ed41aa631fc ("KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 7 +++++--
 arch/x86/kvm/x86.c   | 9 ---------
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e60ffb421e4b..f756a91a3f2f 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -383,8 +383,11 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
 
 	kvm_pmu_refresh(vcpu);
-	vcpu->arch.cr4_guest_rsvd_bits =
-	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
+
+#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
+	vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) |
+					 __cr4_reserved_bits(guest_cpuid_has, vcpu);
+#undef __kvm_cpu_cap_has
 
 	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
 						    vcpu->arch.cpuid_nent));
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7adcf56bd45d..3f20de4368a6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -116,8 +116,6 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
 static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 #endif
 
-static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
-
 #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
 
 #define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
@@ -1134,9 +1132,6 @@ EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
 
 bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	if (cr4 & cr4_reserved_bits)
-		return false;
-
 	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
 		return false;
 
@@ -9831,10 +9826,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		kvm_caps.supported_xss = 0;
 
-#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
-	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
-#undef __kvm_cpu_cap_has
-
 	if (kvm_caps.has_tsc_control) {
 		/*
 		 * Make sure the user can only configure tsc_khz values that
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 04/49] KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (2 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  0:55   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL Sean Christopherson
                   ` (45 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Rework x86's set sregs test to verify that KVM enforces CPUID vs. CR4
features even if userspace hasn't explicitly set guest CPUID.  KVM used to
allow userspace to set any KVM-supported CR4 value prior to KVM_SET_CPUID2,
and the test verified that behavior.

However, the testcase was written purely to verify KVM's existing behavior,
i.e. was NOT written to match the needs of real world VMMs.

Opportunistically verify that KVM continues to reject unsupported features
after KVM_SET_CPUID2 (using KVM_GET_SUPPORTED_CPUID).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/x86_64/set_sregs_test.c     | 53 +++++++++++--------
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
index c021c0795a96..96fd690d479a 100644
--- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
@@ -41,13 +41,15 @@ do {										\
 	TEST_ASSERT(!memcmp(&new, &orig, sizeof(new)), "KVM modified sregs");	\
 } while (0)
 
+#define KVM_ALWAYS_ALLOWED_CR4 (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD |	\
+				X86_CR4_DE | X86_CR4_PSE | X86_CR4_PAE |	\
+				X86_CR4_MCE | X86_CR4_PGE | X86_CR4_PCE |	\
+				X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT)
+
 static uint64_t calc_supported_cr4_feature_bits(void)
 {
-	uint64_t cr4;
+	uint64_t cr4 = KVM_ALWAYS_ALLOWED_CR4;
 
-	cr4 = X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE |
-	      X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE | X86_CR4_PGE |
-	      X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT;
 	if (kvm_cpu_has(X86_FEATURE_UMIP))
 		cr4 |= X86_CR4_UMIP;
 	if (kvm_cpu_has(X86_FEATURE_LA57))
@@ -72,28 +74,14 @@ static uint64_t calc_supported_cr4_feature_bits(void)
 	return cr4;
 }
 
-int main(int argc, char *argv[])
+static void test_cr_bits(struct kvm_vcpu *vcpu, uint64_t cr4)
 {
 	struct kvm_sregs sregs;
-	struct kvm_vcpu *vcpu;
-	struct kvm_vm *vm;
-	uint64_t cr4;
 	int rc, i;
 
-	/*
-	 * Create a dummy VM, specifically to avoid doing KVM_SET_CPUID2, and
-	 * use it to verify all supported CR4 bits can be set prior to defining
-	 * the vCPU model, i.e. without doing KVM_SET_CPUID2.
-	 */
-	vm = vm_create_barebones();
-	vcpu = __vm_vcpu_add(vm, 0);
-
 	vcpu_sregs_get(vcpu, &sregs);
-
-	sregs.cr0 = 0;
-	sregs.cr4 |= calc_supported_cr4_feature_bits();
-	cr4 = sregs.cr4;
-
+	sregs.cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
+	sregs.cr4 |= cr4;
 	rc = _vcpu_sregs_set(vcpu, &sregs);
 	TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);
 
@@ -101,7 +89,6 @@ int main(int argc, char *argv[])
 	TEST_ASSERT(sregs.cr4 == cr4, "sregs.CR4 (0x%llx) != CR4 (0x%lx)",
 		    sregs.cr4, cr4);
 
-	/* Verify all unsupported features are rejected by KVM. */
 	TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_UMIP);
 	TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_LA57);
 	TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_VMXE);
@@ -119,10 +106,28 @@ int main(int argc, char *argv[])
 	/* NW without CD is illegal, as is PG without PE. */
 	TEST_INVALID_CR_BIT(vcpu, cr0, sregs, X86_CR0_NW);
 	TEST_INVALID_CR_BIT(vcpu, cr0, sregs, X86_CR0_PG);
+}
 
+int main(int argc, char *argv[])
+{
+	struct kvm_sregs sregs;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	int rc;
+
+	/*
+	 * Create a dummy VM, specifically to avoid doing KVM_SET_CPUID2, and
+	 * use it to verify KVM enforces guest CPUID even if *userspace* never
+	 * sets CPUID.
+	 */
+	vm = vm_create_barebones();
+	vcpu = __vm_vcpu_add(vm, 0);
+	test_cr_bits(vcpu, KVM_ALWAYS_ALLOWED_CR4);
 	kvm_vm_free(vm);
 
-	/* Create a "real" VM and verify APIC_BASE can be set. */
+	/* Create a "real" VM with a fully populated guest CPUID and verify
+	 * APIC_BASE and all supported CR4 can be set.
+	 */
 	vm = vm_create_with_one_vcpu(&vcpu, NULL);
 
 	vcpu_sregs_get(vcpu, &sregs);
@@ -135,6 +140,8 @@ int main(int argc, char *argv[])
 	TEST_ASSERT(!rc, "Couldn't set IA32_APIC_BASE to %llx (valid)",
 		    sregs.apic_base);
 
+	test_cr_bits(vcpu, calc_supported_cr4_feature_bits());
+
 	kvm_vm_free(vm);
 
 	return 0;
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (3 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 04/49] KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  0:58   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 06/49] KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry() Sean Christopherson
                   ` (44 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Add a sanity check in get_cpuid_entry() to provide a friendlier error than
a segfault when a test developer tries to use a vCPU CPUID helper on a
barebones vCPU.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index c664e446136b..f0f3434d767e 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1141,6 +1141,8 @@ const struct kvm_cpuid_entry2 *get_cpuid_entry(const struct kvm_cpuid2 *cpuid,
 {
 	int i;
 
+	TEST_ASSERT(cpuid, "Must do vcpu_init_cpuid() first (or equivalent)");
+
 	for (i = 0; i < cpuid->nent; i++) {
 		if (cpuid->entries[i].function == function &&
 		    cpuid->entries[i].index == index)
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 06/49] KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (4 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  0:59   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes Sean Christopherson
                   ` (43 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Refresh selftests' CPUID cache in the vCPU structure when querying a CPUID
entry so that tests don't consume stale data when KVM modifies CPUID as a
side effect of a completely unrelated change.  E.g. KVM adjusts OSXSAVE in
response to CR4.OSXSAVE changes.

Unnecessarily invoking KVM_GET_CPUID is suboptimal, but vcpu->cpuid exists
to simplify selftests development, not for performance reasons.  And,
unfortunately, trying to handle the side effects in tests or other flows
is unpleasant, e.g. selftests could manually refresh if KVM_SET_SREGS is
successful, but that would still leave a gap with respect to guest CR4
changes.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../testing/selftests/kvm/include/x86_64/processor.h  | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 8eb57de0b587..99aa3dfca16c 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -992,10 +992,17 @@ static inline struct kvm_cpuid2 *allocate_kvm_cpuid2(int nr_entries)
 void vcpu_init_cpuid(struct kvm_vcpu *vcpu, const struct kvm_cpuid2 *cpuid);
 void vcpu_set_hv_cpuid(struct kvm_vcpu *vcpu);
 
+static inline void vcpu_get_cpuid(struct kvm_vcpu *vcpu)
+{
+	vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
+}
+
 static inline struct kvm_cpuid_entry2 *__vcpu_get_cpuid_entry(struct kvm_vcpu *vcpu,
 							      uint32_t function,
 							      uint32_t index)
 {
+	vcpu_get_cpuid(vcpu);
+
 	return (struct kvm_cpuid_entry2 *)get_cpuid_entry(vcpu->cpuid,
 							  function, index);
 }
@@ -1016,7 +1023,7 @@ static inline int __vcpu_set_cpuid(struct kvm_vcpu *vcpu)
 		return r;
 
 	/* On success, refresh the cache to pick up adjustments made by KVM. */
-	vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
+	vcpu_get_cpuid(vcpu);
 	return 0;
 }
 
@@ -1026,7 +1033,7 @@ static inline void vcpu_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu_ioctl(vcpu, KVM_SET_CPUID2, vcpu->cpuid);
 
 	/* Refresh the cache to pick up adjustments made by KVM. */
-	vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
+	vcpu_get_cpuid(vcpu);
 }
 
 void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (5 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 06/49] KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry() Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:02   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 08/49] KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h Sean Christopherson
                   ` (42 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Extend x86's set sregs test to verify that KVM sets/clears OSXSAVE and
OSPKE according to CR4.OSXSAVE and CR4.PKE, respectively.  For performance
reasons, KVM is responsible for emulating the architectural behavior of
the OS CPUID bits tracking CR4.
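
The architectural behavior being emulated can be sketched as below.  The bit
positions are the architectural ones (CR4.OSXSAVE is bit 18, CR4.PKE is bit 22,
CPUID.1:ECX.OSXSAVE is bit 27, CPUID.(7,0):ECX.OSPKE is bit 4); the toy_cpuid
struct and toy_update_os_bits() are hypothetical and are not KVM's runtime
update code:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CR4_OSXSAVE			(UINT64_C(1) << 18)
#define TOY_CR4_PKE			(UINT64_C(1) << 22)
#define TOY_CPUID_1_ECX_OSXSAVE		(UINT32_C(1) << 27)
#define TOY_CPUID_7_0_ECX_OSPKE		(UINT32_C(1) << 4)

/* Just the two CPUID registers that carry the OS-controlled bits. */
struct toy_cpuid {
	uint32_t leaf_1_ecx;
	uint32_t leaf_7_0_ecx;
};

/* Mirror the CR4 enable bits into their CPUID "OS has enabled" bits. */
static void toy_update_os_bits(struct toy_cpuid *cpuid, uint64_t cr4)
{
	assert(cpuid);

	if (cr4 & TOY_CR4_OSXSAVE)
		cpuid->leaf_1_ecx |= TOY_CPUID_1_ECX_OSXSAVE;
	else
		cpuid->leaf_1_ecx &= ~TOY_CPUID_1_ECX_OSXSAVE;

	if (cr4 & TOY_CR4_PKE)
		cpuid->leaf_7_0_ecx |= TOY_CPUID_7_0_ECX_OSPKE;
	else
		cpuid->leaf_7_0_ecx &= ~TOY_CPUID_7_0_ECX_OSPKE;
}
```

The test added below checks the same invariant from userspace: after a
successful KVM_SET_SREGS, each OS bit in guest CPUID matches its CR4 bit.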

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86_64/set_sregs_test.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
index 96fd690d479a..f4095a3d1278 100644
--- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
@@ -85,6 +85,16 @@ static void test_cr_bits(struct kvm_vcpu *vcpu, uint64_t cr4)
 	rc = _vcpu_sregs_set(vcpu, &sregs);
 	TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);
 
+	TEST_ASSERT(!!(sregs.cr4 & X86_CR4_OSXSAVE) ==
+		    (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSXSAVE)),
+		    "KVM didn't %s OSXSAVE in CPUID as expected",
+		    (sregs.cr4 & X86_CR4_OSXSAVE) ? "set" : "clear");
+
+	TEST_ASSERT(!!(sregs.cr4 & X86_CR4_PKE) ==
+		    (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSPKE)),
+		    "KVM didn't %s OSPKE in CPUID as expected",
+		    (sregs.cr4 & X86_CR4_PKE) ? "set" : "clear");
+
 	vcpu_sregs_get(vcpu, &sregs);
 	TEST_ASSERT(sregs.cr4 == cr4, "sregs.CR4 (0x%llx) != CR4 (0x%lx)",
 		    sregs.cr4, cr4);
-- 
2.45.0.215.g3402c0e53f-goog
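
As a rough illustration of the behavior the new assertions check, the
following standalone C model mirrors CR4.OSXSAVE and CR4.PKE into the
corresponding CPUID "OS" bits.  This is a sketch, not KVM code: the struct
and function names are hypothetical, and the bit positions are the
architectural ones, defined locally so the snippet is self-contained.  Real
KVM applies additional conditions (e.g. whether the guest's CPUID exposes
the underlying feature at all) that the model ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions, defined locally for the sketch. */
#define CR4_OSXSAVE		(1ull << 18)
#define CR4_PKE			(1ull << 22)
#define CPUID_1_ECX_OSXSAVE	(1u << 27)	/* CPUID.01H:ECX[27] */
#define CPUID_7_ECX_OSPKE	(1u << 4)	/* CPUID.(EAX=07H,ECX=0):ECX[4] */

/* Hypothetical container for the two registers that carry the OS bits. */
struct model_cpuid {
	uint32_t leaf1_ecx;
	uint32_t leaf7_ecx;
};

static void set_or_clear(uint32_t *reg, uint32_t bit, bool set)
{
	if (set)
		*reg |= bit;
	else
		*reg &= ~bit;
}

/* Make the OS-visible CPUID bits track the given CR4 value. */
static void model_update_cpuid_runtime(struct model_cpuid *c, uint64_t cr4)
{
	set_or_clear(&c->leaf1_ecx, CPUID_1_ECX_OSXSAVE, cr4 & CR4_OSXSAVE);
	set_or_clear(&c->leaf7_ecx, CPUID_7_ECX_OSPKE, cr4 & CR4_PKE);
}
```

Calling model_update_cpuid_runtime() after every CR4 write keeps the two
CPUID bits in lockstep with CR4, which is the invariant the selftest
asserts after each KVM_SET_SREGS.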



* [PATCH v2 08/49] KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (6 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:02   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 09/49] KVM: x86/pmu: Drop now-redundant refresh() during init() Sean Christopherson
                   ` (41 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Let vendor code inline __kvm_is_valid_cr4() now that x86.c's
cr4_reserved_bits no longer exists, as keeping cr4_reserved_bits local to
x86.c was the only reason for "hiding" the definition of
__kvm_is_valid_cr4().

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 9 ---------
 arch/x86/kvm/x86.h | 6 +++++-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3f20de4368a6..2f6dda723005 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1130,15 +1130,6 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
 
-bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
-{
-	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
-		return false;
-
-	return true;
-}
-EXPORT_SYMBOL_GPL(__kvm_is_valid_cr4);
-
 static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
 	return __kvm_is_valid_cr4(vcpu, cr4) &&
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d80a4c6b5a38..4a723705a139 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -491,7 +491,6 @@ static inline void kvm_machine_check(void)
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
 int kvm_spec_ctrl_test_value(u64 value);
-bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 			      struct x86_exception *e);
 int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
@@ -505,6 +504,11 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
 #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation #GP condition */
 #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR filter */
 
+static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	return !(cr4 & vcpu->arch.cr4_guest_rsvd_bits);
+}
+
 #define __cr4_reserved_bits(__cpu_has, __c)             \
 ({                                                      \
 	u64 __reserved_bits = CR4_RESERVED_BITS;        \
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 09/49] KVM: x86/pmu: Drop now-redundant refresh() during init()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (7 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 08/49] KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:02   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation Sean Christopherson
                   ` (40 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Drop the manual kvm_pmu_refresh() from kvm_pmu_init() now that
kvm_arch_vcpu_create() performs the refresh via kvm_vcpu_after_set_cpuid().

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a593b03c9aed..31920dd1aa83 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -797,7 +797,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
 
 	memset(pmu, 0, sizeof(*pmu));
 	static_call(kvm_x86_pmu_init)(vcpu);
-	kvm_pmu_refresh(vcpu);
 }
 
 /* Release perf_events for vPMCs that have been unused for a full time slice.  */
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (8 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 09/49] KVM: x86/pmu: Drop now-redundant refresh() during init() Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:13   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after " Sean Christopherson
                   ` (39 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Drop the manual initialization of maxphyaddr and reserved_gpa_bits during
vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes
kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching.

None of the helpers between the existing code in kvm_arch_vcpu_create()
and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or
reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create()
isn't exactly easy).  And even if that weren't the case, KVM _must_
refresh any affected state during kvm_vcpu_after_set_cpuid(), e.g. to
correctly handle KVM_SET_CPUID2.  In other words, this can't introduce a
new bug, only expose an existing bug (of which there don't appear to be
any).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f6dda723005..bb34891d2f0a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12190,9 +12190,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 		goto free_emulate_ctxt;
 	}
 
-	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
-	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
-
 	vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
 
 	kvm_async_pf_hash_reset(vcpu);
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (9 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
  2024-07-12  7:42   ` Xiaoyao Li
  2024-05-17 17:38 ` [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed Sean Christopherson
                   ` (38 subsequent siblings)
  49 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling
PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless,
e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after
vCPU creation.  vCPUs may also end up with an inconsistent configuration
if exits are disabled between creation of multiple vCPUs.

Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
Link: https://lore.kernel.org/all/9227068821b275ac547eb2ede09ec65d2281fe07.1680179693.git.houwenlong.hwl@antgroup.com
Link: https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst | 1 +
 arch/x86/kvm/x86.c             | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6ab8b5b7c64e..884846282d06 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7645,6 +7645,7 @@ branch to guests' 0x200 interrupt vector.
 :Architectures: x86
 :Parameters: args[0] defines which exits are disabled
 :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
+          or if any vCPUs have already been created
 
 Valid bits in args[0] are::
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bb34891d2f0a..4cb0c150a2f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6568,6 +6568,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
 			break;
 
+		mutex_lock(&kvm->lock);
+		if (kvm->created_vcpus)
+			goto disable_exits_unlock;
+
 		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
 			kvm->arch.pause_in_guest = true;
 
@@ -6589,6 +6593,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 
 		r = 0;
+disable_exits_unlock:
+		mutex_unlock(&kvm->lock);
 		break;
 	case KVM_CAP_MSR_PLATFORM_INFO:
 		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];
-- 
2.45.0.215.g3402c0e53f-goog
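
The ordering rule introduced by this patch can be sketched with a small
standalone model.  Everything below is hypothetical illustration code, not
the KVM implementation: the struct mimics only the two fields that matter,
and the locking that the real code takes via kvm->lock is omitted because
the model is single-threaded.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_DISABLE_EXITS_HLT	(1ull << 1)

/* Hypothetical stand-in for the relevant parts of struct kvm. */
struct model_vm {
	unsigned int created_vcpus;
	bool hlt_in_guest;
};

/* Models KVM_ENABLE_CAP(KVM_CAP_X86_DISABLE_EXITS) on a VM. */
static int model_disable_exits(struct model_vm *vm, uint64_t args)
{
	/* Intercepts are configured at vCPU creation, so it is too late. */
	if (vm->created_vcpus)
		return -EINVAL;

	if (args & MODEL_DISABLE_EXITS_HLT)
		vm->hlt_in_guest = true;
	return 0;
}

/* Models KVM_CREATE_VCPU: from here on, the exit config is frozen. */
static void model_create_vcpu(struct model_vm *vm)
{
	vm->created_vcpus++;
}
```

In this model, a VMM that wants HLT to run in the guest must call
model_disable_exits() before the first model_create_vcpu(); the reverse
order fails with -EINVAL instead of silently producing a vCPU whose
intercepts don't match the VM-wide flag.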



* [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (10 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after " Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-05-22  5:09   ` Binbin Wu
                     ` (2 more replies)
  2024-05-17 17:38 ` [PATCH v2 13/49] KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test Sean Christopherson
                   ` (37 subsequent siblings)
  49 siblings, 3 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or
HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that
disabling the exit(s) is not allowed.  E.g. because MWAIT isn't supported
or the CPU doesn't have an always-running APIC timer, or because KVM is
configured to mitigate cross-thread vulnerabilities.

Cc: Kechen Lu <kechenl@nvidia.com>
Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 54 ++++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4cb0c150a2f8..c729227c6501 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4590,6 +4590,20 @@ static inline bool kvm_can_mwait_in_guest(void)
 		boot_cpu_has(X86_FEATURE_ARAT);
 }
 
+static u64 kvm_get_allowed_disable_exits(void)
+{
+	u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
+
+	if (!mitigate_smt_rsb) {
+		r |= KVM_X86_DISABLE_EXITS_HLT |
+			KVM_X86_DISABLE_EXITS_CSTATE;
+
+		if (kvm_can_mwait_in_guest())
+			r |= KVM_X86_DISABLE_EXITS_MWAIT;
+	}
+	return r;
+}
+
 #ifdef CONFIG_KVM_HYPERV
 static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
 					    struct kvm_cpuid2 __user *cpuid_arg)
@@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_CLOCK_VALID_FLAGS;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
-		r = KVM_X86_DISABLE_EXITS_PAUSE;
-
-		if (!mitigate_smt_rsb) {
-			r |= KVM_X86_DISABLE_EXITS_HLT |
-			     KVM_X86_DISABLE_EXITS_CSTATE;
-
-			if (kvm_can_mwait_in_guest())
-				r |= KVM_X86_DISABLE_EXITS_MWAIT;
-		}
+		r |= kvm_get_allowed_disable_exits();
 		break;
 	case KVM_CAP_X86_SMM:
 		if (!IS_ENABLED(CONFIG_KVM_SMM))
@@ -6565,33 +6571,29 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
 		r = -EINVAL;
-		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
+		if (cap->args[0] & ~kvm_get_allowed_disable_exits())
 			break;
 
 		mutex_lock(&kvm->lock);
 		if (kvm->created_vcpus)
 			goto disable_exits_unlock;
 
-		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
-			kvm->arch.pause_in_guest = true;
-
 #define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \
 		    "KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests."
 
-		if (!mitigate_smt_rsb) {
-			if (boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible() &&
-			    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
-				pr_warn_once(SMT_RSB_MSG);
-
-			if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
-			    kvm_can_mwait_in_guest())
-				kvm->arch.mwait_in_guest = true;
-			if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
-				kvm->arch.hlt_in_guest = true;
-			if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
-				kvm->arch.cstate_in_guest = true;
-		}
+		if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
+		    cpu_smt_possible() &&
+		    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
+			pr_warn_once(SMT_RSB_MSG);
 
+		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
+			kvm->arch.pause_in_guest = true;
+		if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
+			kvm->arch.mwait_in_guest = true;
+		if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
+			kvm->arch.hlt_in_guest = true;
+		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
+			kvm->arch.cstate_in_guest = true;
 		r = 0;
 disable_exits_unlock:
 		mutex_unlock(&kvm->lock);
-- 
2.45.0.215.g3402c0e53f-goog
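
To make the "advertised mask equals enforced mask" idea concrete, here is a
minimal standalone sketch.  It is not KVM code: the host struct and the
model_* helpers are hypothetical, and the DISABLE_EXITS_* values are
defined locally (they are assumed to match the uapi bit assignments).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DISABLE_EXITS_MWAIT	(1ull << 0)
#define DISABLE_EXITS_HLT	(1ull << 1)
#define DISABLE_EXITS_PAUSE	(1ull << 2)
#define DISABLE_EXITS_CSTATE	(1ull << 3)

/* Hypothetical summary of the host properties that gate each exit. */
struct model_host {
	bool mitigate_smt_rsb;
	bool can_mwait_in_guest;
};

/* Single source of truth, used for both reporting and enforcement. */
static uint64_t model_allowed_disable_exits(const struct model_host *h)
{
	uint64_t r = DISABLE_EXITS_PAUSE;

	if (!h->mitigate_smt_rsb) {
		r |= DISABLE_EXITS_HLT | DISABLE_EXITS_CSTATE;
		if (h->can_mwait_in_guest)
			r |= DISABLE_EXITS_MWAIT;
	}
	return r;
}

/* Reject any request that asks for more than was advertised. */
static int model_enable_disable_exits(const struct model_host *h,
				      uint64_t args)
{
	if (args & ~model_allowed_disable_exits(h))
		return -EINVAL;
	return 0;
}
```

Because the check and the report share one helper, userspace can no longer
request a bit that the capability query did not advertise and have it
silently ignored (MWAIT) or honored despite the mitigation (HLT).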



* [PATCH v2 13/49] KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (11 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 14/49] KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior Sean Christopherson
                   ` (36 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Actually check for KVM support for disabling HLT-exiting instead of
effectively checking that KVM_CAP_X86_DISABLE_EXITS is #defined to a
non-zero value, and convert the TEST_REQUIRE() to a simple return so
that only the sub-test is skipped if HLT-exiting is mandatory.

The goof has likely gone unnoticed because all x86 CPUs support disabling
HLT-exiting; only systems with the opt-in mitigate_smt_rsb KVM module
param disallow disabling HLT-exiting.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86_64/kvm_pv_test.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
index 78878b3a2725..2aee93108a54 100644
--- a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
+++ b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
@@ -140,10 +140,11 @@ static void test_pv_unhalt(void)
 	struct kvm_cpuid_entry2 *ent;
 	u32 kvm_sig_old;
 
+	if (!(kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) & KVM_X86_DISABLE_EXITS_HLT))
+		return;
+
 	pr_info("testing KVM_FEATURE_PV_UNHALT\n");
 
-	TEST_REQUIRE(KVM_CAP_X86_DISABLE_EXITS);
-
 	/* KVM_PV_UNHALT test */
 	vm = vm_create_with_one_vcpu(&vcpu, guest_main);
 	vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
-- 
2.45.0.215.g3402c0e53f-goog
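
The difference between the old and new checks can be shown with a tiny
standalone model.  The names below are hypothetical, model_check_cap() is
only a stand-in for the selftest's capability query, and the capability
number is an assumed value for illustration; any non-zero constant exhibits
the same problem.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAP_DISABLE_EXITS	143		/* capability number, never 0 */
#define MODEL_DISABLE_EXITS_HLT	(1u << 1)
#define MODEL_DISABLE_EXITS_PAUSE	(1u << 2)

/* Stand-in for the capability query: returns the mask KVM would report. */
static uint32_t model_check_cap(bool mitigate_smt_rsb)
{
	uint32_t r = MODEL_DISABLE_EXITS_PAUSE;

	if (!mitigate_smt_rsb)
		r |= MODEL_DISABLE_EXITS_HLT;
	return r;
}

/* Old check: tests the constant itself, so it can never fail. */
static bool old_requirement_met(void)
{
	return MODEL_CAP_DISABLE_EXITS != 0;
}

/* New check: tests whether the HLT bit is actually advertised. */
static bool new_requirement_met(bool mitigate_smt_rsb)
{
	return model_check_cap(mitigate_smt_rsb) & MODEL_DISABLE_EXITS_HLT;
}
```

With the mitigation enabled, old_requirement_met() still reports success
while new_requirement_met(true) correctly reports that the sub-test should
be skipped.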



* [PATCH v2 14/49] KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (12 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 13/49] KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 15/49] KVM: x86: Zero out PV features cache when the CPUID leaf is not present Sean Christopherson
                   ` (35 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Rework x86's KVM PV features test to align with KVM's new, fixed behavior
of not allowing userspace to disable HLT-exiting after vCPUs have been
created.  Rework the core testcase to disable HLT-exiting before creating
a vCPU, and opportunistically keep the paired VM+vCPU creation to
verify that KVM rejects KVM_CAP_X86_DISABLE_EXITS as expected.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/x86_64/kvm_pv_test.c        | 33 +++++++++++++++++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
index 2aee93108a54..1b805cbdb47b 100644
--- a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
+++ b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
@@ -139,6 +139,7 @@ static void test_pv_unhalt(void)
 	struct kvm_vm *vm;
 	struct kvm_cpuid_entry2 *ent;
 	u32 kvm_sig_old;
+	int r;
 
 	if (!(kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) & KVM_X86_DISABLE_EXITS_HLT))
 		return;
@@ -152,19 +153,45 @@ static void test_pv_unhalt(void)
 	TEST_ASSERT(vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
 		    "Enabling X86_FEATURE_KVM_PV_UNHALT had no effect");
 
-	/* Make sure KVM clears vcpu->arch.kvm_cpuid */
+	/* Verify KVM disallows disabling exits after vCPU creation. */
+	r = __vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
+	TEST_ASSERT(r && errno == EINVAL,
+		    "Disabling exits after vCPU creation didn't fail as expected");
+
+	kvm_vm_free(vm);
+
+	/* Verify that KVM clears PV_UNHALT from guest CPUID. */
+	vm = vm_create(1);
+	vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
+
+	vcpu = vm_vcpu_add(vm, 0, NULL);
+	TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
+		    "vCPU created with PV_UNHALT set by default");
+
+	vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
+	TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
+		    "PV_UNHALT set in guest CPUID when HLT-exiting is disabled");
+
+	/*
+	 * Clobber the KVM PV signature and verify KVM does NOT clear PV_UNHALT
+	 * when KVM PV is not present, and DOES clear PV_UNHALT when switching
+	 * back to the correct signature.
+	 */
 	ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE);
 	kvm_sig_old = ent->ebx;
 	ent->ebx = 0xdeadbeef;
 	vcpu_set_cpuid(vcpu);
 
-	vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
+	vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
+	TEST_ASSERT(vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
+		    "PV_UNHALT cleared when using bogus KVM PV signature");
+
 	ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE);
 	ent->ebx = kvm_sig_old;
 	vcpu_set_cpuid(vcpu);
 
 	TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
-		    "KVM_FEATURE_PV_UNHALT is set with KVM_CAP_X86_DISABLE_EXITS");
+		    "PV_UNHALT set in guest CPUID when HLT-exiting is disabled");
 
 	/* FIXME: actually test KVM_FEATURE_PV_UNHALT feature */
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 15/49] KVM: x86: Zero out PV features cache when the CPUID leaf is not present
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (13 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 14/49] KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 16/49] KVM: x86: Don't update PV features caches when enabling enforcement capability Sean Christopherson
                   ` (34 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Clear KVM's PV feature cache prior to processing a new guest CPUID so
that KVM doesn't keep a stale cache entry if userspace does KVM_SET_CPUID2
multiple times, once with a PV features entry, and a second time without.

Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Cc: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f756a91a3f2f..be1c8f43e090 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -246,6 +246,8 @@ void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
 
+	vcpu->arch.pv_cpuid.features = 0;
+
 	/*
 	 * save the feature bitmap to avoid cpuid lookup for every PV
 	 * operation
-- 
2.45.0.215.g3402c0e53f-goog
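
The stale-cache scenario is easy to reproduce in a standalone model.  The
sketch below is hypothetical illustration code, not the KVM function: a
NULL pointer stands in for a guest CPUID table that lacks the PV features
leaf, and the "buggy" variant shows the pre-patch behavior for contrast.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached PV feature bitmap. */
struct model_vcpu {
	uint32_t pv_features;
};

/* Pre-patch behavior: the cache is only written when the leaf exists. */
static void model_update_pv_buggy(struct model_vcpu *v,
				  const uint32_t *pv_leaf_eax)
{
	if (pv_leaf_eax)
		v->pv_features = *pv_leaf_eax;
}

/* Post-patch behavior: always start from an empty cache. */
static void model_update_pv_fixed(struct model_vcpu *v,
				  const uint32_t *pv_leaf_eax)
{
	v->pv_features = 0;
	if (pv_leaf_eax)
		v->pv_features = *pv_leaf_eax;
}
```

Setting CPUID once with the leaf and once without leaves the old feature
bits in place with the buggy variant, whereas the fixed variant ends with
an empty cache, matching the CPUID table the guest actually sees.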



* [PATCH v2 16/49] KVM: x86: Don't update PV features caches when enabling enforcement capability
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (14 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 15/49] KVM: x86: Zero out PV features cache when the CPUID leaf is not present Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 17/49] KVM: x86: Do reverse CPUID sanity checks in __feature_leaf() Sean Christopherson
                   ` (33 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Revert the chunk of commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features
is initialized when enabling cap") that forced a PV features cache refresh
during KVM_CAP_ENFORCE_PV_FEATURE_CPUID, as whatever ioctl() ordering
issue it alleged to have fixed never existed upstream, and likely never
existed in any kernel.

At the time of the commit, there was a tangentially related ioctl()
ordering issue, as toggling KVM_X86_DISABLE_EXITS_HLT after KVM_SET_CPUID2
would have resulted in KVM potentially leaving KVM_FEATURE_PV_UNHALT set.
But (a) that bug affected the entire guest CPUID, not just the cache, (b)
commit 01b4f510b9f4 didn't address that bug, it only refreshed the cache
(with the bad CPUID), and (c) setting KVM_X86_DISABLE_EXITS_HLT after vCPU
creation is completely broken as KVM configures HLT-exiting only during
vCPU creation, which is why KVM_CAP_X86_DISABLE_EXITS is now disallowed if
vCPUs have been created.

Another tangentially related bug was KVM's failure to clear the cache when
handling KVM_SET_CPUID2, but again commit 01b4f510b9f4 did nothing to fix
that bug.

The most plausible explanation for what commit 01b4f510b9f4 was trying
to fix is a bug that existed in Google's internal kernel that was the
source of commit 01b4f510b9f4.  At the time, Google's internal kernel had
not yet picked up commit 0d3b2ba16ba68 ("KVM: X86: Go on updating other
CPUID leaves when leaf 1 is absent"), i.e. KVM would not initialize the
PV features cache if KVM_SET_CPUID2 was called without a CPUID.0x1 entry.

Of course, no sane real world VMM would omit CPUID.0x1, including the KVM
selftest added by commit ac4a4d6de22e ("selftests: kvm: test enforcement
of paravirtual cpuid features").  And the test didn't actually try to
verify multiple orderings, nor did the selftest enter the guest without
doing KVM_SET_CPUID2, so who knows what motivated the change.

Regardless of why commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features
is initialized when enabling cap") was added, refreshing the cache during
KVM_CAP_ENFORCE_PV_FEATURE_CPUID isn't necessary.

Cc: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 arch/x86/kvm/cpuid.h | 1 -
 arch/x86/kvm/x86.c   | 3 ---
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index be1c8f43e090..a51e48663f53 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -242,7 +242,7 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
 					     vcpu->arch.cpuid_nent, base);
 }
 
-void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
+static void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
 
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 0a8b561b5434..7eb3d7318fc4 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -13,7 +13,6 @@ void kvm_set_cpu_caps(void);
 
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
-void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
 						    u32 function, u32 index);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c729227c6501..7160c5ab8e3e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5849,9 +5849,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 
 	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
 		vcpu->arch.pv_cpuid.enforce = cap->args[0];
-		if (vcpu->arch.pv_cpuid.enforce)
-			kvm_update_pv_runtime(vcpu);
-
 		return 0;
 	default:
 		return -EINVAL;
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 17/49] KVM: x86: Do reverse CPUID sanity checks in __feature_leaf()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (15 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 16/49] KVM: x86: Don't update PV features caches when enabling enforcement capability Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID Sean Christopherson
                   ` (32 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Do the compile-time sanity checks on reverse_cpuid in __feature_leaf() so
that higher level APIs don't need to "manually" perform the sanity checks.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.h         | 3 ---
 arch/x86/kvm/reverse_cpuid.h | 6 ++++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 7eb3d7318fc4..d68b7d879820 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -198,7 +198,6 @@ static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
-	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
 }
 
@@ -206,7 +205,6 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
-	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
 }
 
@@ -214,7 +212,6 @@ static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
-	reverse_cpuid_check(x86_leaf);
 	return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
 }
 
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 2f4e155080ba..245f71c16272 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -136,7 +136,10 @@ static __always_inline u32 __feature_translate(int x86_feature)
 
 static __always_inline u32 __feature_leaf(int x86_feature)
 {
-	return __feature_translate(x86_feature) / 32;
+	u32 x86_leaf = __feature_translate(x86_feature) / 32;
+
+	reverse_cpuid_check(x86_leaf);
+	return x86_leaf;
 }
 
 /*
@@ -159,7 +162,6 @@ static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned int x86_featu
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
-	reverse_cpuid_check(x86_leaf);
 	return reverse_cpuid[x86_leaf];
 }
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (16 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 17/49] KVM: x86: Do reverse CPUID sanity checks in __feature_leaf() Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-06-19  6:17   ` Yang, Weijiang
  2024-07-05  1:17   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support Sean Christopherson
                   ` (31 subsequent siblings)
  49 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Explicitly zero out the feature word in kvm_cpu_caps if the word's
associated CPUID function is greater than the max leaf supported by the
CPU.  For such unsupported functions, Intel CPUs return the output from
the last supported leaf, not all zeros.

Practically speaking, this is likely a benign bug, as KVM uses the raw
host CPUID to mask the kernel's computed capabilities, and the kernel does
perform max leaf checks when populating boot_cpu_data.  The only way KVM's
goof could be problematic is if the kernel force-set a feature in a leaf
that is completely unsupported, _and_ the max supported leaf happened to
return a value with '1' in the same bit position.  Which is theoretically
possible, but extremely unlikely.  And even if that did happen, it's
entirely possible that KVM would still provide the correct functionality;
the kernel did set the capability after all.
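
As a rough illustration of the failure mode, the sketch below models a
CPU whose highest basic leaf is 0x4 and that, per the Intel behavior
described above, answers out-of-range basic leaves with the data of the
last supported leaf.  Everything here is hypothetical (model_cpuid(),
the leaf contents, the helper names); it is a userspace model of the
max-leaf check, not the KVM code, and it only covers the basic range.

```c
#include <stdint.h>

/* Hypothetical CPU model: highest supported basic leaf is 0x4. */
#define MODEL_MAX_BASIC_LEAF 0x4u

struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Simulated CPUID, not the real instruction.  Basic leaves above the
 * max return the data of the highest supported basic leaf.
 */
static struct cpuid_regs model_cpuid(uint32_t function)
{
	static const struct cpuid_regs leaves[MODEL_MAX_BASIC_LEAF + 1] = {
		[0] = { .eax = MODEL_MAX_BASIC_LEAF },
		[4] = { .ecx = 1u << 16 },	/* arbitrary data, bit 16 set */
	};

	if (function > MODEL_MAX_BASIC_LEAF)
		function = MODEL_MAX_BASIC_LEAF;
	return leaves[function];
}

/* Unchecked read: asking for leaf 7 silently yields leaf 4's ECX. */
static uint32_t raw_ecx_unchecked(uint32_t function)
{
	return model_cpuid(function).ecx;
}

/* Checked read: report zero if the leaf is beyond the max for its base. */
static uint32_t raw_ecx_checked(uint32_t function)
{
	uint32_t base = function & 0xffff0000u;

	if (model_cpuid(base).eax < function)
		return 0;
	return model_cpuid(function).ecx;
}
```

With this model, raw_ecx_unchecked(7) reports bit 16 as set even though
leaf 7 does not exist, whereas raw_ecx_checked(7) returns 0.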

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a51e48663f53..77625a5477b1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -571,18 +571,37 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
+static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
+{
+	struct kvm_cpuid_entry2 entry;
+	u32 base;
+
+	/*
+	 * KVM only supports features defined by Intel (0x0), AMD (0x80000000),
+	 * and Centaur (0xc0000000).  WARN if a feature for new vendor base is
+	 * defined, as this and other code would need to be updated.
+	 */
+	base = cpuid.function & 0xffff0000;
+	if (WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000))
+		return 0;
+
+	if (cpuid_eax(base) < cpuid.function)
+		return 0;
+
+	cpuid_count(cpuid.function, cpuid.index,
+		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
+
+	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
+}
+
 /* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
 static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
 {
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
-	struct kvm_cpuid_entry2 entry;
 
 	reverse_cpuid_check(leaf);
 
-	cpuid_count(cpuid.function, cpuid.index,
-		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
-
-	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
+	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
 }
 
 static __always_inline
-- 
2.45.0.215.g3402c0e53f-goog


^ permalink raw reply related	[flat|nested] 185+ messages in thread

* [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (17 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:21   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() Sean Christopherson
                   ` (30 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Add a macro for use in kvm_set_cpu_caps() to automagically initialize
features that KVM wants to support based solely on the CPU's capabilities,
e.g. KVM advertises LA57 support if it's available in hardware, even if
the host kernel isn't utilizing 57-bit virtual addresses.

Take advantage of the fact that kvm_cpu_cap_mask() adjusts kvm_cpu_caps
based on raw CPUID, i.e. will clear feature bits that aren't supported in
hardware, and simply force-set the capability before applying the mask.

Abusing kvm_cpu_cap_set() is a borderline evil shenanigan, but doing so
avoids extra CPUID lookups, and a future commit will harden the entire
family of *F() macros to assert (at compile time) that every feature being
allowed is part of the capability word being processed, i.e. using a macro
will bring more advantages in the future.

Avoiding CPUID also fixes a largely benign bug where KVM could incorrectly
report LA57 support on Intel CPUs whose max supported CPUID is less than 7,
i.e. if the max supported leaf (<7) happened to have bit 16 set.  In
practice, barring a funky virtual machine setup, the bug is benign as all
known CPUs that support VMX also support leaf 7.
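
The force-set-then-mask ordering can be sketched with plain bit
arithmetic.  This is a simplified model under stated assumptions, not
the KVM implementation: init_cap_word() and its parameters are
hypothetical names, and a single 32-bit value stands in for the
CPUID.7.0:ECX word of kvm_cpu_caps.

```c
#include <stdint.h>

#define LA57_BIT (1u << 16)	/* CPUID.(EAX=7,ECX=0):ECX[16] */
#define PKU_BIT  (1u << 3)	/* CPUID.(EAX=7,ECX=0):ECX[3]  */

/*
 * Model of initializing one capability word.
 *   kernel_caps: what the host kernel reports for the word
 *   kvm_mask:    features KVM is willing to advertise
 *   raw_force:   features force-set before masking (the RAW_F() idea)
 *   raw_cpuid:   what the hardware reports for the word
 */
static uint32_t init_cap_word(uint32_t kernel_caps, uint32_t kvm_mask,
			      uint32_t raw_force, uint32_t raw_cpuid)
{
	uint32_t caps = kernel_caps;

	caps |= raw_force;	/* force-set, ignoring host kernel support */
	caps &= kvm_mask;	/* keep only what KVM supports */
	caps &= raw_cpuid;	/* drop anything the hardware lacks */
	return caps;
}
```

If the kernel word lacks LA57 but hardware has it, the force-set bit
survives both masks.  If hardware lacks LA57, the final AND clears it
again, so no separate CPUID lookup is needed.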

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 77625a5477b1..a802c09b50ab 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,18 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);	\
 })
 
+/*
+ * Raw Feature - For features that KVM supports based purely on raw host CPUID,
+ * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
+ * Simply force set the feature in KVM's capabilities, raw CPUID support will
+ * be factored in by kvm_cpu_cap_mask().
+ */
+#define RAW_F(name)						\
+({								\
+	kvm_cpu_cap_set(X86_FEATURE_##name);			\
+	F(name);						\
+})
+
 /*
  * Magic value used by KVM when querying userspace-provided CPUID entries and
  * doesn't care about the CPIUD index because the index of the function in
@@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
 		F(AVX512VL));
 
 	kvm_cpu_cap_mask(CPUID_7_ECX,
-		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
+		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
 		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
 		F(SGX_LC) | F(BUS_LOCK_DETECT)
 	);
-	/* Set LA57 based on hardware capability. */
-	if (cpuid_ecx(7) & F(LA57))
-		kvm_cpu_cap_set(X86_FEATURE_LA57);
 
 	/*
 	 * PKU not yet implemented for shadow paging and requires OSPKE
-- 
2.45.0.215.g3402c0e53f-goog


^ permalink raw reply related	[flat|nested] 185+ messages in thread

* [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (18 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-05-22  6:23   ` Binbin Wu
  2024-07-05  1:24   ` Maxim Levitsky
  2024-05-17 17:38 ` [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only Sean Christopherson
                   ` (29 subsequent siblings)
  49 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging
it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_
bits in the helper (a future commit will play macro games to set emulated
feature flags via kvm_cpu_cap_init()).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a802c09b50ab..5a4d6138c4f1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -74,7 +74,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
  * Raw Feature - For features that KVM supports based purely on raw host CPUID,
  * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
  * Simply force set the feature in KVM's capabilities, raw CPUID support will
- * be factored in by kvm_cpu_cap_mask().
+ * be factored in by __kvm_cpu_cap_mask().
  */
 #define RAW_F(name)						\
 ({								\
@@ -619,7 +619,7 @@ static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
 static __always_inline
 void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
 {
-	/* Use kvm_cpu_cap_mask for leafs that aren't KVM-only. */
+	/* Use kvm_cpu_cap_init for leafs that aren't KVM-only. */
 	BUILD_BUG_ON(leaf < NCAPINTS);
 
 	kvm_cpu_caps[leaf] = mask;
@@ -627,7 +627,7 @@ void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
 	__kvm_cpu_cap_mask(leaf);
 }
 
-static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
 {
 	/* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
 	BUILD_BUG_ON(leaf >= NCAPINTS);
@@ -656,7 +656,7 @@ void kvm_set_cpu_caps(void)
 	memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
 	       sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));
 
-	kvm_cpu_cap_mask(CPUID_1_ECX,
+	kvm_cpu_cap_init(CPUID_1_ECX,
 		/*
 		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
 		 * advertised to guests via CPUID!
@@ -673,7 +673,7 @@ void kvm_set_cpu_caps(void)
 	/* KVM emulates x2apic in software irrespective of host support. */
 	kvm_cpu_cap_set(X86_FEATURE_X2APIC);
 
-	kvm_cpu_cap_mask(CPUID_1_EDX,
+	kvm_cpu_cap_init(CPUID_1_EDX,
 		F(FPU) | F(VME) | F(DE) | F(PSE) |
 		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
 		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
@@ -684,7 +684,7 @@ void kvm_set_cpu_caps(void)
 		0 /* HTT, TM, Reserved, PBE */
 	);
 
-	kvm_cpu_cap_mask(CPUID_7_0_EBX,
+	kvm_cpu_cap_init(CPUID_7_0_EBX,
 		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
 		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
 		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
@@ -693,7 +693,7 @@ void kvm_set_cpu_caps(void)
 		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
 		F(AVX512VL));
 
-	kvm_cpu_cap_mask(CPUID_7_ECX,
+	kvm_cpu_cap_init(CPUID_7_ECX,
 		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
@@ -708,7 +708,7 @@ void kvm_set_cpu_caps(void)
 	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
 		kvm_cpu_cap_clear(X86_FEATURE_PKU);
 
-	kvm_cpu_cap_mask(CPUID_7_EDX,
+	kvm_cpu_cap_init(CPUID_7_EDX,
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
@@ -727,7 +727,7 @@ void kvm_set_cpu_caps(void)
 	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
 
-	kvm_cpu_cap_mask(CPUID_7_1_EAX,
+	kvm_cpu_cap_init(CPUID_7_1_EAX,
 		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
 		F(FZRM) | F(FSRS) | F(FSRC) |
 		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
@@ -743,7 +743,7 @@ void kvm_set_cpu_caps(void)
 		F(BHI_CTRL) | F(MCDT_NO)
 	);
 
-	kvm_cpu_cap_mask(CPUID_D_1_EAX,
+	kvm_cpu_cap_init(CPUID_D_1_EAX,
 		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
 	);
 
@@ -751,7 +751,7 @@ void kvm_set_cpu_caps(void)
 		SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
 	);
 
-	kvm_cpu_cap_mask(CPUID_8000_0001_ECX,
+	kvm_cpu_cap_init(CPUID_8000_0001_ECX,
 		F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
 		F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
@@ -759,7 +759,7 @@ void kvm_set_cpu_caps(void)
 		F(TOPOEXT) | 0 /* PERFCTR_CORE */
 	);
 
-	kvm_cpu_cap_mask(CPUID_8000_0001_EDX,
+	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
 		F(FPU) | F(VME) | F(DE) | F(PSE) |
 		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
 		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
@@ -777,7 +777,7 @@ void kvm_set_cpu_caps(void)
 		SF(CONSTANT_TSC)
 	);
 
-	kvm_cpu_cap_mask(CPUID_8000_0008_EBX,
+	kvm_cpu_cap_init(CPUID_8000_0008_EBX,
 		F(CLZERO) | F(XSAVEERPTR) |
 		F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
 		F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) |
@@ -811,13 +811,13 @@ void kvm_set_cpu_caps(void)
 	 * Hide all SVM features by default, SVM will set the cap bits for
 	 * features it emulates and/or exposes for L1.
 	 */
-	kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);
+	kvm_cpu_cap_init(CPUID_8000_000A_EDX, 0);
 
-	kvm_cpu_cap_mask(CPUID_8000_001F_EAX,
+	kvm_cpu_cap_init(CPUID_8000_001F_EAX,
 		0 /* SME */ | 0 /* SEV */ | 0 /* VM_PAGE_FLUSH */ | 0 /* SEV_ES */ |
 		F(SME_COHERENT));
 
-	kvm_cpu_cap_mask(CPUID_8000_0021_EAX,
+	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
 		F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
 		F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ |
 		F(WRMSR_XX_BASE_NS)
@@ -837,7 +837,7 @@ void kvm_set_cpu_caps(void)
 	 * kernel.  LFENCE_RDTSC was a Linux-defined synthetic feature long
 	 * before AMD joined the bandwagon, e.g. LFENCE is serializing on most
 	 * CPUs that support SSE2.  On CPUs that don't support AMD's leaf,
-	 * kvm_cpu_cap_mask() will unfortunately drop the flag due to ANDing
+	 * kvm_cpu_cap_init() will unfortunately drop the flag due to ANDing
 	 * the mask with the raw host CPUID, and reporting support in AMD's
 	 * leaf can make it easier for userspace to detect the feature.
 	 */
@@ -847,7 +847,7 @@ void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE);
 	kvm_cpu_cap_set(X86_FEATURE_NO_SMM_CTL_MSR);
 
-	kvm_cpu_cap_mask(CPUID_C000_0001_EDX,
+	kvm_cpu_cap_init(CPUID_C000_0001_EDX,
 		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
 		F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
 		F(PMM) | F(PMM_EN)
-- 
2.45.0.215.g3402c0e53f-goog


^ permalink raw reply related	[flat|nested] 185+ messages in thread

* [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (19 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:24   ` Maxim Levitsky
  2024-07-17 13:31   ` Xiaoyao Li
  2024-05-17 17:38 ` [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features Sean Christopherson
                   ` (28 subsequent siblings)
  49 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Add a macro to mask-in feature flags that are supported only on 64-bit
kernels/KVM.  In addition to reducing overall #ifdeffery, using a macro
will allow hardening the kvm_cpu_cap initialization sequences to assert
that the features being advertised are indeed included in the word being
initialized.  And arguably using *F() macros throughout is more readable.
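
A minimal userspace sketch of the idea, assuming a hypothetical
SKETCH_X86_64 switch in place of IS_ENABLED(CONFIG_X86_64) and bare bit
values in place of X86_FEATURE_* lookups:

```c
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_X86_64). */
#ifndef SKETCH_X86_64
#define SKETCH_X86_64 1
#endif

/* Bit positions in CPUID.0x8000_0001.EDX. */
#define BIT_GBPAGES (1u << 26)
#define BIT_RDTSCP  (1u << 27)
#define BIT_LM      (1u << 29)

#define F(bit)        (bit)
/* Contribute the bit only on 64-bit builds, otherwise contribute 0. */
#define X86_64_F(bit) (SKETCH_X86_64 ? F(bit) : 0u)

/* The mask reads as one expression, with no #ifdef'd local variables. */
static uint32_t ext_edx_mask(void)
{
	return X86_64_F(BIT_GBPAGES) | F(BIT_RDTSCP) | X86_64_F(BIT_LM);
}
```

Because the condition is a compile-time constant, the compiler can fold
the ternary away, so the macro form should cost nothing compared to the
#ifdef'd locals it replaces.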

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5a4d6138c4f1..5e3b97d06374 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -70,6 +70,12 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);	\
 })
 
+/* Features that KVM supports only on 64-bit kernels. */
+#define X86_64_F(name)						\
+({								\
+	(IS_ENABLED(CONFIG_X86_64) ? F(name) : 0);		\
+})
+
 /*
  * Raw Feature - For features that KVM supports based purely on raw host CPUID,
  * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
@@ -639,15 +645,6 @@ static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
 
 void kvm_set_cpu_caps(void)
 {
-#ifdef CONFIG_X86_64
-	unsigned int f_gbpages = F(GBPAGES);
-	unsigned int f_lm = F(LM);
-	unsigned int f_xfd = F(XFD);
-#else
-	unsigned int f_gbpages = 0;
-	unsigned int f_lm = 0;
-	unsigned int f_xfd = 0;
-#endif
 	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
 
 	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
@@ -744,7 +741,8 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init(CPUID_D_1_EAX,
-		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
+		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) |
+		X86_64_F(XFD)
 	);
 
 	kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
@@ -766,8 +764,8 @@ void kvm_set_cpu_caps(void)
 		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
 		F(PAT) | F(PSE36) | 0 /* Reserved */ |
 		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
-		F(FXSR) | F(FXSR_OPT) | f_gbpages | F(RDTSCP) |
-		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW)
+		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
+		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
 	);
 
 	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
-- 
2.45.0.215.g3402c0e53f-goog


^ permalink raw reply related	[flat|nested] 185+ messages in thread

* [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (20 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only Sean Christopherson
@ 2024-05-17 17:38 ` Sean Christopherson
  2024-07-05  1:25   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper Sean Christopherson
                   ` (27 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:38 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Add a macro to precisely handle CPUID features that AMD duplicated from
CPUID.0x1.EDX into CPUID.0x8000_0001.EDX.  This will allow adding an
assert that all features passed to kvm_cpu_cap_init() match the word being
processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1.

Because the kernel simply reuses the X86_FEATURE_* definitions from
CPUID.0x1.EDX, KVM's use of the aliased features would result in false
positives from such an assert.
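
The aliasing can be sketched as below.  This is an illustrative model
only: FEATURE_LEAF() and FEATURE_BIT() are hypothetical stand-ins for
the kernel's __feature_leaf() and feature_bit(), _Static_assert stands
in for BUILD_BUG_ON(), and the statement expression is a GNU C
extension (gcc/clang).

```c
#include <stdint.h>

/* Word indices, mirroring the order of the kernel's cpuid_leafs enum. */
enum { CPUID_1_EDX = 0, CPUID_8000_0001_EDX = 1 };

/* Features are encoded as word * 32 + bit. */
#define X86_FEATURE_FPU     (CPUID_1_EDX * 32 + 0)
#define X86_FEATURE_MMX     (CPUID_1_EDX * 32 + 23)
#define X86_FEATURE_SYSCALL (CPUID_8000_0001_EDX * 32 + 11)

#define FEATURE_LEAF(f) ((f) / 32)
#define FEATURE_BIT(f)  (1u << ((f) % 32))

#define F(name) FEATURE_BIT(X86_FEATURE_##name)

/* Aliased feature: must be defined in 0x1.EDX, reused at the same bit. */
#define AF(name)							\
({									\
	_Static_assert(FEATURE_LEAF(X86_FEATURE_##name) == CPUID_1_EDX,	\
		       #name " is not a 0x1.EDX feature");		\
	FEATURE_BIT(X86_FEATURE_##name);				\
})

/* Part of a 0x8000_0001.EDX mask mixing aliased and native features. */
static uint32_t ext_edx_partial_mask(void)
{
	return AF(FPU) | AF(MMX) | F(SYSCALL);
}
```

Writing AF(SYSCALL) in this sketch fails to compile, because SYSCALL is
defined in the 0x8000_0001.EDX word rather than in 0x1.EDX.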

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5e3b97d06374..f2bd2f5c4ea3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -88,6 +88,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	F(name);						\
 })
 
+/*
+ * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
+ * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
+ */
+#define AF(name)								\
+({										\
+	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
+	feature_bit(name);							\
+})
+
 /*
  * Magic value used by KVM when querying userspace-provided CPUID entries and
  * doesn't care about the CPIUD index because the index of the function in
@@ -758,13 +768,13 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
-		F(FPU) | F(VME) | F(DE) | F(PSE) |
-		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
-		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
-		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
-		F(PAT) | F(PSE36) | 0 /* Reserved */ |
-		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
-		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
+		AF(FPU) | AF(VME) | AF(DE) | AF(PSE) |
+		AF(TSC) | AF(MSR) | AF(PAE) | AF(MCE) |
+		AF(CX8) | AF(APIC) | 0 /* Reserved */ | F(SYSCALL) |
+		AF(MTRR) | AF(PGE) | AF(MCA) | AF(CMOV) |
+		AF(PAT) | AF(PSE36) | 0 /* Reserved */ |
+		F(NX) | 0 /* Reserved */ | F(MMXEXT) | AF(MMX) |
+		AF(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
 		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
 	);
 
-- 
2.45.0.215.g3402c0e53f-goog


^ permalink raw reply related	[flat|nested] 185+ messages in thread

* [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (21 preceding siblings ...)
  2024-05-17 17:38 ` [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:28   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions Sean Christopherson
                   ` (26 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single
helper.  The only advantage of separating the two was to make it somewhat
obvious that KVM directly initializes the KVM-defined words, whereas using
a common helper will allow for hardening both kernel- and KVM-defined
CPUID words without needing copy+paste.
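
The merged behavior boils down to one branch on the word index.  The
sketch below is a simplified model with hypothetical sizes and names
(two kernel-defined words, one KVM-defined word, raw CPUID passed in as
a parameter); it is not the KVM helper itself.

```c
#include <stdint.h>

#define NCAPINTS    2	/* hypothetical number of kernel-defined words */
#define NKVMCAPINTS 1	/* hypothetical number of KVM-defined words */

static uint32_t caps[NCAPINTS + NKVMCAPINTS];

static void cap_init(unsigned int leaf, uint32_t mask, uint32_t raw)
{
	/*
	 * Kernel-defined words were pre-populated from the host kernel, so
	 * narrow them.  KVM-defined words start from KVM's mask alone.
	 */
	if (leaf < NCAPINTS)
		caps[leaf] &= mask;
	else
		caps[leaf] = mask;

	/* In both cases, drop anything raw hardware CPUID doesn't report. */
	caps[leaf] &= raw;
}

/* Helper for experimenting: seed a word, initialize it, return the result. */
static uint32_t cap_init_demo(unsigned int leaf, uint32_t initial,
			      uint32_t mask, uint32_t raw)
{
	caps[leaf] = initial;
	cap_init(leaf, mask, raw);
	return caps[leaf];
}
```

For a kernel-defined word the pre-populated value matters, e.g.
cap_init_demo(0, 0x0f, 0x3c, 0xff) yields 0x0c.  For a KVM-defined word
the previous contents are ignored, e.g. cap_init_demo(2, 0x0f, 0x3c,
0xff) yields 0x3c.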

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 44 +++++++++++++++-----------------------------
 1 file changed, 15 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f2bd2f5c4ea3..8efffd48cdf1 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -622,37 +622,23 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
 	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
 }
 
-/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
-static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
+static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
 {
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
 
-	reverse_cpuid_check(leaf);
+	/*
+	 * For kernel-defined leafs, mask the boot CPU's pre-populated value.
+	 * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
+	 * and only authority.
+	 */
+	if (leaf < NCAPINTS)
+		kvm_cpu_caps[leaf] &= mask;
+	else
+		kvm_cpu_caps[leaf] = mask;
 
 	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
 }
 
-static __always_inline
-void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
-{
-	/* Use kvm_cpu_cap_init for leafs that aren't KVM-only. */
-	BUILD_BUG_ON(leaf < NCAPINTS);
-
-	kvm_cpu_caps[leaf] = mask;
-
-	__kvm_cpu_cap_mask(leaf);
-}
-
-static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
-{
-	/* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
-	BUILD_BUG_ON(leaf >= NCAPINTS);
-
-	kvm_cpu_caps[leaf] &= mask;
-
-	__kvm_cpu_cap_mask(leaf);
-}
-
 void kvm_set_cpu_caps(void)
 {
 	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
@@ -740,12 +726,12 @@ void kvm_set_cpu_caps(void)
 		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
 	);
 
-	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
+	kvm_cpu_cap_init(CPUID_7_1_EDX,
 		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
 		F(AMX_COMPLEX)
 	);
 
-	kvm_cpu_cap_init_kvm_defined(CPUID_7_2_EDX,
+	kvm_cpu_cap_init(CPUID_7_2_EDX,
 		F(INTEL_PSFD) | F(IPRED_CTRL) | F(RRSBA_CTRL) | F(DDPD_U) |
 		F(BHI_CTRL) | F(MCDT_NO)
 	);
@@ -755,7 +741,7 @@ void kvm_set_cpu_caps(void)
 		X86_64_F(XFD)
 	);
 
-	kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
+	kvm_cpu_cap_init(CPUID_12_EAX,
 		SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
 	);
 
@@ -781,7 +767,7 @@ void kvm_set_cpu_caps(void)
 	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
 		kvm_cpu_cap_set(X86_FEATURE_GBPAGES);
 
-	kvm_cpu_cap_init_kvm_defined(CPUID_8000_0007_EDX,
+	kvm_cpu_cap_init(CPUID_8000_0007_EDX,
 		SF(CONSTANT_TSC)
 	);
 
@@ -835,7 +821,7 @@ void kvm_set_cpu_caps(void)
 	kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
 	kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);
 
-	kvm_cpu_cap_init_kvm_defined(CPUID_8000_0022_EAX,
+	kvm_cpu_cap_init(CPUID_8000_0022_EAX,
 		F(PERFMON_V2)
 	);
 
-- 
2.45.0.215.g3402c0e53f-goog


^ permalink raw reply related	[flat|nested] 185+ messages in thread

* [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (22 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:30   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features Sean Christopherson
                   ` (25 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Undefine SPEC_CTRL_SSBD, which is #defined by msr-index.h to represent the
enable flag in MSR_IA32_SPEC_CTRL, to avoid issues with the macro being
unpacked into its raw value when passed to KVM's F() macro.  This will
allow using multiple layers of macros in F() and friends, e.g. to harden
against incorrect usage of F().

No functional change intended (cpuid.c doesn't consume SPEC_CTRL_SSBD).
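
The underlying preprocessor rule is that a macro argument is fully
expanded before substitution unless it is an operand of # or ##, so
adding a wrapper layer exposes the name to expansion.  The sketch below
shows the effect with stringification instead of token pasting so that
the result can be inspected at runtime; FEATURE_NAME() and LAYERED()
are hypothetical, and the SPEC_CTRL_SSBD value is only a stand-in for
the msr-index.h definition.

```c
/* Direct use: 'name' is an operand of #, so it is NOT macro-expanded. */
#define FEATURE_NAME(name) "X86_FEATURE_" #name
/* Extra layer: 'name' is expanded before being handed to FEATURE_NAME(). */
#define LAYERED(name)      FEATURE_NAME(name)

/* Stand-in for the MSR bit definition; the real value doesn't matter. */
#define SPEC_CTRL_SSBD (1 << 2)

/* Names the feature, as intended. */
static const char *const direct_before = FEATURE_NAME(SPEC_CTRL_SSBD);
/* Names the expanded MSR value instead of the feature. */
static const char *const layered_before = LAYERED(SPEC_CTRL_SSBD);

#undef SPEC_CTRL_SSBD

/* With the macro gone, the layered form names the feature again. */
static const char *const layered_after = LAYERED(SPEC_CTRL_SSBD);
```

With token pasting, the layered form would not even compile while the
macro is defined, as the preprocessor would try to paste X86_FEATURE_
onto an opening parenthesis.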

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8efffd48cdf1..a16d6e070c11 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -639,6 +639,12 @@ static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
 	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
 }
 
+/*
+ * Undefine the MSR bit macro to avoid token concatenation issues when
+ * processing X86_FEATURE_SPEC_CTRL_SSBD.
+ */
+#undef SPEC_CTRL_SSBD
+
 void kvm_set_cpu_caps(void)
 {
 	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
-- 
2.45.0.215.g3402c0e53f-goog


^ permalink raw reply related	[flat|nested] 185+ messages in thread

* [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (23 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:31   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software Sean Christopherson
                   ` (24 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Add compile-time assertions to verify that usage of F() and friends in
kvm_set_cpu_caps() is scoped to the correct CPUID word, e.g. to detect
bugs where KVM passes a feature bit from word X into word Y.

Add a one-off assertion in the aliased feature macro to ensure that only
word 0x8000_0001.EDX aliases the features defined for 0x1.EDX.

To do so, convert kvm_cpu_cap_init() to a macro and have it define a
local variable to track which CPUID word is being initialized that is
then used to validate usage of F() (all of the inputs are compile-time
constants and thus can be fed into BUILD_BUG_ON()).

Redefine KVM_VALIDATE_CPU_CAP_USAGE after kvm_set_cpu_caps() to be a nop
so that F() can be used in other flows that aren't as easily hardened,
e.g. __do_cpuid_func_emulated() and __do_cpuid_func().

Invoke KVM_VALIDATE_CPU_CAP_USAGE() in SF() and X86_64_F() to ensure the
validation occurs, e.g. if the usage of F() is completely compiled out
(which shouldn't happen for boot_cpu_has(), but could happen in the future,
e.g. if KVM were to use cpu_feature_enabled()).
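
The scoping trick can be approximated in userspace as follows.  Note
the differences from the patch: this sketch declares a block-scoped
enum constant rather than a const local, because ISO C _Static_assert
needs an integer constant expression, whereas BUILD_BUG_ON() relies on
the compiler folding the const local.  All names (WORD_A, FEATURE_FOO,
cap_init_in_progress, ...) are hypothetical, and the statement
expression is a GNU C extension.

```c
#include <stdint.h>

enum { WORD_A = 0, WORD_B = 1, NR_WORDS };

/* Features are encoded as word * 32 + bit. */
#define FEATURE_FOO (WORD_A * 32 + 3)
#define FEATURE_BAR (WORD_B * 32 + 5)

static uint32_t caps[NR_WORDS];

/* Yield the feature's bit, asserting it belongs to the word in progress. */
#define F(name)								\
({									\
	_Static_assert(FEATURE_##name / 32 == cap_init_in_progress,	\
		       #name " does not belong to this word");		\
	1u << (FEATURE_##name % 32);					\
})

/* Open a scope that records which word is being initialized. */
#define cap_init(word, mask)						\
do {									\
	enum { cap_init_in_progress = (word) };				\
	caps[word] = (mask);						\
} while (0)

static void set_caps(void)
{
	cap_init(WORD_A, F(FOO));
	cap_init(WORD_B, F(BAR));
	/* cap_init(WORD_A, F(BAR)); would be rejected at compile time. */
}
```

Since F() only compiles inside a cap_init() scope in this sketch, any
other user of F() would need the check turned into a nop first, which
is the role the redefinition of KVM_VALIDATE_CPU_CAP_USAGE plays after
kvm_set_cpu_caps().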

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 55 +++++++++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a16d6e070c11..1064e4d68718 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -61,18 +61,24 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	return ret;
 }
 
-#define F feature_bit
+#define F(name)							\
+({								\
+	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
+	feature_bit(name);					\
+})
 
 /* Scattered Flag - For features that are scattered by cpufeatures.h. */
 #define SF(name)						\
 ({								\
 	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);	\
+	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
 	(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);	\
 })
 
 /* Features that KVM supports only on 64-bit kernels. */
 #define X86_64_F(name)						\
 ({								\
+	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
 	(IS_ENABLED(CONFIG_X86_64) ? F(name) : 0);		\
 })
 
@@ -95,6 +101,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 #define AF(name)								\
 ({										\
 	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
+	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
 	feature_bit(name);							\
 })
 
@@ -622,22 +629,34 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
 	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
 }
 
-static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
-{
-	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
+/*
+ * Assert that the feature bit being declared, e.g. via F(), is in the CPUID
+ * word that's being initialized.  Exempt 0x8000_0001.EDX usage of 0x1.EDX
+ * features, as AMD duplicated many 0x1.EDX features into 0x8000_0001.EDX.
+ */
+#define KVM_VALIDATE_CPU_CAP_USAGE(name)				\
+do {									\
+	u32 __leaf = __feature_leaf(X86_FEATURE_##name);		\
+									\
+	BUILD_BUG_ON(__leaf != kvm_cpu_cap_init_in_progress);		\
+} while (0)
 
-	/*
-	 * For kernel-defined leafs, mask the boot CPU's pre-populated value.
-	 * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
-	 * and only authority.
-	 */
-	if (leaf < NCAPINTS)
-		kvm_cpu_caps[leaf] &= mask;
-	else
-		kvm_cpu_caps[leaf] = mask;
-
-	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
-}
+/*
+ * For kernel-defined leafs, mask the boot CPU's pre-populated value.  For KVM-
+ * defined leafs, explicitly set the leaf, as KVM is the one and only authority.
+ */
+#define kvm_cpu_cap_init(leaf, mask)					\
+do {									\
+	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
+	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
+									\
+	if (leaf < NCAPINTS)						\
+		kvm_cpu_caps[leaf] &= (mask);				\
+	else								\
+		kvm_cpu_caps[leaf] = (mask);				\
+									\
+	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
+} while (0)
 
 /*
  * Undefine the MSR bit macro to avoid token concatenation issues when
@@ -870,6 +889,10 @@ void kvm_set_cpu_caps(void)
 }
 EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
 
+#undef kvm_cpu_cap_init
+#undef KVM_VALIDATE_CPU_CAP_USAGE
+#define KVM_VALIDATE_CPU_CAP_USAGE(name)
+
 struct kvm_cpuid_array {
 	struct kvm_cpuid_entry2 *entries;
 	int maxnent;
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (24 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:59   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2 Sean Christopherson
                   ` (23 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
OR-in features that KVM emulates in software, i.e. that don't depend on
the feature being available in hardware.  The contained scope
of kvm_cpu_cap_init() allows using a local variable to track the set of
emulated features, which, in addition to avoiding confusing and/or
unnecessary variables, helps prevent misuse of EMUL_F().
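
A minimal userspace sketch of the scoping idea, not the kernel code.  The
names CAP_INIT(), FEAT_* and init_caps() are made up for illustration, and a
comma expression stands in for the GNU statement expression used by the real
EMUL_F().  It shows why the accumulator lives inside the init macro: EMUL_F()
only compiles where "emulated" is in scope, and the emulated bits are OR-ed
back in after the hardware mask has been applied.

```c
#include <stdint.h>

#define BIT(n)		(1u << (n))

/* Hypothetical feature bits, for illustration only. */
#define FEAT_HW_A	BIT(0)
#define FEAT_HW_B	BIT(1)
#define FEAT_SW		BIT(2)

#define F(bit)		(bit)
/* Record the bit in the enclosing CAP_INIT() scope, then yield it. */
#define EMUL_F(bit)	(emulated |= (bit), (bit))

#define CAP_INIT(caps, hw, mask)					\
do {									\
	uint32_t emulated = 0;						\
									\
	(caps) = (mask);	/* evaluating mask fills emulated */	\
	(caps) &= (hw);		/* drop what hardware lacks */		\
	(caps) |= emulated;	/* re-add software-emulated bits */	\
} while (0)

uint32_t init_caps(uint32_t hw)
{
	uint32_t caps;

	CAP_INIT(caps, hw, F(FEAT_HW_A) | F(FEAT_HW_B) | EMUL_F(FEAT_SW));
	return caps;
}
```

With hw == FEAT_HW_A, init_caps() yields FEAT_HW_A | FEAT_SW.  Using EMUL_F()
outside of CAP_INIT() fails to build because "emulated" is undeclared, which
is the misuse protection described above.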

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1064e4d68718..33e3e77de1b7 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	F(name);						\
 })
 
+/*
+ * Emulated Feature - For features that KVM emulates in software irrespective
+ * of host CPU/kernel support.
+ */
+#define EMUL_F(name)						\
+({								\
+	kvm_cpu_cap_emulated |= F(name);			\
+	F(name);						\
+})
+
 /*
  * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
  * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
@@ -649,6 +659,7 @@ do {									\
 do {									\
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
+	u32 kvm_cpu_cap_emulated = 0;					\
 									\
 	if (leaf < NCAPINTS)						\
 		kvm_cpu_caps[leaf] &= (mask);				\
@@ -656,6 +667,7 @@ do {									\
 		kvm_cpu_caps[leaf] = (mask);				\
 									\
 	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
+	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
 } while (0)
 
 /*
@@ -684,12 +696,10 @@ void kvm_set_cpu_caps(void)
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
 		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
 		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
-		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
+		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
 		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
 		F(F16C) | F(RDRAND)
 	);
-	/* KVM emulates x2apic in software irrespective of host support. */
-	kvm_cpu_cap_set(X86_FEATURE_X2APIC);
 
 	kvm_cpu_cap_init(CPUID_1_EDX,
 		F(FPU) | F(VME) | F(DE) | F(PSE) |
@@ -703,13 +713,13 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init(CPUID_7_0_EBX,
-		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
-		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
-		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
-		F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) |
-		F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) |
-		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
-		F(AVX512VL));
+		F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
+		F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
+		F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
+		F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
+		F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
+		F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
+		F(AVX512BW) | F(AVX512VL));
 
 	kvm_cpu_cap_init(CPUID_7_ECX,
 		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
@@ -728,16 +738,12 @@ void kvm_set_cpu_caps(void)
 
 	kvm_cpu_cap_init(CPUID_7_EDX,
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
-		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
+		F(SPEC_CTRL_SSBD) | EMUL_F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
 		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
 		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
 	);
 
-	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
-	kvm_cpu_cap_set(X86_FEATURE_TSC_ADJUST);
-	kvm_cpu_cap_set(X86_FEATURE_ARCH_CAPABILITIES);
-
 	if (boot_cpu_has(X86_FEATURE_IBPB) && boot_cpu_has(X86_FEATURE_IBRS))
 		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL);
 	if (boot_cpu_has(X86_FEATURE_STIBP))
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (25 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:32   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 28/49] KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID Sean Christopherson
                   ` (22 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

When handling KVM_SET_CPUID{,2}, swap the old and new CPUID arrays and
lengths before processing the new CPUID, and simply undo the swap if
setting the new CPUID fails for whatever reason.

To keep the diff reasonable, continue passing the entry array and length
to most helpers, and defer the more complete cleanup to future commits.

For any sane VMM, setting "bad" CPUID state is not a hot path (or even
something that is survivable), and setting guest CPUID before it's known
good will allow removing all of KVM's infrastructure for processing CPUID
entries directly (as opposed to operating on vcpu->arch.cpuid_entries).
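
The swap-then-unwind flow can be sketched in plain userspace C as follows.
This is an illustration under stated assumptions, not KVM's implementation:
struct entry, struct vcpu, check_entries() and set_entries() are hypothetical
stand-ins, and the "reject function 0xdead" rule exists only to exercise the
error path.

```c
#include <errno.h>
#include <stdlib.h>

struct entry {
	unsigned int function;
	unsigned int eax;
};

struct vcpu {
	struct entry *entries;
	int nent;
};

/* Exchange the vCPU's array/length with the caller's. */
static void swap_entries(struct vcpu *v, struct entry **e, int *nent)
{
	struct entry *tmp_e = v->entries;
	int tmp_n = v->nent;

	v->entries = *e;
	v->nent = *nent;
	*e = tmp_e;
	*nent = tmp_n;
}

/* Hypothetical validity check: reject any entry with function 0xdead. */
static int check_entries(const struct vcpu *v)
{
	int i;

	for (i = 0; i < v->nent; i++) {
		if (v->entries[i].function == 0xdead)
			return -EINVAL;
	}
	return 0;
}

int set_entries(struct vcpu *v, struct entry *e, int nent)
{
	int r;

	/* Install the new entries first so that checks see vCPU state. */
	swap_entries(v, &e, &nent);

	r = check_entries(v);
	if (r) {
		/* Undo the swap; the caller still owns the rejected array. */
		swap_entries(v, &e, &nent);
		return r;
	}

	/* After the swap, e points at the old array, which is now unused. */
	free(e);
	return 0;
}
```

Because the helpers only ever look at the vCPU, none of them needs separate
"entries" and "nent" parameters, at the cost of a second swap on the (cold)
failure path.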

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 49 +++++++++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 33e3e77de1b7..4ad01867cb8d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -175,10 +175,10 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
 	return NULL;
 }
 
-static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
-			   struct kvm_cpuid_entry2 *entries,
-			   int nent)
+static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 {
+	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
+	int nent = vcpu->arch.cpuid_nent;
 	struct kvm_cpuid_entry2 *best;
 	u64 xfeatures;
 
@@ -369,9 +369,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
 
-static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
+static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_KVM_HYPERV
+	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
+	int nent = vcpu->arch.cpuid_nent;
 	struct kvm_cpuid_entry2 *entry;
 
 	entry = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE,
@@ -436,8 +438,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 					 __cr4_reserved_bits(guest_cpuid_has, vcpu);
 #undef __kvm_cpu_cap_has
 
-	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
-						    vcpu->arch.cpuid_nent));
+	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu));
 
 	/* Invoke the vendor callback only after the above state is updated. */
 	static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
@@ -478,6 +479,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 {
 	int r;
 
+	/*
+	 * Swap the existing (old) entries with the incoming (new) entries in
+	 * order to massage the new entries, e.g. to account for dynamic bits
+	 * that KVM controls, without clobbering the current guest CPUID, which
+	 * KVM needs to preserve in order to unwind on failure.
+	 */
+	swap(vcpu->arch.cpuid_entries, e2);
+	swap(vcpu->arch.cpuid_nent, nent);
+
 	/*
 	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
 	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -497,31 +507,25 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 		 * only because any change in CPUID is disallowed, i.e. using
 		 * stale data is ok because KVM will reject the change.
 		 */
-		__kvm_update_cpuid_runtime(vcpu, e2, nent);
+		kvm_update_cpuid_runtime(vcpu);
 
 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
 		if (r)
-			return r;
-
-		kvfree(e2);
-		return 0;
+			goto err;
+		goto success;
 	}
 
 #ifdef CONFIG_KVM_HYPERV
-	if (kvm_cpuid_has_hyperv(e2, nent)) {
+	if (kvm_cpuid_has_hyperv(vcpu)) {
 		r = kvm_hv_vcpu_init(vcpu);
 		if (r)
-			return r;
+			goto err;
 	}
 #endif
 
-	r = kvm_check_cpuid(vcpu, e2, nent);
+	r = kvm_check_cpuid(vcpu);
 	if (r)
-		return r;
-
-	kvfree(vcpu->arch.cpuid_entries);
-	vcpu->arch.cpuid_entries = e2;
-	vcpu->arch.cpuid_nent = nent;
+		goto err;
 
 	vcpu->arch.kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
 #ifdef CONFIG_KVM_XEN
@@ -529,7 +533,14 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 #endif
 	kvm_vcpu_after_set_cpuid(vcpu);
 
+success:
+	kvfree(e2);
 	return 0;
+
+err:
+	swap(vcpu->arch.cpuid_entries, e2);
+	swap(vcpu->arch.cpuid_nent, nent);
+	return r;
 }
 
 /* when an old userspace process fills a new kernel module */
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 28/49] KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (26 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2 Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:32   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base Sean Christopherson
                   ` (21 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Now that KVM disallows disabling HLT-exiting after vCPUs have been created,
i.e. now that it's impossible for kvm_hlt_in_guest() to change while vCPUs
are running, apply KVM's PV_UNHALT quirk only when userspace is setting
guest CPUID.

Opportunistically rename the helper to make it clear that KVM's behavior
is a quirk that should never have been added.  KVM's documentation
explicitly states that userspace should not advertise PV_UNHALT if
HLT-exiting is disabled, but for unknown reasons, commit caa057a2cad6
("KVM: X86: Provide a capability to disable HLT intercepts") didn't stop
at documenting the requirement and also massaged the incoming guest CPUID.

Unfortunately, it's quite likely that userspace has come to rely on KVM's
behavior, i.e. the code can't simply be deleted.  The only reason KVM
doesn't have an "official" quirk is that there is no known use case where
disabling the quirk would make sense, i.e. letting userspace disable the
quirk would further increase KVM's burden without any benefit.
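
For illustration only, the quirk boils down to the following userspace
sketch.  apply_pv_unhalt_quirk() is a hypothetical name, and the bit position
is taken from my recollection of the uapi kvm_para.h header, so treat it as
an assumption rather than a reference.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position as recalled from the uapi kvm_para.h; an assumption. */
#define PV_UNHALT_BIT	7

/*
 * Clear PV_UNHALT in the PV features leaf's EAX when HLT is not
 * intercepted, and return the resulting feature bitmap.  A NULL pointer
 * models "the guest has no PV features leaf".
 */
uint32_t apply_pv_unhalt_quirk(uint32_t *features_eax, bool hlt_in_guest)
{
	if (!features_eax)
		return 0;

	if (hlt_in_guest)
		*features_eax &= ~(1u << PV_UNHALT_BIT);

	return *features_eax;
}
```

The return value doubles as the snapshot of the PV feature bitmap, which is
why a single helper can serve both the "apply the quirk" and the "cache the
features" callers.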

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4ad01867cb8d..93a7399dc0db 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -287,18 +287,17 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
 					     vcpu->arch.cpuid_nent, base);
 }
 
-static void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
+static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
 
-	vcpu->arch.pv_cpuid.features = 0;
+	if (!best)
+		return 0;
 
-	/*
-	 * save the feature bitmap to avoid cpuid lookup for every PV
-	 * operation
-	 */
-	if (best)
-		vcpu->arch.pv_cpuid.features = best->eax;
+	if (kvm_hlt_in_guest(vcpu->kvm))
+		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
+
+	return best->eax;
 }
 
 /*
@@ -320,7 +319,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 				       int nent)
 {
 	struct kvm_cpuid_entry2 *best;
-	struct kvm_hypervisor_cpuid kvm_cpuid;
 
 	best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 	if (best) {
@@ -347,13 +345,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
-	kvm_cpuid = __kvm_get_hypervisor_cpuid(entries, nent, KVM_SIGNATURE);
-	if (kvm_cpuid.base) {
-		best = __kvm_find_kvm_cpuid_features(entries, nent, kvm_cpuid.base);
-		if (kvm_hlt_in_guest(vcpu->kvm) && best)
-			best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
-	}
-
 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
 		best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 		if (best)
@@ -425,7 +416,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	vcpu->arch.guest_supported_xcr0 =
 		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 
-	kvm_update_pv_runtime(vcpu);
+	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
 
 	vcpu->arch.is_amd_compatible = guest_cpuid_is_amd_or_hygon(vcpu);
 	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
@@ -508,6 +499,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 		 * stale data is ok because KVM will reject the change.
 		 */
 		kvm_update_cpuid_runtime(vcpu);
+		kvm_apply_cpuid_pv_features_quirk(vcpu);
 
 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
 		if (r)
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (27 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 28/49] KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:51   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 30/49] KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find() Sean Christopherson
                   ` (20 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Now that KVM only searches for KVM's PV CPUID base when userspace sets
guest CPUID, drop the cache and simply do the search every time.

Practically speaking, this is a nop except for situations where userspace
sets CPUID _after_ running the vCPU, which is anything but a hot path,
e.g. QEMU does so only when hotplugging a vCPU.  And on the flip side,
caching guest CPUID information, especially information that is used to
query/modify _other_ CPUID state, is inherently dangerous as it's all too
easy to use stale information, i.e. KVM should only cache CPUID state when
the performance and/or programming benefits justify it.
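
As a rough userspace model of "do the search every time", the lookup is a
short scan over the candidate hypervisor bases.  The names below are
hypothetical, and the 0x40000000..0x40010000 range with a 0x100 stride is
what I understand the kernel's base iteration to use; treat both as
assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct cpuid_entry {
	uint32_t function;
	uint32_t eax, ebx, ecx, edx;
};

static const struct cpuid_entry *find_entry(const struct cpuid_entry *e,
					    int nent, uint32_t function)
{
	int i;

	for (i = 0; i < nent; i++) {
		if (e[i].function == function)
			return &e[i];
	}
	return NULL;
}

/* Return the base leaf whose EBX:ECX:EDX match the 12-byte signature. */
uint32_t find_hypervisor_base(const struct cpuid_entry *e, int nent,
			      const char *sig)
{
	uint32_t base;

	for (base = 0x40000000; base < 0x40010000; base += 0x100) {
		const struct cpuid_entry *entry = find_entry(e, nent, base);
		uint32_t signature[3];

		if (!entry)
			continue;

		signature[0] = entry->ebx;
		signature[1] = entry->ecx;
		signature[2] = entry->edx;
		if (!memcmp(signature, sig, sizeof(signature)))
			return base;
	}
	return 0;	/* no matching hypervisor leaf */
}
```

The scan is bounded and only runs when userspace sets CPUID, so recomputing
the base avoids carrying a cached value that could go stale.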

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/cpuid.c            | 34 +++++++--------------------------
 2 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aabf1648a56a..3003e99155e7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -858,7 +858,6 @@ struct kvm_vcpu_arch {
 
 	int cpuid_nent;
 	struct kvm_cpuid_entry2 *cpuid_entries;
-	struct kvm_hypervisor_cpuid kvm_cpuid;
 	bool is_amd_compatible;
 
 	/*
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 93a7399dc0db..7290f91c422c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -269,28 +269,16 @@ static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcp
 					  vcpu->arch.cpuid_nent, sig);
 }
 
-static struct kvm_cpuid_entry2 *__kvm_find_kvm_cpuid_features(struct kvm_cpuid_entry2 *entries,
-							      int nent, u32 kvm_cpuid_base)
-{
-	return cpuid_entry2_find(entries, nent, kvm_cpuid_base | KVM_CPUID_FEATURES,
-				 KVM_CPUID_INDEX_NOT_SIGNIFICANT);
-}
-
-static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
-{
-	u32 base = vcpu->arch.kvm_cpuid.base;
-
-	if (!base)
-		return NULL;
-
-	return __kvm_find_kvm_cpuid_features(vcpu->arch.cpuid_entries,
-					     vcpu->arch.cpuid_nent, base);
-}
-
 static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
 {
-	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
+	struct kvm_hypervisor_cpuid kvm_cpuid;
+	struct kvm_cpuid_entry2 *best;
 
+	kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
+	if (!kvm_cpuid.base)
+		return 0;
+
+	best = kvm_find_cpuid_entry(vcpu, kvm_cpuid.base | KVM_CPUID_FEATURES);
 	if (!best)
 		return 0;
 
@@ -491,13 +479,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	 * whether the supplied CPUID data is equal to what's already set.
 	 */
 	if (kvm_vcpu_has_run(vcpu)) {
-		/*
-		 * Note, runtime CPUID updates may consume other CPUID-driven
-		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
-		 * state before full CPUID processing is functionally correct
-		 * only because any change in CPUID is disallowed, i.e. using
-		 * stale data is ok because KVM will reject the change.
-		 */
 		kvm_update_cpuid_runtime(vcpu);
 		kvm_apply_cpuid_pv_features_quirk(vcpu);
 
@@ -519,7 +500,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	if (r)
 		goto err;
 
-	vcpu->arch.kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
 #ifdef CONFIG_KVM_XEN
 	vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE);
 #endif
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 30/49] KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (28 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:51   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 31/49] KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find() Sean Christopherson
                   ` (19 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Now that KVM sets vcpu->arch.cpuid_{entries,nent} before processing the
incoming CPUID entries during KVM_SET_CPUID{,2}, drop the @entries and
@nent params from cpuid_entry2_find() and unconditionally operate on the
vCPU state.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 62 +++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7290f91c422c..0526f25a7c80 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -124,8 +124,8 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
  */
 #define KVM_CPUID_INDEX_NOT_SIGNIFICANT -1ull
 
-static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
-	struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index)
+static struct kvm_cpuid_entry2 *cpuid_entry2_find(struct kvm_vcpu *vcpu,
+						  u32 function, u64 index)
 {
 	struct kvm_cpuid_entry2 *e;
 	int i;
@@ -142,8 +142,8 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
 	 */
 	lockdep_assert_irqs_enabled();
 
-	for (i = 0; i < nent; i++) {
-		e = &entries[i];
+	for (i = 0; i < vcpu->arch.cpuid_nent; i++) {
+		e = &vcpu->arch.cpuid_entries[i];
 
 		if (e->function != function)
 			continue;
@@ -177,8 +177,6 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
 
 static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 {
-	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
-	int nent = vcpu->arch.cpuid_nent;
 	struct kvm_cpuid_entry2 *best;
 	u64 xfeatures;
 
@@ -186,7 +184,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 	 * The existing code assumes virtual address is 48-bit or 57-bit in the
 	 * canonical address checks; exit if it is ever changed.
 	 */
-	best = cpuid_entry2_find(entries, nent, 0x80000008,
+	best = cpuid_entry2_find(vcpu, 0x80000008,
 				 KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 	if (best) {
 		int vaddr_bits = (best->eax & 0xff00) >> 8;
@@ -199,7 +197,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 	 * Exposing dynamic xfeatures to the guest requires additional
 	 * enabling in the FPU, e.g. to expand the guest XSAVE state size.
 	 */
-	best = cpuid_entry2_find(entries, nent, 0xd, 0);
+	best = cpuid_entry2_find(vcpu, 0xd, 0);
 	if (!best)
 		return 0;
 
@@ -234,15 +232,15 @@ static int kvm_cpuid_check_equal(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2
 	return 0;
 }
 
-static struct kvm_hypervisor_cpuid __kvm_get_hypervisor_cpuid(struct kvm_cpuid_entry2 *entries,
-							      int nent, const char *sig)
+static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu,
+							    const char *sig)
 {
 	struct kvm_hypervisor_cpuid cpuid = {};
 	struct kvm_cpuid_entry2 *entry;
 	u32 base;
 
 	for_each_possible_hypervisor_cpuid_base(base) {
-		entry = cpuid_entry2_find(entries, nent, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+		entry = cpuid_entry2_find(vcpu, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 
 		if (entry) {
 			u32 signature[3];
@@ -262,13 +260,6 @@ static struct kvm_hypervisor_cpuid __kvm_get_hypervisor_cpuid(struct kvm_cpuid_e
 	return cpuid;
 }
 
-static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu,
-							    const char *sig)
-{
-	return __kvm_get_hypervisor_cpuid(vcpu->arch.cpuid_entries,
-					  vcpu->arch.cpuid_nent, sig);
-}
-
 static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
 {
 	struct kvm_hypervisor_cpuid kvm_cpuid;
@@ -292,23 +283,22 @@ static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
  * Calculate guest's supported XCR0 taking into account guest CPUID data and
  * KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0).
  */
-static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
+static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 
-	best = cpuid_entry2_find(entries, nent, 0xd, 0);
+	best = cpuid_entry2_find(vcpu, 0xd, 0);
 	if (!best)
 		return 0;
 
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
-static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
-				       int nent)
+void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 
-	best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+	best = cpuid_entry2_find(vcpu, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 	if (best) {
 		/* Update OSXSAVE bit */
 		if (boot_cpu_has(X86_FEATURE_XSAVE))
@@ -319,43 +309,36 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
 			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
 	}
 
-	best = cpuid_entry2_find(entries, nent, 7, 0);
+	best = cpuid_entry2_find(vcpu, 7, 0);
 	if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
 		cpuid_entry_change(best, X86_FEATURE_OSPKE,
 				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
 
-	best = cpuid_entry2_find(entries, nent, 0xD, 0);
+	best = cpuid_entry2_find(vcpu, 0xD, 0);
 	if (best)
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, false);
 
-	best = cpuid_entry2_find(entries, nent, 0xD, 1);
+	best = cpuid_entry2_find(vcpu, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
-		best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+		best = cpuid_entry2_find(vcpu, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 		if (best)
 			cpuid_entry_change(best, X86_FEATURE_MWAIT,
 					   vcpu->arch.ia32_misc_enable_msr &
 					   MSR_IA32_MISC_ENABLE_MWAIT);
 	}
 }
-
-void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
-{
-	__kvm_update_cpuid_runtime(vcpu, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
-}
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
 
 static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_KVM_HYPERV
-	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
-	int nent = vcpu->arch.cpuid_nent;
 	struct kvm_cpuid_entry2 *entry;
 
-	entry = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE,
+	entry = cpuid_entry2_find(vcpu, HYPERV_CPUID_INTERFACE,
 				  KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 	return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
 #else
@@ -401,8 +384,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		kvm_apic_set_version(vcpu);
 	}
 
-	vcpu->arch.guest_supported_xcr0 =
-		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
+	vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
 
 	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
 
@@ -1532,16 +1514,14 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
 						    u32 function, u32 index)
 {
-	return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
-				 function, index);
+	return cpuid_entry2_find(vcpu, function, index);
 }
 EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
 
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
 					      u32 function)
 {
-	return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
-				 function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+	return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 }
 EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 31/49] KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (29 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 30/49] KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find() Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:51   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 32/49] KVM: x86: Remove all direct usage of cpuid_entry2_find() Sean Christopherson
                   ` (18 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Move kvm_find_cpuid_entry{,_index}() "up" in cpuid.c so that they are
colocated with cpuid_entry2_find(), e.g. to make it easier to see the
effective guts of the helpers without having to bounce around cpuid.c.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0526f25a7c80..d7390ade1c29 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -175,6 +175,20 @@ static struct kvm_cpuid_entry2 *cpuid_entry2_find(struct kvm_vcpu *vcpu,
 	return NULL;
 }
 
+struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
+						    u32 function, u32 index)
+{
+	return cpuid_entry2_find(vcpu, function, index);
+}
+EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
+
+struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
+					      u32 function)
+{
+	return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+}
+EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
+
 static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
@@ -1511,20 +1525,6 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
 	return r;
 }
 
-struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
-						    u32 function, u32 index)
-{
-	return cpuid_entry2_find(vcpu, function, index);
-}
-EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
-
-struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
-					      u32 function)
-{
-	return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
-}
-EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
-
 /*
  * Intel CPUID semantics treats any query for an out-of-range leaf as if the
  * highest basic leaf (i.e. CPUID.0H:EAX) were requested.  AMD CPUID semantics
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 32/49] KVM: x86: Remove all direct usage of cpuid_entry2_find()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (30 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 31/49] KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find() Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  1:52   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID Sean Christopherson
                   ` (17 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Convert all uses of cpuid_entry2_find() to kvm_find_cpuid_entry{,_index}()
now that cpuid_entry2_find() operates on the vCPU state, i.e. now that
there is no need to use cpuid_entry2_find() directly in order to pass in
non-vCPU state.

To help prevent unwanted usage of cpuid_entry2_find(), #undef
KVM_CPUID_INDEX_NOT_SIGNIFICANT, i.e. force KVM to use
kvm_find_cpuid_entry().
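
The #undef trick can be illustrated outside of KVM with a minimal,
standalone sketch.  All names below (struct entry, entry_find(),
INDEX_NOT_SIGNIFICANT, the table contents) are hypothetical stand-ins, not
KVM's actual definitions; the point is only that once the sentinel macro is
undefined, later code in the file can no longer spell it and is steered
toward the wrappers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CPUID entry. */
struct entry {
	uint32_t function;
	uint32_t index;
	int index_significant;
	uint32_t eax;
};

/* Hypothetical sentinel: "the caller doesn't care about the index". */
#define INDEX_NOT_SIGNIFICANT (~0u)

static struct entry table[] = {
	{ 0x1, 0, 0, 0x000306a9 },
	{ 0xd, 0, 1, 0x7 },
	{ 0xd, 1, 1, 0xf },
};

/* Low-level lookup, intended to be called only by the two wrappers. */
static struct entry *entry_find(uint32_t function, uint32_t index)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		struct entry *e = &table[i];

		if (e->function != function)
			continue;

		if (!e->index_significant ||
		    index == INDEX_NOT_SIGNIFICANT || e->index == index)
			return e;
	}
	return NULL;
}

static struct entry *find_entry_index(uint32_t function, uint32_t index)
{
	return entry_find(function, index);
}

static struct entry *find_entry(uint32_t function)
{
	return entry_find(function, INDEX_NOT_SIGNIFICANT);
}

/*
 * From here on the sentinel can't be named, so any new caller that wants
 * "index not significant" semantics has to go through find_entry().
 */
#undef INDEX_NOT_SIGNIFICANT
```

Note that this only removes the sentinel; a caller could still invoke the
low-level helper with an explicit index, which is why the patch also adds a
comment stating the intent.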

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d7390ade1c29..699ce4261e9c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -189,6 +189,12 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
 }
 EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
 
+/*
+ * cpuid_entry2_find() and KVM_CPUID_INDEX_NOT_SIGNIFICANT should never be used
+ * directly outside of kvm_find_cpuid_entry() and kvm_find_cpuid_entry_index().
+ */
+#undef KVM_CPUID_INDEX_NOT_SIGNIFICANT
+
 static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
@@ -198,8 +204,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 	 * The existing code assumes virtual address is 48-bit or 57-bit in the
 	 * canonical address checks; exit if it is ever changed.
 	 */
-	best = cpuid_entry2_find(vcpu, 0x80000008,
-				 KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+	best = kvm_find_cpuid_entry(vcpu, 0x80000008);
 	if (best) {
 		int vaddr_bits = (best->eax & 0xff00) >> 8;
 
@@ -211,7 +216,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
 	 * Exposing dynamic xfeatures to the guest requires additional
 	 * enabling in the FPU, e.g. to expand the guest XSAVE state size.
 	 */
-	best = cpuid_entry2_find(vcpu, 0xd, 0);
+	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
 	if (!best)
 		return 0;
 
@@ -254,7 +259,7 @@ static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcp
 	u32 base;
 
 	for_each_possible_hypervisor_cpuid_base(base) {
-		entry = cpuid_entry2_find(vcpu, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+		entry = kvm_find_cpuid_entry(vcpu, base);
 
 		if (entry) {
 			u32 signature[3];
@@ -301,7 +306,7 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 
-	best = cpuid_entry2_find(vcpu, 0xd, 0);
+	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
 	if (!best)
 		return 0;
 
@@ -312,7 +317,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 
-	best = cpuid_entry2_find(vcpu, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+	best = kvm_find_cpuid_entry(vcpu, 1);
 	if (best) {
 		/* Update OSXSAVE bit */
 		if (boot_cpu_has(X86_FEATURE_XSAVE))
@@ -323,22 +328,22 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
 	}
 
-	best = cpuid_entry2_find(vcpu, 7, 0);
+	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
 	if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
 		cpuid_entry_change(best, X86_FEATURE_OSPKE,
 				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
 
-	best = cpuid_entry2_find(vcpu, 0xD, 0);
+	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
 	if (best)
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, false);
 
-	best = cpuid_entry2_find(vcpu, 0xD, 1);
+	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
-		best = cpuid_entry2_find(vcpu, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+		best = kvm_find_cpuid_entry(vcpu, 0x1);
 		if (best)
 			cpuid_entry_change(best, X86_FEATURE_MWAIT,
 					   vcpu->arch.ia32_misc_enable_msr &
@@ -352,8 +357,7 @@ static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
 #ifdef CONFIG_KVM_HYPERV
 	struct kvm_cpuid_entry2 *entry;
 
-	entry = cpuid_entry2_find(vcpu, HYPERV_CPUID_INTERFACE,
-				  KVM_CPUID_INDEX_NOT_SIGNIFICANT);
+	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_INTERFACE);
 	return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
 #else
 	return false;
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (31 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 32/49] KVM: x86: Remove all direct usage of cpuid_entry2_find() Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-05-22  9:11   ` Binbin Wu
  2024-07-05  2:04   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 34/49] KVM: x86: Advertise HYPERVISOR " Sean Christopherson
                   ` (16 subsequent siblings)
  49 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID when it's
supported in hardware, as the odds of a VMM emulating the local APIC in
userspace, not emulating the TSC deadline timer, _and_ reflecting
KVM_GET_SUPPORTED_CPUID back into KVM_SET_CPUID2 are extremely low.

KVM has _unconditionally_ advertised X2APIC via CPUID since commit
0d1de2d901f4 ("KVM: Always report x2apic as supported feature"), and it
is completely impossible for userspace to emulate X2APIC as KVM doesn't
support forwarding the MSR accesses to userspace.  I.e. KVM has relied on
userspace VMMs to not misreport local APIC capabilities for nearly 13
years.
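
For a VMM that does not use the in-kernel local APIC, the practical
consequence is that it should filter the relevant bits before feeding
KVM_GET_SUPPORTED_CPUID output into KVM_SET_CPUID2.  A rough sketch of such
a filter is below; the helper name and its parameters are made up for
illustration and are not part of any KVM or VMM API, only the bit positions
(CPUID.1:ECX[21] for x2APIC, CPUID.1:ECX[24] for the TSC deadline timer) are
architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_X2APIC		(1u << 21)
#define CPUID_1_ECX_TSC_DEADLINE	(1u << 24)

/*
 * Hypothetical VMM-side helper: given CPUID.1:ECX as reported by
 * KVM_GET_SUPPORTED_CPUID, drop local APIC features the VMM can't back.
 */
static uint32_t vmm_filter_leaf1_ecx(uint32_t supported_ecx,
				     bool in_kernel_lapic,
				     bool user_emulates_tsc_deadline)
{
	uint32_t ecx = supported_ecx;

	if (!in_kernel_lapic) {
		/* x2APIC MSR accesses can't be forwarded to userspace. */
		ecx &= ~CPUID_1_ECX_X2APIC;

		/* Keep the timer only if userspace really emulates it. */
		if (!user_emulates_tsc_deadline)
			ecx &= ~CPUID_1_ECX_TSC_DEADLINE;
	}
	return ecx;
}
```

With an in-kernel local APIC (KVM_CREATE_IRQCHIP) the value passes through
unchanged.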

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst | 9 ++++++---
 arch/x86/kvm/cpuid.c           | 4 ++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 884846282d06..cb744a646de6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1804,15 +1804,18 @@ emulate them efficiently. The fields in each entry are defined as follows:
          the values returned by the cpuid instruction for
          this function/index combination
 
-The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
-as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
-support.  Instead it is reported via::
+x2APIC (CPUID leaf 1, ecx[21]) and TSC deadline timer (CPUID leaf 1, ecx[24])
+may be returned as true, but they depend on KVM_CREATE_IRQCHIP for in-kernel
+emulation of the local APIC.  TSC deadline timer support is also reported via::
 
   ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
 
 if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
 feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
 
+Enabling x2APIC in KVM_SET_CPUID2 requires KVM_CREATE_IRQCHIP as KVM doesn't
+support forwarding x2APIC MSR accesses to userspace, i.e. KVM does not support
+emulating x2APIC in userspace.
 
 4.47 KVM_PPC_GET_PVINFO
 -----------------------
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 699ce4261e9c..d1f427284ccc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -680,8 +680,8 @@ void kvm_set_cpu_caps(void)
 		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
 		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
 		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
-		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
-		F(F16C) | F(RDRAND)
+		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
+		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
 	);
 
 	kvm_cpu_cap_init(CPUID_1_EDX,
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 34/49] KVM: x86: Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (32 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:04   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 35/49] KVM: x86: Add a macro to handle features that are fully VMM controlled Sean Christopherson
                   ` (15 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Unconditionally advertise "support" for the HYPERVISOR feature in CPUID,
as the flag simply communicates to the guest that it's running under a
hypervisor.
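
For reference, the flag in question is CPUID.1:ECX[31].  A guest-side check
amounts to a single bit test, sketched below on an already-read ECX value
(the helper name is hypothetical, and reading CPUID itself is left out to
keep the snippet portable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_HYPERVISOR	(1u << 31)

/* Hypothetical helper: true if CPUID.1:ECX says "running in a VM". */
static bool running_under_hypervisor(uint32_t leaf1_ecx)
{
	return (leaf1_ecx & CPUID_1_ECX_HYPERVISOR) != 0;
}
```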

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d1f427284ccc..de898d571faa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -681,7 +681,8 @@ void kvm_set_cpu_caps(void)
 		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
 		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
 		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
-		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
+		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND) |
+		EMUL_F(HYPERVISOR)
 	);
 
 	kvm_cpu_cap_init(CPUID_1_EDX,
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 35/49] KVM: x86: Add a macro to handle features that are fully VMM controlled
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (33 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 34/49] KVM: x86: Advertise HYPERVISOR " Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:08   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap" Sean Christopherson
                   ` (14 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Add a macro to track CPUID features for which KVM fully defers to
userspace, i.e. that KVM honors if they are enumerated to the guest, even
if KVM itself doesn't advertise them to userspace.

Somewhat unfortunately, this behavior only applies to MWAIT (largely
because of KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS), and it's not all that
likely future features will be handled in a similar way.  I.e. very
arguably, potentially tracking every feature in kvm_vmm_cpu_caps is a
waste of memory.

However, adding one-off handling for individual features is quite painful,
especially when considering future hardening.  It's very doable to verify,
at compile time, that every CPUID-based feature that KVM queries when
emulating guest behavior is actually known to KVM, e.g. to prevent KVM
bugs where KVM emulates some feature but fails to advertise support to
userspace.  In other words, any features that are special cased, i.e. not
handled generically in the CPUID framework, would also need to be special
cased for any hardening efforts that build on said framework.
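
The shape of the macro is easier to see in isolation.  The sketch below is
a standalone approximation, not the code from this patch: the leaf enum, the
FEAT_* encoding and the array names are simplified, hypothetical stand-ins.
It relies on GNU statement expressions (supported by GCC and clang) so that
the macro can record the feature in a side array while contributing 0 to
the mask of advertised features.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, single-leaf feature layout: word * 32 + bit. */
enum { LEAF_1_ECX, NR_LEAFS };

#define FEATURE(leaf, bit)	((leaf) * 32 + (bit))
#define FEAT_XMM3		FEATURE(LEAF_1_ECX, 0)
#define FEAT_PCLMULQDQ		FEATURE(LEAF_1_ECX, 1)
#define FEAT_MWAIT		FEATURE(LEAF_1_ECX, 3)

static uint32_t cpu_caps[NR_LEAFS];	/* advertised to userspace */
static uint32_t vmm_cpu_caps[NR_LEAFS];	/* honored, never advertised */

/* Ordinary feature: contributes its bit to the advertised mask. */
#define F(name)		(1u << (FEAT_##name % 32))

/*
 * VMM-controlled feature: remember the bit in the side array, but
 * evaluate to 0 so that it is left out of the advertised mask.
 */
#define VMM_F(name)						\
({								\
	vmm_cpu_caps[FEAT_##name / 32] |= F(name);		\
	0u;							\
})

static void set_cpu_caps(void)
{
	cpu_caps[LEAF_1_ECX] = F(XMM3) | F(PCLMULQDQ) | VMM_F(MWAIT);
}
```

After set_cpu_caps() runs, the MWAIT bit is present only in vmm_cpu_caps,
which is the property the commit message describes: KVM knows about the
feature without advertising it.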

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index de898d571faa..16bb873188d6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -36,6 +36,8 @@
 u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_cpu_caps);
 
+static u32 kvm_vmm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
+
 u32 xstate_required_size(u64 xstate_bv, bool compacted)
 {
 	int feature_bit = 0;
@@ -115,6 +117,21 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	feature_bit(name);							\
 })
 
+/*
+ * VMM Features - For features that KVM "supports" in some capacity, i.e. that
+ * KVM may query, but that are never advertised to userspace.  E.g. KVM allows
+ * userspace to enumerate MONITOR+MWAIT support to the guest, but the MWAIT
+ * feature flag is never advertised to userspace because MONITOR+MWAIT aren't
+ * virtualized by hardware, can't be faithfully emulated in software (KVM
+ * emulates them as NOPs), and allowing the guest to execute them natively
+ * requires enabling a per-VM capability.
+ */
+#define VMM_F(name)								\
+({										\
+	kvm_vmm_cpu_caps[__feature_leaf(X86_FEATURE_##name)] |= F(name);	\
+	0;									\
+})
+
 /*
  * Magic value used by KVM when querying userspace-provided CPUID entries and
  * doesn't care about the CPIUD index because the index of the function in
@@ -674,7 +691,7 @@ void kvm_set_cpu_caps(void)
 		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
 		 * advertised to guests via CPUID!
 		 */
-		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
+		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64 */ | VMM_F(MWAIT) |
 		0 /* DS-CPL, VMX, SMX, EST */ |
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
 		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (34 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 35/49] KVM: x86: Add a macro to handle features that are fully VMM controlled Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-05-22 14:23   ` Binbin Wu
  2024-05-17 17:39 ` [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps Sean Christopherson
                   ` (13 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

As the first step toward replacing KVM's so-called "governed features"
framework with a more comprehensive, less poorly named implementation,
replace the "kvm_governed_feature" function prefix with "guest_cpu_cap"
and rename guest_can_use() to guest_cpu_cap_has().

The "guest_cpu_cap" naming scheme mirrors that of "kvm_cpu_cap", and
provides a more clear distinction between guest capabilities, which are
KVM controlled (heh, or one might say "governed"), and guest CPUID, which
with few exceptions is fully userspace controlled.
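
To make the capabilities-vs-CPUID distinction concrete, here is a toy model
of the renamed helpers.  It is only a sketch under simplifying assumptions:
a single 32-bit word per bitmap, a plain global for KVM's own capabilities,
and hypothetical struct and function names that merely echo the ones in
this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified vCPU: one 32-bit word per bitmap. */
struct vcpu {
	uint32_t guest_cpuid;	/* what userspace enumerated to the guest */
	uint32_t cpu_caps;	/* what KVM lets the guest actually use */
};

/* Stand-in for KVM's own capabilities (cf. kvm_cpu_caps). */
static uint32_t host_caps;

static void guest_cap_set(struct vcpu *v, unsigned int bit)
{
	v->cpu_caps |= 1u << bit;
}

/* Set the capability only if KVM supports it *and* guest CPUID has it. */
static void guest_cap_check_and_set(struct vcpu *v, unsigned int bit)
{
	if ((host_caps & (1u << bit)) && (v->guest_cpuid & (1u << bit)))
		guest_cap_set(v, bit);
}

static bool guest_cap_has(const struct vcpu *v, unsigned int bit)
{
	return (v->cpu_caps & (1u << bit)) != 0;
}
```

A bit that userspace sets in guest CPUID but that KVM doesn't support never
makes it into cpu_caps, i.e. guest CPUID stays userspace controlled while
the capability stays KVM controlled.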

Opportunistically rewrite the comment about XSS passthrough for SEV-ES
guests to avoid referencing so many functions, as such comments are prone
to becoming stale (case in point...).

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c      |  2 +-
 arch/x86/kvm/cpuid.h      | 16 ++++++++--------
 arch/x86/kvm/mmu.h        |  2 +-
 arch/x86/kvm/mmu/mmu.c    |  4 ++--
 arch/x86/kvm/svm/nested.c | 22 +++++++++++-----------
 arch/x86/kvm/svm/sev.c    | 17 ++++++++---------
 arch/x86/kvm/svm/svm.c    | 26 +++++++++++++-------------
 arch/x86/kvm/svm/svm.h    |  4 ++--
 arch/x86/kvm/vmx/nested.c |  6 +++---
 arch/x86/kvm/vmx/vmx.c    | 16 ++++++++--------
 arch/x86/kvm/x86.c        |  4 ++--
 11 files changed, 59 insertions(+), 60 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 16bb873188d6..286abefc93d5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -407,7 +407,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
 				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
 	if (allow_gbpages)
-		kvm_governed_feature_set(vcpu, X86_FEATURE_GBPAGES);
+		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
 
 	best = kvm_find_cpuid_entry(vcpu, 1);
 	if (best && apic) {
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index d68b7d879820..e021681f34ac 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -256,8 +256,8 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
 	return kvm_governed_feature_index(x86_feature) >= 0;
 }
 
-static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
-						     unsigned int x86_feature)
+static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
+					      unsigned int x86_feature)
 {
 	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
 
@@ -265,15 +265,15 @@ static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
 		  vcpu->arch.governed_features.enabled);
 }
 
-static __always_inline void kvm_governed_feature_check_and_set(struct kvm_vcpu *vcpu,
-							       unsigned int x86_feature)
+static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
+							unsigned int x86_feature)
 {
 	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
-		kvm_governed_feature_set(vcpu, x86_feature);
+		guest_cpu_cap_set(vcpu, x86_feature);
 }
 
-static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
-					  unsigned int x86_feature)
+static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
+					      unsigned int x86_feature)
 {
 	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
 
@@ -283,7 +283,7 @@ static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
 
 static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
-	if (guest_can_use(vcpu, X86_FEATURE_LAM))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
 		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
 
 	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index dc80e72e4848..cf95ea5fe29d 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -150,7 +150,7 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
 {
-	if (!guest_can_use(vcpu, X86_FEATURE_LAM))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
 		return 0;
 
 	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5095fb46713e..e18a10c59431 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4966,7 +4966,7 @@ static void reset_guest_rsvds_bits_mask(struct kvm_vcpu *vcpu,
 	__reset_rsvds_bits_mask(&context->guest_rsvd_check,
 				vcpu->arch.reserved_gpa_bits,
 				context->cpu_role.base.level, is_efer_nx(context),
-				guest_can_use(vcpu, X86_FEATURE_GBPAGES),
+				guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
 				is_cr4_pse(context),
 				guest_cpuid_is_amd_compatible(vcpu));
 }
@@ -5043,7 +5043,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
 	__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
 				context->root_role.level,
 				context->root_role.efer_nx,
-				guest_can_use(vcpu, X86_FEATURE_GBPAGES),
+				guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
 				is_pse, is_amd);
 
 	if (!shadow_me_mask)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 55b9a6d96bcf..2900a8e21257 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -107,7 +107,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
 
 static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
 {
-	if (!guest_can_use(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
+	if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
 		return true;
 
 	if (!nested_npt_enabled(svm))
@@ -590,7 +590,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 		vmcb_mark_dirty(vmcb02, VMCB_DR);
 	}
 
-	if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
+	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
 		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
 		/*
 		 * Reserved bits of DEBUGCTL are ignored.  Be consistent with
@@ -647,7 +647,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
 	 */
 
-	if (guest_can_use(vcpu, X86_FEATURE_VGIF) &&
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VGIF) &&
 	    (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK))
 		int_ctl_vmcb12_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
 	else
@@ -685,7 +685,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 
 	vmcb02->control.tsc_offset = vcpu->arch.tsc_offset;
 
-	if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
 	    svm->tsc_ratio_msr != kvm_caps.default_tsc_scaling_ratio)
 		nested_svm_update_tsc_ratio_msr(vcpu);
 
@@ -706,7 +706,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	 * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
 	 * prior to injecting the event).
 	 */
-	if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
 		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
 	else if (boot_cpu_has(X86_FEATURE_NRIPS))
 		vmcb02->control.next_rip    = vmcb12_rip;
@@ -716,7 +716,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 		svm->soft_int_injected = true;
 		svm->soft_int_csbase = vmcb12_csbase;
 		svm->soft_int_old_rip = vmcb12_rip;
-		if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
 			svm->soft_int_next_rip = svm->nested.ctl.next_rip;
 		else
 			svm->soft_int_next_rip = vmcb12_rip;
@@ -724,18 +724,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 
 	vmcb02->control.virt_ext            = vmcb01->control.virt_ext &
 					      LBR_CTL_ENABLE_MASK;
-	if (guest_can_use(vcpu, X86_FEATURE_LBRV))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV))
 		vmcb02->control.virt_ext  |=
 			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
 
 	if (!nested_vmcb_needs_vls_intercept(svm))
 		vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 
-	if (guest_can_use(vcpu, X86_FEATURE_PAUSEFILTER))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_PAUSEFILTER))
 		pause_count12 = svm->nested.ctl.pause_filter_count;
 	else
 		pause_count12 = 0;
-	if (guest_can_use(vcpu, X86_FEATURE_PFTHRESHOLD))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_PFTHRESHOLD))
 		pause_thresh12 = svm->nested.ctl.pause_filter_thresh;
 	else
 		pause_thresh12 = 0;
@@ -1022,7 +1022,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (vmcb12->control.exit_code != SVM_EXIT_ERR)
 		nested_save_pending_event_to_vmcb12(svm, vmcb12);
 
-	if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
 		vmcb12->control.next_rip  = vmcb02->control.next_rip;
 
 	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
@@ -1061,7 +1061,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (!nested_exit_on_intr(svm))
 		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
 
-	if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
+	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
 		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
 		svm_copy_lbrs(vmcb12, vmcb02);
 		svm_update_lbrv(vcpu);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 57c2c8025547..7640dedc2ddc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4409,16 +4409,15 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 	 * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
 	 * the host/guest supports its use.
 	 *
-	 * guest_can_use() checks a number of requirements on the host/guest to
-	 * ensure that MSR_IA32_XSS is available, but it might report true even
-	 * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
-	 * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
-	 * to further check that the guest CPUID actually supports
-	 * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaved
-	 * guests will still get intercepted and caught in the normal
-	 * kvm_emulate_rdmsr()/kvm_emulated_wrmsr() paths.
+	 * KVM treats the guest as being capable of using XSAVES even if XSAVES
+	 * isn't enabled in guest CPUID as there is no intercept for XSAVES,
+	 * i.e. the guest can use XSAVES/XRSTORS to read/write XSS if XSAVE is
+	 * exposed to the guest and XSAVES is supported in hardware.  Condition
+	 * full XSS passthrough on the guest being able to use XSAVES *and*
+	 * XSAVES being exposed to the guest so that KVM can at least honor
+	 * guest CPUID for RDMSR and WRMSR.
 	 */
-	if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
 	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
 	else
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3d0549ca246f..2acd2e3bb1b0 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1039,7 +1039,7 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	bool current_enable_lbrv = svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK;
 	bool enable_lbrv = (svm_get_lbr_vmcb(svm)->save.dbgctl & DEBUGCTLMSR_LBR) ||
-			    (is_guest_mode(vcpu) && guest_can_use(vcpu, X86_FEATURE_LBRV) &&
+			    (is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
 			    (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
 
 	if (enable_lbrv == current_enable_lbrv)
@@ -2841,7 +2841,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	switch (msr_info->index) {
 	case MSR_AMD64_TSC_RATIO:
 		if (!msr_info->host_initiated &&
-		    !guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR))
 			return 1;
 		msr_info->data = svm->tsc_ratio_msr;
 		break;
@@ -2991,7 +2991,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	switch (ecx) {
 	case MSR_AMD64_TSC_RATIO:
 
-		if (!guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR)) {
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR)) {
 
 			if (!msr->host_initiated)
 				return 1;
@@ -3013,7 +3013,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 		svm->tsc_ratio_msr = data;
 
-		if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
 		    is_guest_mode(vcpu))
 			nested_svm_update_tsc_ratio_msr(vcpu);
 
@@ -4342,11 +4342,11 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
 	    boot_cpu_has(X86_FEATURE_XSAVES) &&
 	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
-		kvm_governed_feature_set(vcpu, X86_FEATURE_XSAVES);
+		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
 
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_NRIPS);
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LBRV);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
 
 	/*
 	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
@@ -4354,12 +4354,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * SVM on Intel is bonkers and extremely unlikely to work).
 	 */
 	if (!guest_cpuid_is_intel(vcpu))
-		kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
 
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF);
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 97b3683ea324..08fd788d08df 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -487,7 +487,7 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
 
 static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
 {
-	return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) &&
+	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VGIF) &&
 	       (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK);
 }
 
@@ -539,7 +539,7 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
 
 static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
 {
-	return guest_can_use(&svm->vcpu, X86_FEATURE_VNMI) &&
+	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VNMI) &&
 	       (svm->nested.ctl.int_ctl & V_NMI_ENABLE_MASK);
 }
 
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d5b832126e34..fb7eec29681d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6488,7 +6488,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
 	vmx = to_vmx(vcpu);
 	vmcs12 = get_vmcs12(vcpu);
 
-	if (guest_can_use(vcpu, X86_FEATURE_VMX) &&
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) &&
 	    (vmx->nested.vmxon || vmx->nested.smm.vmxon)) {
 		kvm_state.hdr.vmx.vmxon_pa = vmx->nested.vmxon_ptr;
 		kvm_state.hdr.vmx.vmcs12_pa = vmx->nested.current_vmptr;
@@ -6629,7 +6629,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 		if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS)
 			return -EINVAL;
 	} else {
-		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
 			return -EINVAL;
 
 		if (!page_address_valid(vcpu, kvm_state->hdr.vmx.vmxon_pa))
@@ -6663,7 +6663,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 		return -EINVAL;
 
 	if ((kvm_state->flags & KVM_STATE_NESTED_EVMCS) &&
-	    (!guest_can_use(vcpu, X86_FEATURE_VMX) ||
+	    (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) ||
 	     !vmx->nested.enlightened_vmcs_enabled))
 			return -EINVAL;
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 51b2cd13250a..1bc56596d653 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2050,7 +2050,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
 		break;
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
-		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
 			return 1;
 		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
 				    &msr_info->data))
@@ -2360,7 +2360,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!msr_info->host_initiated)
 			return 1; /* they are read-only */
-		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
 			return 1;
 		return vmx_set_vmx_msr(vcpu, msr_index, data);
 	case MSR_IA32_RTIT_CTL:
@@ -4571,7 +4571,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
 												\
 	if (cpu_has_vmx_##name()) {								\
 		if (kvm_is_governed_feature(X86_FEATURE_##feat_name))				\
-			__enabled = guest_can_use(__vcpu, X86_FEATURE_##feat_name);		\
+			__enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name);		\
 		else										\
 			__enabled = guest_cpuid_has(__vcpu, X86_FEATURE_##feat_name);		\
 		vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\
@@ -7838,10 +7838,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
 	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
-		kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_XSAVES);
+		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
 
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VMX);
-	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LAM);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
+	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LAM);
 
 	vmx_setup_uret_msrs(vmx);
 
@@ -7849,7 +7849,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		vmcs_set_secondary_exec_control(vmx,
 						vmx_secondary_exec_control(vmx));
 
-	if (guest_can_use(vcpu, X86_FEATURE_VMX))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
 		vmx->msr_ia32_feature_control_valid_bits |=
 			FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
 			FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
@@ -7858,7 +7858,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
 			  FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);
 
-	if (guest_can_use(vcpu, X86_FEATURE_VMX))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
 		nested_vmx_cr_fixed1_bits_update(vcpu);
 
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7160c5ab8e3e..4ca9651b3f43 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1026,7 +1026,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 		if (vcpu->arch.xcr0 != host_xcr0)
 			xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
 
-		if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
 		    vcpu->arch.ia32_xss != host_xss)
 			wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
 	}
@@ -1057,7 +1057,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 		if (vcpu->arch.xcr0 != host_xcr0)
 			xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
 
-		if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
 		    vcpu->arch.ia32_xss != host_xss)
 			wrmsrl(MSR_IA32_XSS, host_xss);
 	}
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (35 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap" Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-06-20  2:20   ` Yang, Weijiang
  2024-07-05  2:10   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID Sean Christopherson
                   ` (12 subsequent siblings)
  49 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Replace the internals of the governed features framework with a more
comprehensive "guest CPU capabilities" implementation, i.e. with a guest
version of kvm_cpu_caps.  Keep the skeleton of governed features around
for now as vmx_adjust_sec_exec_control() relies on detecting governed
features to do the right thing for XSAVES, and switching all guest feature
queries to guest_cpu_cap_has() requires subtle and non-trivial changes,
i.e. is best done as a standalone change.

Tracking *all* guest capabilities that KVM cares about will allow excising the
poorly named "governed features" framework, and effectively optimizes all
KVM queries of guest capabilities, i.e. doesn't require making a
subjective decision as to whether or not a feature is worth "governing",
and doesn't require adding the code to do so.

The cost of tracking all features is currently 92 bytes per vCPU on 64-bit
kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features.
That cost is well worth paying even if the only benefit was eliminating
the "governed features" terminology.  And practically speaking, the real
cost is zero unless those 92 bytes push the size of vcpu_vmx or vcpu_svm
into a new order-N allocation, and if that happens there are better ways
to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR
state separate allocations.
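
For illustration only (not part of the patch), the word/bit lookup and the
size arithmetic described above can be sketched in user-space C.  All demo_*
and DEMO_* names are hypothetical, and the value 20 for the number of
kernel-defined words is an assumption chosen so that the array matches the
100-byte figure quoted above; the real helpers additionally validate the leaf
at compile time via reverse_cpuid_check().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes: 20 kernel-defined words plus 5 KVM-only words. */
#define DEMO_NCAPINTS		20
#define DEMO_NR_CPU_CAPS	(DEMO_NCAPINTS + 5)

/* A feature is encoded as (word * 32 + bit), like X86_FEATURE_* values. */
#define DEMO_FEATURE(word, bit)	((word) * 32u + (bit))

struct demo_vcpu {
	uint32_t cpu_caps[DEMO_NR_CPU_CAPS];
};

/* Index of the 32-bit word that holds the feature. */
static inline unsigned int demo_feature_leaf(unsigned int feature)
{
	return feature / 32;
}

/* Mask of the feature within its word. */
static inline uint32_t demo_feature_bit(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

static inline void demo_cap_set(struct demo_vcpu *vcpu, unsigned int feature)
{
	vcpu->cpu_caps[demo_feature_leaf(feature)] |= demo_feature_bit(feature);
}

static inline bool demo_cap_has(const struct demo_vcpu *vcpu,
				unsigned int feature)
{
	return vcpu->cpu_caps[demo_feature_leaf(feature)] &
	       demo_feature_bit(feature);
}

/* 25 words * 4 bytes = 100 bytes, versus 8 bytes for a 64-bit bitmap. */
_Static_assert(sizeof(((struct demo_vcpu *)0)->cpu_caps) == 100,
	       "cpu_caps is expected to be 100 bytes in this sketch");
```

Every query is a single array load plus a mask, which is why no per-feature
opt-in is needed to make a lookup "fast".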

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 45 +++++++++++++++++++++------------
 arch/x86/kvm/cpuid.c            | 14 +++++++---
 arch/x86/kvm/cpuid.h            | 12 ++++-----
 arch/x86/kvm/reverse_cpuid.h    | 16 ------------
 4 files changed, 46 insertions(+), 41 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3003e99155e7..8840d21ee0b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -743,6 +743,22 @@ struct kvm_queued_exception {
 	bool has_payload;
 };
 
+/*
+ * Hardware-defined CPUID leafs that are either scattered by the kernel or are
+ * unknown to the kernel, but need to be directly used by KVM.  Note, these
+ * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
+ */
+enum kvm_only_cpuid_leafs {
+	CPUID_12_EAX	 = NCAPINTS,
+	CPUID_7_1_EDX,
+	CPUID_8000_0007_EDX,
+	CPUID_8000_0022_EAX,
+	CPUID_7_2_EDX,
+	NR_KVM_CPU_CAPS,
+
+	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+};
+
 struct kvm_vcpu_arch {
 	/*
 	 * rip and regs accesses must go through
@@ -861,23 +877,20 @@ struct kvm_vcpu_arch {
 	bool is_amd_compatible;
 
 	/*
-	 * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
-	 * when "struct kvm_vcpu_arch" is no longer defined in an
-	 * arch/x86/include/asm header.  The max is mostly arbitrary, i.e.
-	 * can be increased as necessary.
+	 * cpu_caps holds the effective guest capabilities, i.e. the features
+	 * the vCPU is allowed to use.  Typically, but not always, features can
+	 * be used by the guest if and only if both KVM and userspace want to
+	 * expose the feature to the guest.
+	 *
+	 * A common exception is for virtualization holes, i.e. when KVM can't
+	 * prevent the guest from using a feature, in which case the vCPU "has"
+	 * the feature regardless of what KVM or userspace desires.
+	 *
+	 * Note, features that don't require KVM involvement in any way are
+	 * NOT enforced/sanitized by KVM, i.e. are taken verbatim from the
+	 * guest CPUID provided by userspace.
 	 */
-#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG
-
-	/*
-	 * Track whether or not the guest is allowed to use features that are
-	 * governed by KVM, where "governed" means KVM needs to manage state
-	 * and/or explicitly enable the feature in hardware.  Typically, but
-	 * not always, governed features can be used by the guest if and only
-	 * if both KVM and userspace want to expose the feature to the guest.
-	 */
-	struct {
-		DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES);
-	} governed_features;
+	u32 cpu_caps[NR_KVM_CPU_CAPS];
 
 	u64 reserved_gpa_bits;
 	int maxphyaddr;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 286abefc93d5..89c506cf649b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -387,9 +387,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	struct kvm_cpuid_entry2 *best;
 	bool allow_gbpages;
 
-	BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES);
-	bitmap_zero(vcpu->arch.governed_features.enabled,
-		    KVM_MAX_NR_GOVERNED_FEATURES);
+	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
 
 	kvm_update_cpuid_runtime(vcpu);
 
@@ -473,6 +471,7 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
 static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
                         int nent)
 {
+	u32 vcpu_caps[NR_KVM_CPU_CAPS];
 	int r;
 
 	/*
@@ -480,10 +479,18 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	 * order to massage the new entries, e.g. to account for dynamic bits
 	 * that KVM controls, without clobbering the current guest CPUID, which
 	 * KVM needs to preserve in order to unwind on failure.
+	 *
+	 * Similarly, save the vCPU's current cpu_caps so that the capabilities
+	 * can be updated alongside the CPUID entries when performing runtime
+	 * updates.  Full initialization is done if and only if the vCPU hasn't
+	 * run, i.e. only if userspace is potentially changing CPUID features.
 	 */
 	swap(vcpu->arch.cpuid_entries, e2);
 	swap(vcpu->arch.cpuid_nent, nent);
 
+	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
+	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
+
 	/*
 	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
 	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
@@ -527,6 +534,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
 	return 0;
 
 err:
+	memcpy(vcpu->arch.cpu_caps, vcpu_caps, sizeof(vcpu_caps));
 	swap(vcpu->arch.cpuid_entries, e2);
 	swap(vcpu->arch.cpuid_nent, nent);
 	return r;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index e021681f34ac..ad0168d3aec5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -259,10 +259,10 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
 static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
 					      unsigned int x86_feature)
 {
-	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
-	__set_bit(kvm_governed_feature_index(x86_feature),
-		  vcpu->arch.governed_features.enabled);
+	reverse_cpuid_check(x86_leaf);
+	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
 }
 
 static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
@@ -275,10 +275,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
 static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
 					      unsigned int x86_feature)
 {
-	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
-	return test_bit(kvm_governed_feature_index(x86_feature),
-			vcpu->arch.governed_features.enabled);
+	reverse_cpuid_check(x86_leaf);
+	return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
 }
 
 static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index 245f71c16272..63d5735fbc8a 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -6,22 +6,6 @@
 #include <asm/cpufeature.h>
 #include <asm/cpufeatures.h>
 
-/*
- * Hardware-defined CPUID leafs that are either scattered by the kernel or are
- * unknown to the kernel, but need to be directly used by KVM.  Note, these
- * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
- */
-enum kvm_only_cpuid_leafs {
-	CPUID_12_EAX	 = NCAPINTS,
-	CPUID_7_1_EDX,
-	CPUID_8000_0007_EDX,
-	CPUID_8000_0022_EAX,
-	CPUID_7_2_EDX,
-	NR_KVM_CPU_CAPS,
-
-	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
-};
-
 /*
  * Define a KVM-only feature flag.
  *
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (36 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-06-20  2:24   ` Yang, Weijiang
  2024-07-05  2:13   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information Sean Christopherson
                   ` (11 subsequent siblings)
  49 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Initialize a vCPU's capabilities based on the guest CPUID provided by
userspace instead of simply zeroing the entire array.  This is the first
step toward using cpu_caps to query *all* CPUID-based guest capabilities,
i.e. will allow converting all usage of guest_cpuid_has() to
guest_cpu_cap_has().

Zeroing the array was the logical choice when using cpu_caps was opt-in,
e.g. "unsupported" was generally a safer default, and the whole point of
governed features is that KVM would need to check host and guest support,
i.e. making everything unsupported by default didn't require more code.

But requiring KVM to manually "enable" every CPUID-based feature in
cpu_caps would require an absurd amount of boilerplate code.

Follow existing CPUID/kvm_cpu_caps nomenclature where possible, e.g. for
the change() and clear() APIs.  Replace check_and_set() with constrain()
to try and capture that KVM is constraining userspace's desired guest
feature set based on KVM's capabilities.

This is intended to be a gigantic nop, i.e. should not have any impact on
guest or KVM functionality.

This is also an intermediate step; a future commit will also incorporate
KVM support into the vCPU's cpu_caps before converting guest_cpuid_has()
to guest_cpu_cap_has().
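
As a rough sanity check of the "nop" claim (illustration only, not part of the
patch), the old "zero, then check_and_set()" flow and the new "seed from guest
CPUID, then constrain()" flow can be compared on a single 32-bit word.  The
demo_* names are hypothetical, and KVM's capabilities are passed in as a plain
mask rather than read from kvm_cpu_caps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline uint32_t demo_mask(unsigned int bit)
{
	return UINT32_C(1) << bit;
}

static inline void demo_set(uint32_t *caps, unsigned int bit)
{
	*caps |= demo_mask(bit);
}

static inline void demo_clear(uint32_t *caps, unsigned int bit)
{
	*caps &= ~demo_mask(bit);
}

/* Set or clear the bit depending on whether the guest has the capability. */
static inline void demo_change(uint32_t *caps, unsigned int bit, bool has)
{
	if (has)
		demo_set(caps, bit);
	else
		demo_clear(caps, bit);
}

/* Clear the bit if KVM doesn't support it; never sets anything. */
static inline void demo_constrain(uint32_t *caps, uint32_t kvm_caps,
				  unsigned int bit)
{
	if (!(kvm_caps & demo_mask(bit)))
		demo_clear(caps, bit);
}

/* Old flow: start from zero, opt in if both KVM and the guest have the bit. */
static inline uint32_t demo_old_flow(uint32_t guest_cpuid, uint32_t kvm_caps,
				     unsigned int bit)
{
	uint32_t caps = 0;

	if ((kvm_caps & demo_mask(bit)) && (guest_cpuid & demo_mask(bit)))
		demo_set(&caps, bit);
	return caps & demo_mask(bit);
}

/* New flow: start from guest CPUID, then drop what KVM doesn't support. */
static inline uint32_t demo_new_flow(uint32_t guest_cpuid, uint32_t kvm_caps,
				     unsigned int bit)
{
	uint32_t caps = guest_cpuid;

	demo_constrain(&caps, kvm_caps, bit);
	return caps & demo_mask(bit);
}
```

For any single constrained bit the two flows agree; the difference is only in
the bits that were never opted in, which now carry userspace's CPUID value
instead of zero.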

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c   | 46 ++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/cpuid.h   | 25 ++++++++++++++++++++---
 arch/x86/kvm/svm/svm.c | 28 +++++++++++++------------
 arch/x86/kvm/vmx/vmx.c |  8 +++++---
 4 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 89c506cf649b..fd725cbbcce5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -381,13 +381,56 @@ static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
 #endif
 }
 
+/*
+ * This isn't truly "unsafe", but except for the cpu_caps initialization code,
+ * all register lookups should use __cpuid_entry_get_reg(), which provides
+ * compile-time validation of the input.
+ */
+static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
+{
+	switch (reg) {
+	case CPUID_EAX:
+		return entry->eax;
+	case CPUID_EBX:
+		return entry->ebx;
+	case CPUID_ECX:
+		return entry->ecx;
+	case CPUID_EDX:
+		return entry->edx;
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	struct kvm_cpuid_entry2 *best;
+	struct kvm_cpuid_entry2 *entry;
 	bool allow_gbpages;
+	int i;
 
 	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
+	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS);
+
+	/*
+	 * Reset guest capabilities to userspace's guest CPUID definition, i.e.
+	 * honor userspace's definition for features that don't require KVM or
+	 * hardware management/support (or that KVM simply doesn't care about).
+	 */
+	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
+		const struct cpuid_reg cpuid = reverse_cpuid[i];
+
+		if (!cpuid.function)
+			continue;
+
+		entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
+		if (!entry)
+			continue;
+
+		vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
+	}
 
 	kvm_update_cpuid_runtime(vcpu);
 
@@ -404,8 +447,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 */
 	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
 				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
-	if (allow_gbpages)
-		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
+	guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);
 
 	best = kvm_find_cpuid_entry(vcpu, 1);
 	if (best && apic) {
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index ad0168d3aec5..c2c2b8aa347b 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -265,11 +265,30 @@ static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
 	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
 }
 
-static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
-							unsigned int x86_feature)
+static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
+						unsigned int x86_feature)
 {
-	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
+
+	reverse_cpuid_check(x86_leaf);
+	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
+}
+
+static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
+						 unsigned int x86_feature,
+						 bool guest_has_cap)
+{
+	if (guest_has_cap)
 		guest_cpu_cap_set(vcpu, x86_feature);
+	else
+		guest_cpu_cap_clear(vcpu, x86_feature);
+}
+
+static __always_inline void guest_cpu_cap_constrain(struct kvm_vcpu *vcpu,
+						    unsigned int x86_feature)
+{
+	if (!kvm_cpu_cap_has(x86_feature))
+		guest_cpu_cap_clear(vcpu, x86_feature);
 }
 
 static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2acd2e3bb1b0..1bc431a7e862 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4339,27 +4339,29 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * XSS on VM-Enter/VM-Exit.  Failure to do so would effectively give
 	 * the guest read/write access to the host's XSS.
 	 */
-	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
-	    boot_cpu_has(X86_FEATURE_XSAVES) &&
-	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
-		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
+	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
+			     boot_cpu_has(X86_FEATURE_XSAVE) &&
+			     boot_cpu_has(X86_FEATURE_XSAVES) &&
+			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
 
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_NRIPS);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_TSCRATEMSR);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LBRV);
 
 	/*
 	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
 	 * VMLOAD drops bits 63:32 of SYSENTER (ignoring the fact that exposing
 	 * SVM on Intel is bonkers and extremely unlikely to work).
 	 */
-	if (!guest_cpuid_is_intel(vcpu))
-		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+	if (guest_cpuid_is_intel(vcpu))
+		guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+	else
+		guest_cpu_cap_constrain(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
 
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PAUSEFILTER);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PFTHRESHOLD);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VGIF);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VNMI);
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1bc56596d653..d873386e1473 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7838,10 +7838,12 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
 	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
-		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
+		guest_cpu_cap_constrain(vcpu, X86_FEATURE_XSAVES);
+	else
+		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
 
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LAM);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VMX);
+	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LAM);
 
 	vmx_setup_uret_msrs(vmx);
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (37 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:18   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support Sean Christopherson
                   ` (10 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Extract the meat of __do_cpuid_func_emulated() into a separate helper,
cpuid_func_emulated(), so that cpuid_func_emulated() can be used with a
single CPUID entry.  This will allow marking emulated features as fully
supported in the guest cpu_caps without needing to hardcode the set of
emulated features in multiple locations.

No functional change intended.
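
For illustration only (not part of the patch), the shape of the refactor can
be sketched in user-space C: a per-entry helper that reports whether it
produced an entry, and an array wrapper that owns the bounds check and the
count.  The demo_* names, the trimmed-down entry layout, and the
DEMO_EMULATED_BIT placeholder are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_EMULATED_BIT	(UINT32_C(1) << 0)	/* placeholder feature bit */

struct demo_entry {
	uint32_t function;
	uint32_t eax;
	uint32_t ecx;
};

struct demo_array {
	struct demo_entry *entries;
	int maxnent;
	int nent;
};

/* Fill a single entry; return 1 if @func has emulated bits, 0 otherwise. */
static int demo_func_emulated(struct demo_entry *entry, uint32_t func)
{
	memset(entry, 0, sizeof(*entry));
	entry->function = func;

	switch (func) {
	case 0:
		entry->eax = 7;
		return 1;
	case 1:
		entry->ecx = DEMO_EMULATED_BIT;
		return 1;
	default:
		return 0;
	}
}

/* Array wrapper: only the bounds check and the bookkeeping live here. */
static int demo_do_func_emulated(struct demo_array *array, uint32_t func)
{
	if (array->nent >= array->maxnent)
		return -E2BIG;

	array->nent += demo_func_emulated(&array->entries[array->nent], func);
	return 0;
}
```

In this sketch an unhandled function still zeroes the next free slot but does
not consume it, because the helper's return value is what advances nent.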

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index fd725cbbcce5..d1849fe874ab 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1007,14 +1007,10 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
 	return entry;
 }
 
-static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
+static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
 {
-	struct kvm_cpuid_entry2 *entry;
+	memset(entry, 0, sizeof(*entry));
 
-	if (array->nent >= array->maxnent)
-		return -E2BIG;
-
-	entry = &array->entries[array->nent];
 	entry->function = func;
 	entry->index = 0;
 	entry->flags = 0;
@@ -1022,23 +1018,27 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
 	switch (func) {
 	case 0:
 		entry->eax = 7;
-		++array->nent;
-		break;
+		return 1;
 	case 1:
 		entry->ecx = F(MOVBE);
-		++array->nent;
-		break;
+		return 1;
 	case 7:
 		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 		entry->eax = 0;
 		if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
 			entry->ecx = F(RDPID);
-		++array->nent;
-		break;
+		return 1;
 	default:
-		break;
+		return 0;
 	}
+}
 
+static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
+{
+	if (array->nent >= array->maxnent)
+		return -E2BIG;
+
+	array->nent += cpuid_func_emulated(&array->entries[array->nent], func);
 	return 0;
 }
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (38 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:22   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 41/49] KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime Sean Christopherson
                   ` (9 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Constrain all guest cpu_caps based on KVM support instead of constraining
only the few features that KVM _currently_ needs to verify are actually
supported by KVM.  The intent of cpu_caps is to track what the guest is
actually capable of using, not the raw, unfiltered CPUID values that the
guest sees.

I.e. KVM should always consult its own support when making decisions
based on guest CPUID, and the only reason KVM has historically made the
checks opt-in was due to lack of centralized tracking.
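
For illustration only (not part of the patch), the per-word computation in the
diff below reduces to a union of everything KVM can provide, masked by what
userspace enabled in guest CPUID.  The function and parameter names are
hypothetical stand-ins for kvm_cpu_caps[i], kvm_vmm_cpu_caps[i], the emulated
entry's register, and the guest CPUID register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * A bit survives only if KVM supports it in some form (advertised,
 * VMM-only, or emulated) AND userspace set it in guest CPUID.
 */
static inline uint32_t demo_effective_caps(uint32_t kvm_caps,
					   uint32_t vmm_caps,
					   uint32_t emulated,
					   uint32_t guest_cpuid)
{
	return (kvm_caps | vmm_caps | emulated) & guest_cpuid;
}
```

A bit that userspace sets but KVM supports in none of the three forms is
dropped, which is what makes the per-feature constrain() calls redundant.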

Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c   | 14 +++++++++++++-
 arch/x86/kvm/cpuid.h   |  7 -------
 arch/x86/kvm/svm/svm.c | 11 -----------
 arch/x86/kvm/vmx/vmx.c |  9 ++-------
 4 files changed, 15 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index d1849fe874ab..8ada1cac8fcb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -403,6 +403,8 @@ static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
 	}
 }
 
+static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func);
+
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -421,6 +423,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 */
 	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
 		const struct cpuid_reg cpuid = reverse_cpuid[i];
+		struct kvm_cpuid_entry2 emulated;
 
 		if (!cpuid.function)
 			continue;
@@ -429,7 +432,16 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		if (!entry)
 			continue;
 
-		vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
+		cpuid_func_emulated(&emulated, cpuid.function);
+
+		/*
+		 * A vCPU has a feature if it's supported by KVM and is enabled
+		 * in guest CPUID.  Note, this includes features that are
+		 * supported by KVM but aren't advertised to userspace!
+		 */
+		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i] | kvm_vmm_cpu_caps[i] |
+					 cpuid_get_reg_unsafe(&emulated, cpuid.reg);
+		vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);
 	}
 
 	kvm_update_cpuid_runtime(vcpu);
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c2c2b8aa347b..60da304db4e4 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -284,13 +284,6 @@ static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
 		guest_cpu_cap_clear(vcpu, x86_feature);
 }
 
-static __always_inline void guest_cpu_cap_constrain(struct kvm_vcpu *vcpu,
-						    unsigned int x86_feature)
-{
-	if (!kvm_cpu_cap_has(x86_feature))
-		guest_cpu_cap_clear(vcpu, x86_feature);
-}
-
 static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
 					      unsigned int x86_feature)
 {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1bc431a7e862..946a75771946 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4344,10 +4344,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 			     boot_cpu_has(X86_FEATURE_XSAVES) &&
 			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
 
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_NRIPS);
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_TSCRATEMSR);
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LBRV);
-
 	/*
 	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
 	 * VMLOAD drops bits 63:32 of SYSENTER (ignoring the fact that exposing
@@ -4355,13 +4351,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 */
 	if (guest_cpuid_is_intel(vcpu))
 		guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
-	else
-		guest_cpu_cap_constrain(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
-
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PAUSEFILTER);
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PFTHRESHOLD);
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VGIF);
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VNMI);
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d873386e1473..653c4b68ec7f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7836,15 +7836,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * to the guest.  XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
 	 * set if and only if XSAVE is supported.
 	 */
-	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
-	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
-		guest_cpu_cap_constrain(vcpu, X86_FEATURE_XSAVES);
-	else
+	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
+	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
 		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
 
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VMX);
-	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LAM);
-
 	vmx_setup_uret_msrs(vmx);
 
 	if (cpu_has_secondary_exec_ctrls())
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 41/49] KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (39 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:22   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 42/49] KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leaf Sean Christopherson
                   ` (8 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Move the handling of X86_FEATURE_MWAIT during CPUID runtime updates to
utilize the lookup done for other CPUID.0x1 features.

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8ada1cac8fcb..258c5fce87fc 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -343,6 +343,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 
 		cpuid_entry_change(best, X86_FEATURE_APIC,
 			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
+
+		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT))
+			cpuid_entry_change(best, X86_FEATURE_MWAIT,
+					   vcpu->arch.ia32_misc_enable_msr &
+					   MSR_IA32_MISC_ENABLE_MWAIT);
 	}
 
 	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
@@ -358,14 +363,6 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
 		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
-
-	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
-		best = kvm_find_cpuid_entry(vcpu, 0x1);
-		if (best)
-			cpuid_entry_change(best, X86_FEATURE_MWAIT,
-					   vcpu->arch.ia32_misc_enable_msr &
-					   MSR_IA32_MISC_ENABLE_MWAIT);
-	}
 }
 EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 42/49] KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leaf
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (40 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 41/49] KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:22   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 43/49] KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host support Sean Christopherson
                   ` (7 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Drop an unnecessary check that kvm_find_cpuid_entry_index(), i.e.
cpuid_entry2_find(), returns the correct leaf when getting CPUID.0x7.0x0
to update X86_FEATURE_OSPKE.  cpuid_entry2_find() never returns an entry
for the wrong function.  And not that it matters, but cpuid_entry2_find()
will always return a precise match for CPUID.0x7.0x0 since the index is
significant.

No functional change intended.
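
A simplified sketch of why the dropped check is redundant, for illustration
only.  The toy_* names and the flag constant are assumptions modeled loosely
on the lookup, not the kernel's cpuid_entry2_find(); the point is that the
function number is compared before any entry can be returned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry; the real struct has more fields. */
struct toy_entry {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
};

#define TOY_FLAG_SIGNIFICANT_INDEX	1u

static struct toy_entry *toy_entry_find(struct toy_entry *entries, size_t nr,
					uint32_t function, uint32_t index)
{
	for (size_t i = 0; i < nr; i++) {
		struct toy_entry *e = &entries[i];

		/* An entry for a different function is never returned. */
		if (e->function != function)
			continue;

		/* If the index is significant, it must match exactly. */
		if (!(e->flags & TOY_FLAG_SIGNIFICANT_INDEX) ||
		    e->index == index)
			return e;
	}
	return NULL;
}
```

Under these assumptions, any non-NULL result for (0x7, 0x0) necessarily has
function 0x7 and index 0x0, so re-checking the function in the caller adds
nothing.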

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 258c5fce87fc..8256fc657c6b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -351,7 +351,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 	}
 
 	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
-	if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
+	if (best && boot_cpu_has(X86_FEATURE_PKU))
 		cpuid_entry_change(best, X86_FEATURE_OSPKE,
 				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 43/49] KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host support
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (41 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 42/49] KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leaf Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:22   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features Sean Christopherson
                   ` (6 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

When making runtime CPUID updates, change OSXSAVE and OSPKE even if their
respective base features (XSAVE, PKU) are not supported by the host.  KVM
already incorporates host support in the vCPU's effective reserved CR4 bits.
I.e. OSXSAVE and OSPKE can be set if and only if the host supports them.

And conversely, since KVM's ABI is that KVM owns the dynamic OS feature
flags, clearing them when they obviously aren't supported and thus can't
be enabled is arguably a fix.
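
To illustrate the reasoning (a toy model, not KVM code; all toy_* names are
hypothetical): if host support is already folded into the reserved CR4 bits,
CR4 can never hold a bit the host lacks, so mirroring CR4 into CPUID
unconditionally can only clear stale OS* flags, never set an unsupported one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CR4_OSXSAVE	(UINT64_C(1) << 18)	/* CR4.OSXSAVE */
#define TOY_CR4_PKE	(UINT64_C(1) << 22)	/* CR4.PKE */

/* Hypothetical vCPU state; only what the sketch needs. */
struct toy_vcpu {
	uint64_t cr4;
	uint64_t cr4_rsvd_bits;
	bool cpuid_osxsave;
	bool cpuid_ospke;
};

/* Host support is incorporated into the reserved bits up front. */
static uint64_t toy_cr4_reserved_bits(bool host_xsave, bool host_pku)
{
	uint64_t rsvd = 0;

	if (!host_xsave)
		rsvd |= TOY_CR4_OSXSAVE;
	if (!host_pku)
		rsvd |= TOY_CR4_PKE;
	return rsvd;
}

/* Returns -1 (think #GP) if the guest tries to set a reserved bit. */
static int toy_set_cr4(struct toy_vcpu *vcpu, uint64_t cr4)
{
	if (cr4 & vcpu->cr4_rsvd_bits)
		return -1;

	vcpu->cr4 = cr4;
	return 0;
}

/* Mirror CR4 into the dynamic CPUID flags with no host checks. */
static void toy_update_cpuid_runtime(struct toy_vcpu *vcpu)
{
	vcpu->cpuid_osxsave = !!(vcpu->cr4 & TOY_CR4_OSXSAVE);
	vcpu->cpuid_ospke = !!(vcpu->cr4 & TOY_CR4_PKE);
}
```

In this model, a vCPU on a host without XSAVE whose userspace-provided CPUID
had OSXSAVE pre-set ends up with the flag cleared after the runtime update,
which is the "arguably a fix" behavior described above.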

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 8256fc657c6b..552e65ba5efa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -336,10 +336,8 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 
 	best = kvm_find_cpuid_entry(vcpu, 1);
 	if (best) {
-		/* Update OSXSAVE bit */
-		if (boot_cpu_has(X86_FEATURE_XSAVE))
-			cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
-					   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
+		cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
+				   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
 
 		cpuid_entry_change(best, X86_FEATURE_APIC,
 			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
@@ -351,7 +349,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 	}
 
 	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
-	if (best && boot_cpu_has(X86_FEATURE_PKU))
+	if (best)
 		cpuid_entry_change(best, X86_FEATURE_OSPKE,
 				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
 
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (42 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 43/49] KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host support Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:26   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 45/49] KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has() Sean Christopherson
                   ` (5 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

When updating guest CPUID entries to emulate runtime behavior, e.g. when
the guest enables a CR4-based feature that is tied to a CPUID flag, also
update the vCPU's cpu_caps accordingly.  This will allow replacing all
usage of guest_cpuid_has() with guest_cpu_cap_has().

Note, this relies on kvm_set_cpuid() taking a snapshot of cpu_caps before
invoking kvm_update_cpuid_runtime(), i.e. when KVM is updating CPUID
entries that *may* become the vCPU's CPUID, so that unwinding to the old
cpu_caps is possible if userspace tries to set bogus CPUID information.

Note #2, none of the features in question use guest_cpu_cap_has() at this
time, i.e. aside from setting bits in cpu_caps, this is a glorified nop.
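
A rough standalone sketch of the two ideas above, for illustration only.
The toy_* types and the single-register layout are assumptions, and the
control flow of toy_set_cpuid() is a guess at the general shape rather than
a copy of kvm_set_cpuid(): runtime updates touch both the CPUID entry and
cpu_caps, and a snapshot taken beforehand allows unwinding if the new CPUID
is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_F_OSXSAVE	(1u << 27)

/* Hypothetical vCPU: one CPUID register plus the matching cpu_caps word. */
struct toy_vcpu {
	uint32_t cpuid_ecx;
	uint32_t cpu_caps;
};

static void toy_change(uint32_t *word, uint32_t bit, bool set)
{
	if (set)
		*word |= bit;
	else
		*word &= ~bit;
}

/* Keep guest CPUID and cpu_caps in lockstep for dynamic features. */
static void toy_update_feature_runtime(struct toy_vcpu *vcpu, uint32_t bit,
				       bool has_feature)
{
	toy_change(&vcpu->cpuid_ecx, bit, has_feature);
	toy_change(&vcpu->cpu_caps, bit, has_feature);
}

/* @valid stands in for whatever checks may reject the new CPUID. */
static int toy_set_cpuid(struct toy_vcpu *vcpu, uint32_t new_ecx,
			 bool cr4_osxsave, bool valid)
{
	struct toy_vcpu old = *vcpu;	/* snapshot before runtime updates */

	vcpu->cpuid_ecx = new_ecx;
	vcpu->cpu_caps = new_ecx;
	toy_update_feature_runtime(vcpu, TOY_F_OSXSAVE, cr4_osxsave);

	if (!valid) {
		*vcpu = old;		/* unwind CPUID and cpu_caps */
		return -1;
	}
	return 0;
}
```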

Cc: Yang Weijiang <weijiang.yang@intel.com>
Cc: Robert Hoo <robert.hoo.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 552e65ba5efa..1424a9d4eb17 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -330,28 +330,38 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
 	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
 }
 
+static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
+						       struct kvm_cpuid_entry2 *entry,
+						       unsigned int x86_feature,
+						       bool has_feature)
+{
+	cpuid_entry_change(entry, x86_feature, has_feature);
+	guest_cpu_cap_change(vcpu, x86_feature, has_feature);
+}
+
 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
 
 	best = kvm_find_cpuid_entry(vcpu, 1);
 	if (best) {
-		cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
-				   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
+		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSXSAVE,
+					   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
 
-		cpuid_entry_change(best, X86_FEATURE_APIC,
-			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
+		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_APIC,
+					   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
 
 		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT))
-			cpuid_entry_change(best, X86_FEATURE_MWAIT,
-					   vcpu->arch.ia32_misc_enable_msr &
-					   MSR_IA32_MISC_ENABLE_MWAIT);
+			kvm_update_feature_runtime(vcpu, best, X86_FEATURE_MWAIT,
+						   vcpu->arch.ia32_misc_enable_msr &
+						   MSR_IA32_MISC_ENABLE_MWAIT);
 	}
 
 	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
 	if (best)
-		cpuid_entry_change(best, X86_FEATURE_OSPKE,
-				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
+		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE,
+					   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
+
 
 	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
 	if (best)
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 45/49] KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has()
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (43 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:26   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps Sean Christopherson
                   ` (4 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Move the implementations of guest_has_{spec_ctrl,pred_cmd}_msr() down
below guest_cpu_cap_has() so that their use of guest_cpuid_has() can be
replaced with calls to guest_cpu_cap_has().

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.h | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 60da304db4e4..7be56fa62342 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -168,21 +168,6 @@ static inline int guest_cpuid_stepping(struct kvm_vcpu *vcpu)
 	return x86_stepping(best->eax);
 }
 
-static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
-{
-	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
-}
-
-static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
-{
-	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
-}
-
 static inline bool supports_cpuid_fault(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT;
@@ -301,4 +286,19 @@ static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr
 	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
 }
 
+static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
+{
+	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
+}
+
+static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
+{
+	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
+		guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
+}
+
 #endif
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (44 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 45/49] KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has() Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:34   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps Sean Christopherson
                   ` (3 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Switch all queries (except XSAVES) of guest features from guest CPUID to
guest capabilities, i.e. replace all calls to guest_cpuid_has() with calls
to guest_cpu_cap_has().

Keep guest_cpuid_has() around for XSAVES, but subsume its helper
guest_cpuid_get_register() and add a compile-time assertion to prevent
using guest_cpuid_has() for any other feature.  Add yet another comment
for XSAVES to explain why KVM is allowed to query its raw guest CPUID.

Opportunistically drop the unused guest_cpuid_clear(), as there should be
no circumstance in which KVM needs to _clear_ a guest CPUID feature now
that everything is tracked via cpu_caps.  E.g. KVM may need to _change_
a feature to emulate dynamic CPUID flags, but KVM should never need to
clear a feature in guest CPUID to prevent it from being used by the guest.

Delete the last remnants of the governed features framework, as the lone
holdout was vmx_adjust_secondary_exec_control()'s divergent behavior for
governed vs. ungoverned features.

Note, replacing guest_cpuid_has() checks with guest_cpu_cap_has() when
computing reserved CR4 bits is a nop when viewed as a whole, as KVM's
capabilities are already incorporated into the calculation.  I.e. if a
feature is present in guest CPUID but unsupported by KVM, its CR4 bit
was already being marked as reserved, and checking guest_cpu_cap_has()
simply double-stamps that it's a reserved bit.
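
The "nop when viewed as a whole" claim can be sanity checked with a toy
model, for illustration only.  The toy_* names are hypothetical, and the
sketch assumes guest cpu_caps is simply guest CPUID masked by KVM's
capabilities, which glosses over the special cases handled elsewhere in the
series:

```c
#include <assert.h>
#include <stdint.h>

/* Toy feature bits, each paired with the CR4 bit it unlocks. */
#define TOY_F_XSAVE	(1u << 0)
#define TOY_F_PKU	(1u << 1)
#define TOY_F_LA57	(1u << 2)

#define TOY_CR4_LA57	(UINT64_C(1) << 12)
#define TOY_CR4_OSXSAVE	(UINT64_C(1) << 18)
#define TOY_CR4_PKE	(UINT64_C(1) << 22)

/* A CR4 bit is reserved if its feature is missing from @features. */
static uint64_t toy_cr4_reserved_bits(uint32_t features)
{
	uint64_t rsvd = 0;

	if (!(features & TOY_F_XSAVE))
		rsvd |= TOY_CR4_OSXSAVE;
	if (!(features & TOY_F_PKU))
		rsvd |= TOY_CR4_PKE;
	if (!(features & TOY_F_LA57))
		rsvd |= TOY_CR4_LA57;
	return rsvd;
}

/* Old scheme: KVM's capabilities OR'd with raw guest CPUID. */
static uint64_t toy_rsvd_from_cpuid(uint32_t kvm_caps, uint32_t guest_cpuid)
{
	return toy_cr4_reserved_bits(kvm_caps) |
	       toy_cr4_reserved_bits(guest_cpuid);
}

/* New scheme: KVM's capabilities OR'd with guest cpu_caps. */
static uint64_t toy_rsvd_from_caps(uint32_t kvm_caps, uint32_t guest_cpuid)
{
	uint32_t guest_caps = guest_cpuid & kvm_caps;	/* assumption */

	return toy_cr4_reserved_bits(kvm_caps) |
	       toy_cr4_reserved_bits(guest_caps);
}
```

Under that assumption, both helpers agree for every combination of KVM
capabilities and guest CPUID, because any bit that masking removes from the
guest's view was already reserved via KVM's own capabilities.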

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/cpuid.c             |  4 +-
 arch/x86/kvm/cpuid.h             | 74 +++++++++++---------------------
 arch/x86/kvm/governed_features.h | 22 ----------
 arch/x86/kvm/hyperv.c            |  2 +-
 arch/x86/kvm/lapic.c             |  2 +-
 arch/x86/kvm/mtrr.c              |  2 +-
 arch/x86/kvm/smm.c               | 10 ++---
 arch/x86/kvm/svm/pmu.c           |  8 ++--
 arch/x86/kvm/svm/sev.c           |  4 +-
 arch/x86/kvm/svm/svm.c           | 20 ++++-----
 arch/x86/kvm/vmx/hyperv.h        |  2 +-
 arch/x86/kvm/vmx/nested.c        | 12 +++---
 arch/x86/kvm/vmx/pmu_intel.c     |  4 +-
 arch/x86/kvm/vmx/sgx.c           | 14 +++---
 arch/x86/kvm/vmx/vmx.c           | 47 ++++++++++----------
 arch/x86/kvm/x86.c               | 64 +++++++++++++--------------
 16 files changed, 121 insertions(+), 170 deletions(-)
 delete mode 100644 arch/x86/kvm/governed_features.h

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 1424a9d4eb17..0130e0677387 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -463,7 +463,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * and can install smaller shadow pages if the host lacks 1GiB support.
 	 */
 	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
-				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
+				      guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES);
 	guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);
 
 	best = kvm_find_cpuid_entry(vcpu, 1);
@@ -488,7 +488,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
 	vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) |
-					 __cr4_reserved_bits(guest_cpuid_has, vcpu);
+					 __cr4_reserved_bits(guest_cpu_cap_has, vcpu);
 #undef __kvm_cpu_cap_has
 
 	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu));
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 7be56fa62342..0bf3bddd0e29 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -67,41 +67,38 @@ static __always_inline void cpuid_entry_override(struct kvm_cpuid_entry2 *entry,
 	*reg = kvm_cpu_caps[leaf];
 }
 
-static __always_inline u32 *guest_cpuid_get_register(struct kvm_vcpu *vcpu,
-						     unsigned int x86_feature)
+static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
+					    unsigned int x86_feature)
 {
 	const struct cpuid_reg cpuid = x86_feature_cpuid(x86_feature);
 	struct kvm_cpuid_entry2 *entry;
+	u32 *reg;
+
+	/*
+	 * XSAVES is a special snowflake.  Due to lack of a dedicated intercept
+	 * on SVM, KVM must assume that XSAVES (and thus XRSTORS) is usable by
+	 * the guest if the host supports XSAVES and *XSAVE* is exposed to the
+	 * guest.  Although the guest can read/write XSS via XSAVES/XRSTORS, to
+	 * minimize the virtualization hole, KVM rejects attempts to read/write
+	 * XSS via RDMSR/WRMSR.  To make that work, KVM needs to check the raw
+	 * guest CPUID, not KVM's view of guest capabilities.
+	 *
+	 * For all other features, guest capabilities are accurate.  Expand
+	 * this allowlist with extreme vigilance.
+	 */
+	BUILD_BUG_ON(x86_feature != X86_FEATURE_XSAVES);
 
 	entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
 	if (!entry)
 		return NULL;
 
-	return __cpuid_entry_get_reg(entry, cpuid.reg);
-}
-
-static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
-					    unsigned int x86_feature)
-{
-	u32 *reg;
-
-	reg = guest_cpuid_get_register(vcpu, x86_feature);
+	reg = __cpuid_entry_get_reg(entry, cpuid.reg);
 	if (!reg)
 		return false;
 
 	return *reg & __feature_bit(x86_feature);
 }
 
-static __always_inline void guest_cpuid_clear(struct kvm_vcpu *vcpu,
-					      unsigned int x86_feature)
-{
-	u32 *reg;
-
-	reg = guest_cpuid_get_register(vcpu, x86_feature);
-	if (reg)
-		*reg &= ~__feature_bit(x86_feature);
-}
-
 static inline bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
@@ -220,27 +217,6 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
 	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
 }
 
-enum kvm_governed_features {
-#define KVM_GOVERNED_FEATURE(x) KVM_GOVERNED_##x,
-#include "governed_features.h"
-	KVM_NR_GOVERNED_FEATURES
-};
-
-static __always_inline int kvm_governed_feature_index(unsigned int x86_feature)
-{
-	switch (x86_feature) {
-#define KVM_GOVERNED_FEATURE(x) case x: return KVM_GOVERNED_##x;
-#include "governed_features.h"
-	default:
-		return -1;
-	}
-}
-
-static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
-{
-	return kvm_governed_feature_index(x86_feature) >= 0;
-}
-
 static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
 					      unsigned int x86_feature)
 {
@@ -288,17 +264,17 @@ static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr
 
 static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
 {
-	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
+	return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_STIBP) ||
+		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBRS) ||
+		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_SSBD));
 }
 
 static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
 {
-	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
-		guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
+	return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
+		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB) ||
+		guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB));
 }
 
 #endif
diff --git a/arch/x86/kvm/governed_features.h b/arch/x86/kvm/governed_features.h
deleted file mode 100644
index ad463b1ed4e4..000000000000
--- a/arch/x86/kvm/governed_features.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#if !defined(KVM_GOVERNED_FEATURE) || defined(KVM_GOVERNED_X86_FEATURE)
-BUILD_BUG()
-#endif
-
-#define KVM_GOVERNED_X86_FEATURE(x) KVM_GOVERNED_FEATURE(X86_FEATURE_##x)
-
-KVM_GOVERNED_X86_FEATURE(GBPAGES)
-KVM_GOVERNED_X86_FEATURE(XSAVES)
-KVM_GOVERNED_X86_FEATURE(VMX)
-KVM_GOVERNED_X86_FEATURE(NRIPS)
-KVM_GOVERNED_X86_FEATURE(TSCRATEMSR)
-KVM_GOVERNED_X86_FEATURE(V_VMSAVE_VMLOAD)
-KVM_GOVERNED_X86_FEATURE(LBRV)
-KVM_GOVERNED_X86_FEATURE(PAUSEFILTER)
-KVM_GOVERNED_X86_FEATURE(PFTHRESHOLD)
-KVM_GOVERNED_X86_FEATURE(VGIF)
-KVM_GOVERNED_X86_FEATURE(VNMI)
-KVM_GOVERNED_X86_FEATURE(LAM)
-
-#undef KVM_GOVERNED_X86_FEATURE
-#undef KVM_GOVERNED_FEATURE
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab..4971b60a1882 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1352,7 +1352,7 @@ static void __kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
 		return;
 
 	if (guest_cpuid_has(vcpu, X86_FEATURE_XSAVES) ||
-	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
+	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVEC))
 		return;
 
 	pr_notice_ratelimited("Booting SMP Windows KVM VM with !XSAVES && XSAVEC. "
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index ebf41023be38..37a2ecee3d75 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -590,7 +590,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 	 * version first and level-triggered interrupts never get EOIed in
 	 * IOAPIC.
 	 */
-	if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) &&
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
 	    !ioapic_in_kernel(vcpu->kvm))
 		v |= APIC_LVR_DIRECTED_EOI;
 	kvm_lapic_set_reg(apic, APIC_LVR, v);
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index a67c28a56417..9e8cb38ae1db 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -128,7 +128,7 @@ static u8 mtrr_disabled_type(struct kvm_vcpu *vcpu)
 	 * enable MTRRs and it is obviously undesirable to run the
 	 * guest entirely with UC memory and we use WB.
 	 */
-	if (guest_cpuid_has(vcpu, X86_FEATURE_MTRR))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_MTRR))
 		return MTRR_TYPE_UNCACHABLE;
 	else
 		return MTRR_TYPE_WRBACK;
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index d06d43d8d2aa..9144b28789df 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -283,7 +283,7 @@ void enter_smm(struct kvm_vcpu *vcpu)
 	memset(smram.bytes, 0, sizeof(smram.bytes));
 
 #ifdef CONFIG_X86_64
-	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
 		enter_smm_save_state_64(vcpu, &smram.smram64);
 	else
 #endif
@@ -353,7 +353,7 @@ void enter_smm(struct kvm_vcpu *vcpu)
 	kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
 
 #ifdef CONFIG_X86_64
-	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
 		if (static_call(kvm_x86_set_efer)(vcpu, 0))
 			goto error;
 #endif
@@ -586,7 +586,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
 	 * supports long mode.
 	 */
 #ifdef CONFIG_X86_64
-	if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) {
 		struct kvm_segment cs_desc;
 		unsigned long cr4;
 
@@ -609,7 +609,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
 		kvm_set_cr0(vcpu, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
 
 #ifdef CONFIG_X86_64
-	if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) {
 		unsigned long cr4, efer;
 
 		/* Clear CR4.PAE before clearing EFER.LME. */
@@ -632,7 +632,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
 		return X86EMUL_UNHANDLEABLE;
 
 #ifdef CONFIG_X86_64
-	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
 		return rsm_load_state_64(ctxt, &smram.smram64);
 	else
 #endif
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index dfcc38bd97d3..4a4be2da1345 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -46,7 +46,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
 
 	switch (msr) {
 	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE))
 			return NULL;
 		/*
 		 * Each PMU counter has a pair of CTL and CTR MSRs. CTLn
@@ -109,7 +109,7 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
 		return pmu->version > 0;
 	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
-		return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE);
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
 	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
@@ -179,7 +179,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 	union cpuid_0x80000022_ebx ebx;
 
 	pmu->version = 1;
-	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2)) {
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_PERFMON_V2)) {
 		pmu->version = 2;
 		/*
 		 * Note, PERFMON_V2 is also in 0x80000022.0x0, i.e. the guest
@@ -189,7 +189,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 			     x86_feature_cpuid(X86_FEATURE_PERFMON_V2).index);
 		ebx.full = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0)->ebx;
 		pmu->nr_arch_gp_counters = ebx.split.num_core_pmc;
-	} else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
+	} else if (guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
 		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
 	} else {
 		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7640dedc2ddc..1004280599b4 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4399,8 +4399,8 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
 	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
-		bool v_tsc_aux = guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
-				 guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
+		bool v_tsc_aux = guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
+				 guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);
 
 		set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
 	}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 946a75771946..06770b60c0ba 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1178,14 +1178,14 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
 	 */
 	if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
 		if (!npt_enabled ||
-		    !guest_cpuid_has(&svm->vcpu, X86_FEATURE_INVPCID))
+		    !guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_INVPCID))
 			svm_set_intercept(svm, INTERCEPT_INVPCID);
 		else
 			svm_clr_intercept(svm, INTERCEPT_INVPCID);
 	}
 
 	if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
-		if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP))
 			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
 		else
 			svm_set_intercept(svm, INTERCEPT_RDTSCP);
@@ -2911,7 +2911,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_AMD64_VIRT_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_VIRT_SSBD))
 			return 1;
 
 		msr_info->data = svm->virt_spec_ctrl;
@@ -3058,7 +3058,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		break;
 	case MSR_AMD64_VIRT_SPEC_CTRL:
 		if (!msr->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_VIRT_SSBD))
 			return 1;
 
 		if (data & ~SPEC_CTRL_SSBD)
@@ -3230,7 +3230,7 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
 	unsigned long type;
 	gva_t gva;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return 1;
 	}
@@ -4342,7 +4342,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
 			     boot_cpu_has(X86_FEATURE_XSAVE) &&
 			     boot_cpu_has(X86_FEATURE_XSAVES) &&
-			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
+			     guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));
 
 	/*
 	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
@@ -4360,7 +4360,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
-				     !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+				     !!guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
@@ -4617,7 +4617,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
 	 * responsible for ensuring nested SVM and SMIs are mutually exclusive.
 	 */
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
 		return 1;
 
 	smram->smram64.svm_guest_flag = 1;
@@ -4664,14 +4664,14 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
 
 	const struct kvm_smram_state_64 *smram64 = &smram->smram64;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
 		return 0;
 
 	/* Non-zero if SMI arrived while vCPU was in guest mode. */
 	if (!smram64->svm_guest_flag)
 		return 0;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
 		return 1;
 
 	if (!(smram64->efer & EFER_SVME))
diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h
index a87407412615..11a339009781 100644
--- a/arch/x86/kvm/vmx/hyperv.h
+++ b/arch/x86/kvm/vmx/hyperv.h
@@ -42,7 +42,7 @@ static inline struct hv_enlightened_vmcs *nested_vmx_evmcs(struct vcpu_vmx *vmx)
 	return vmx->nested.hv_evmcs;
 }
 
-static inline bool guest_cpuid_has_evmcs(struct kvm_vcpu *vcpu)
+static inline bool guest_cpu_cap_has_evmcs(struct kvm_vcpu *vcpu)
 {
 	/*
 	 * eVMCS is exposed to the guest if Hyper-V is enabled in CPUID and
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fb7eec29681d..fcba0061083d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -259,7 +259,7 @@ static bool nested_evmcs_handle_vmclear(struct kvm_vcpu *vcpu, gpa_t vmptr)
 	 * state. It is possible that the area will stay mapped as
 	 * vmx->nested.hv_evmcs but this shouldn't be a problem.
 	 */
-	if (!guest_cpuid_has_evmcs(vcpu) ||
+	if (!guest_cpu_cap_has_evmcs(vcpu) ||
 	    !evmptr_is_valid(nested_get_evmptr(vcpu)))
 		return false;
 
@@ -2061,7 +2061,7 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
 	bool evmcs_gpa_changed = false;
 	u64 evmcs_gpa;
 
-	if (likely(!guest_cpuid_has_evmcs(vcpu)))
+	if (likely(!guest_cpu_cap_has_evmcs(vcpu)))
 		return EVMPTRLD_DISABLED;
 
 	evmcs_gpa = nested_get_evmptr(vcpu);
@@ -2947,7 +2947,7 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
 		return -EINVAL;
 
 #ifdef CONFIG_KVM_HYPERV
-	if (guest_cpuid_has_evmcs(vcpu))
+	if (guest_cpu_cap_has_evmcs(vcpu))
 		return nested_evmcs_check_controls(vmcs12);
 #endif
 
@@ -3231,7 +3231,7 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
 	 * L2 was running), map it here to make sure vmcs12 changes are
 	 * properly reflected.
 	 */
-	if (guest_cpuid_has_evmcs(vcpu) &&
+	if (guest_cpu_cap_has_evmcs(vcpu) &&
 	    vmx->nested.hv_evmcs_vmptr == EVMPTR_MAP_PENDING) {
 		enum nested_evmptrld_status evmptrld_status =
 			nested_vmx_handle_enlightened_vmptrld(vcpu, false);
@@ -4882,7 +4882,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 	 * doesn't isolate different VMCSs, i.e. in this case, doesn't provide
 	 * separate modes for L2 vs L1.
 	 */
-	if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL))
 		indirect_branch_prediction_barrier();
 
 	/* Update any VMCS fields that might have changed while L2 ran */
@@ -6152,7 +6152,7 @@ static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
 {
 	u32 encls_leaf;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_SGX) ||
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) ||
 	    !nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING))
 		return false;
 
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index be40474de6e4..a739defa6796 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -110,7 +110,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 
 static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
 {
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
 		return 0;
 
 	return vcpu->arch.perf_capabilities;
@@ -160,7 +160,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 		ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
 		break;
 	case MSR_IA32_DS_AREA:
-		ret = guest_cpuid_has(vcpu, X86_FEATURE_DS);
+		ret = guest_cpu_cap_has(vcpu, X86_FEATURE_DS);
 		break;
 	case MSR_PEBS_DATA_CFG:
 		perf_capabilities = vcpu_get_perf_capabilities(vcpu);
diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
index 6fef01e0536e..f57f072a16f6 100644
--- a/arch/x86/kvm/vmx/sgx.c
+++ b/arch/x86/kvm/vmx/sgx.c
@@ -123,7 +123,7 @@ static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
 	 * likely than a bad userspace address.
 	 */
 	if ((trapnr == PF_VECTOR || !boot_cpu_has(X86_FEATURE_SGX2)) &&
-	    guest_cpuid_has(vcpu, X86_FEATURE_SGX2)) {
+	    guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2)) {
 		memset(&ex, 0, sizeof(ex));
 		ex.vector = PF_VECTOR;
 		ex.error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
@@ -366,7 +366,7 @@ static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
 		return true;
 
 	if (leaf >= EAUG && leaf <= EMODT)
-		return guest_cpuid_has(vcpu, X86_FEATURE_SGX2);
+		return guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2);
 
 	return false;
 }
@@ -382,8 +382,8 @@ int handle_encls(struct kvm_vcpu *vcpu)
 {
 	u32 leaf = (u32)kvm_rax_read(vcpu);
 
-	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX) ||
-	    !guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
+	if (!enable_sgx || !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) ||
+	    !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX1)) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
 	} else if (!encls_leaf_enabled_in_guest(vcpu, leaf) ||
 		   !sgx_enabled_in_guest_bios(vcpu) || !is_paging(vcpu)) {
@@ -480,15 +480,15 @@ void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
 	if (!cpu_has_vmx_encls_vmexit())
 		return;
 
-	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX) &&
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) &&
 	    sgx_enabled_in_guest_bios(vcpu)) {
-		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX1)) {
 			bitmap &= ~GENMASK_ULL(ETRACK, ECREATE);
 			if (sgx_intercept_encls_ecreate(vcpu))
 				bitmap |= (1 << ECREATE);
 		}
 
-		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX2))
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2))
 			bitmap &= ~GENMASK_ULL(EMODT, EAUG);
 
 		/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 653c4b68ec7f..741961a1edcc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1874,8 +1874,8 @@ static void vmx_setup_uret_msrs(struct vcpu_vmx *vmx)
 	vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx));
 
 	vmx_setup_uret_msr(vmx, MSR_TSC_AUX,
-			   guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
-			   guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));
+			   guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
+			   guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDPID));
 
 	/*
 	 * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations of new
@@ -2028,7 +2028,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_BNDCFGS:
 		if (!kvm_mpx_supported() ||
 		    (!msr_info->host_initiated &&
-		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
+		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
 			return 1;
 		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
 		break;
@@ -2044,7 +2044,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
 			return 1;
 		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
 			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
@@ -2063,7 +2063,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * sanity checking and refuse to boot. Filter all unsupported
 		 * features out.
 		 */
-		if (!msr_info->host_initiated && guest_cpuid_has_evmcs(vcpu))
+		if (!msr_info->host_initiated && guest_cpu_cap_has_evmcs(vcpu))
 			nested_evmcs_filter_control_msr(vcpu, msr_info->index,
 							&msr_info->data);
 #endif
@@ -2133,7 +2133,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
 						    u64 data)
 {
 #ifdef CONFIG_X86_64
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
 		return (u32)data;
 #endif
 	return (unsigned long)data;
@@ -2144,7 +2144,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
 	u64 debugctl = 0;
 
 	if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
-	    (host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
+	    (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
 		debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
 
 	if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) &&
@@ -2248,7 +2248,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_BNDCFGS:
 		if (!kvm_mpx_supported() ||
 		    (!msr_info->host_initiated &&
-		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
+		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
 			return 1;
 		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
 		    (data & MSR_IA32_BNDCFGS_RSVD))
@@ -2350,7 +2350,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		 * behavior, but it's close enough.
 		 */
 		if (!msr_info->host_initiated &&
-		    (!guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC) ||
+		    (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC) ||
 		    ((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) &&
 		    !(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED))))
 			return 1;
@@ -2436,9 +2436,9 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if ((data & PERF_CAP_PEBS_MASK) !=
 			    (kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK))
 				return 1;
-			if (!guest_cpuid_has(vcpu, X86_FEATURE_DS))
+			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DS))
 				return 1;
-			if (!guest_cpuid_has(vcpu, X86_FEATURE_DTES64))
+			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DTES64))
 				return 1;
 			if (!cpuid_model_is_consistent(vcpu))
 				return 1;
@@ -4570,10 +4570,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
 	bool __enabled;										\
 												\
 	if (cpu_has_vmx_##name()) {								\
-		if (kvm_is_governed_feature(X86_FEATURE_##feat_name))				\
-			__enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name);		\
-		else										\
-			__enabled = guest_cpuid_has(__vcpu, X86_FEATURE_##feat_name);		\
+		__enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name);			\
 		vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\
 						  __enabled, exiting);				\
 	}											\
@@ -4649,8 +4646,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
 	 */
 	if (cpu_has_vmx_rdtscp()) {
 		bool rdpid_or_rdtscp_enabled =
-			guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
-			guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
+			guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
+			guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);
 
 		vmx_adjust_secondary_exec_control(vmx, &exec_control,
 						  SECONDARY_EXEC_ENABLE_RDTSCP,
@@ -5956,7 +5953,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
 	} operand;
 	int gpr_index;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
 		return 1;
 	}
@@ -7837,7 +7834,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * set if and only if XSAVE is supported.
 	 */
 	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
-	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
+	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
 		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
 
 	vmx_setup_uret_msrs(vmx);
@@ -7859,21 +7856,21 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 		nested_vmx_cr_fixed1_bits_update(vcpu);
 
 	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
-			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
+			guest_cpu_cap_has(vcpu, X86_FEATURE_INTEL_PT))
 		update_intel_pt_cfg(vcpu);
 
 	if (boot_cpu_has(X86_FEATURE_RTM)) {
 		struct vmx_uret_msr *msr;
 		msr = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
 		if (msr) {
-			bool enabled = guest_cpuid_has(vcpu, X86_FEATURE_RTM);
+			bool enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_RTM);
 			vmx_set_guest_uret_msr(vmx, msr, enabled ? 0 : TSX_CTRL_RTM_DISABLE);
 		}
 	}
 
 	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
-					  !guest_cpuid_has(vcpu, X86_FEATURE_XFD));
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
 
 	if (boot_cpu_has(X86_FEATURE_IBPB))
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
@@ -7881,17 +7878,17 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 
 	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
 		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
-					  !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
 
 	set_cr4_guest_host_mask(vmx);
 
 	vmx_write_encls_bitmap(vcpu, NULL);
-	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX))
 		vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_SGX_ENABLED;
 	else
 		vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_ENABLED;
 
-	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
+	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
 		vmx->msr_ia32_feature_control_valid_bits |=
 			FEAT_CTL_SGX_LC_ENABLED;
 	else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4ca9651b3f43..5aa7581802f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -488,7 +488,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	enum lapic_mode old_mode = kvm_get_apic_mode(vcpu);
 	enum lapic_mode new_mode = kvm_apic_mode(msr_info->data);
 	u64 reserved_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu) | 0x2ff |
-		(guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) ? 0 : X2APIC_ENABLE);
+		(guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) ? 0 : X2APIC_ENABLE);
 
 	if ((msr_info->data & reserved_bits) != 0 || new_mode == LAPIC_MODE_INVALID)
 		return 1;
@@ -1351,10 +1351,10 @@ static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
 {
 	u64 fixed = DR6_FIXED_1;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_RTM))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))
 		fixed |= DR6_RTM;
 
-	if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
 		fixed |= DR6_BUS_LOCK;
 	return fixed;
 }
@@ -1708,20 +1708,20 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 
 static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
-	if (efer & EFER_AUTOIBRS && !guest_cpuid_has(vcpu, X86_FEATURE_AUTOIBRS))
+	if (efer & EFER_AUTOIBRS && !guest_cpu_cap_has(vcpu, X86_FEATURE_AUTOIBRS))
 		return false;
 
-	if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT))
+	if (efer & EFER_FFXSR && !guest_cpu_cap_has(vcpu, X86_FEATURE_FXSR_OPT))
 		return false;
 
-	if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+	if (efer & EFER_SVME && !guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
 		return false;
 
 	if (efer & (EFER_LME | EFER_LMA) &&
-	    !guest_cpuid_has(vcpu, X86_FEATURE_LM))
+	    !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
 		return false;
 
-	if (efer & EFER_NX && !guest_cpuid_has(vcpu, X86_FEATURE_NX))
+	if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
 		return false;
 
 	return true;
@@ -1863,8 +1863,8 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 			return 1;
 
 		if (!host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
 			return 1;
 
 		/*
@@ -1920,8 +1920,8 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
 			return 1;
 
 		if (!host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
 			return 1;
 		break;
 	}
@@ -2113,7 +2113,7 @@ EXPORT_SYMBOL_GPL(kvm_handle_invalid_op);
 static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
 {
 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS) &&
-	    !guest_cpuid_has(vcpu, X86_FEATURE_MWAIT))
+	    !guest_cpu_cap_has(vcpu, X86_FEATURE_MWAIT))
 		return kvm_handle_invalid_op(vcpu);
 
 	pr_warn_once("%s instruction emulated as NOP!\n", insn);
@@ -3820,11 +3820,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if ((!guest_has_pred_cmd_msr(vcpu)))
 				return 1;
 
-			if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
-			    !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB))
+			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
+			    !guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB))
 				reserved_bits |= PRED_CMD_IBPB;
 
-			if (!guest_cpuid_has(vcpu, X86_FEATURE_SBPB))
+			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB))
 				reserved_bits |= PRED_CMD_SBPB;
 		}
 
@@ -3845,7 +3845,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	}
 	case MSR_IA32_FLUSH_CMD:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D))
 			return 1;
 
 		if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
@@ -3896,7 +3896,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		kvm_set_lapic_tscdeadline_msr(vcpu, data);
 		break;
 	case MSR_IA32_TSC_ADJUST:
-		if (guest_cpuid_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
+		if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
 			if (!msr_info->host_initiated) {
 				s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
 				adjust_tsc_offset_guest(vcpu, adj);
@@ -3923,7 +3923,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
 		    ((old_val ^ data)  & MSR_IA32_MISC_ENABLE_MWAIT)) {
-			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
+			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
 				return 1;
 			vcpu->arch.ia32_misc_enable_msr = data;
 			kvm_update_cpuid_runtime(vcpu);
@@ -4100,12 +4100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		kvm_pr_unimpl_wrmsr(vcpu, msr, data);
 		break;
 	case MSR_AMD64_OSVW_ID_LENGTH:
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
 			return 1;
 		vcpu->arch.osvw.length = data;
 		break;
 	case MSR_AMD64_OSVW_STATUS:
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
 			return 1;
 		vcpu->arch.osvw.status = data;
 		break;
@@ -4126,7 +4126,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 #ifdef CONFIG_X86_64
 	case MSR_IA32_XFD:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
 			return 1;
 
 		if (data & ~kvm_guest_supported_xfd(vcpu))
@@ -4136,7 +4136,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_XFD_ERR:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
 			return 1;
 
 		if (data & ~kvm_guest_supported_xfd(vcpu))
@@ -4260,13 +4260,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
 			return 1;
 		msr_info->data = vcpu->arch.arch_capabilities;
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
 			return 1;
 		msr_info->data = vcpu->arch.perf_capabilities;
 		break;
@@ -4467,12 +4467,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = 0xbe702111;
 		break;
 	case MSR_AMD64_OSVW_ID_LENGTH:
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
 			return 1;
 		msr_info->data = vcpu->arch.osvw.length;
 		break;
 	case MSR_AMD64_OSVW_STATUS:
-		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
+		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
 			return 1;
 		msr_info->data = vcpu->arch.osvw.status;
 		break;
@@ -4491,14 +4491,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 #ifdef CONFIG_X86_64
 	case MSR_IA32_XFD:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
 			return 1;
 
 		msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
 		break;
 	case MSR_IA32_XFD_ERR:
 		if (!msr_info->host_initiated &&
-		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
+		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
 			return 1;
 
 		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
@@ -8508,17 +8508,17 @@ static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
 
 static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
 {
-	return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
+	return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
 }
 
 static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt)
 {
-	return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
+	return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
 }
 
 static bool emulator_guest_has_rdpid(struct x86_emulate_ctxt *ctxt)
 {
-	return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID);
+	return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID);
 }
 
 static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg)
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (45 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:36   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data Sean Christopherson
                   ` (2 subsequent siblings)
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Drop the manual boot_cpu_has() checks on XSAVE when adjusting the guest's
XSAVES capabilities now that guest cpu_caps incorporates KVM's support.
The guest's cpu_caps are initialized from kvm_cpu_caps, which are in turn
initialized from boot_cpu_data, i.e. checking guest_cpu_cap_has() also
checks host/KVM capabilities (which is the entire point of cpu_caps).

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
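Note for readers (not part of the changelog): the layering described above
can be modeled in a few lines of standalone C.  This is only a sketch under
stated assumptions.  The struct, the helper names (caps_model, model_init(),
model_guest_has(), model_adjust_xsaves()) and the single-word bit layout are
hypothetical and are not KVM code.  It only illustrates why a guest cap query
already implies host/KVM support when each layer is masked by the layer
below it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-word feature bits; positions are illustrative only. */
#define FEAT_XSAVE	(1u << 0)
#define FEAT_XSAVES	(1u << 1)

/* Toy model of the three layers named in the changelog. */
struct caps_model {
	uint32_t host;	/* stand-in for boot_cpu_data */
	uint32_t kvm;	/* stand-in for kvm_cpu_caps */
	uint32_t guest;	/* stand-in for the guest's cpu_caps */
};

/* Each layer is masked by the layer below it. */
static void model_init(struct caps_model *m, uint32_t host,
		       uint32_t kvm_supported, uint32_t guest_cpuid)
{
	m->host = host;
	m->kvm = kvm_supported & host;
	m->guest = guest_cpuid & m->kvm;
}

/* True if the given single feature bit is set in the guest layer. */
static bool model_guest_has(const struct caps_model *m, uint32_t feat)
{
	return (m->guest & feat) != 0;
}

/* Mirrors the shape of the VMX hunk: no explicit host check is needed. */
static void model_adjust_xsaves(struct caps_model *m)
{
	if (!model_guest_has(m, FEAT_XSAVE))
		m->guest &= ~FEAT_XSAVES;
}
```

In this model, a host that lacks XSAVE can never yield a guest with XSAVE,
so the guest-side check alone is enough to clear XSAVES.
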
 arch/x86/kvm/svm/svm.c | 1 -
 arch/x86/kvm/vmx/vmx.c | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 06770b60c0ba..4aaffbf22531 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4340,7 +4340,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * the guest read/write access to the host's XSS.
 	 */
 	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
-			     boot_cpu_has(X86_FEATURE_XSAVE) &&
 			     boot_cpu_has(X86_FEATURE_XSAVES) &&
 			     guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 741961a1edcc..6fbdf520c58b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7833,8 +7833,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * to the guest.  XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
 	 * set if and only if XSAVE is supported.
 	 */
-	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
-	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
+	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
 		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
 
 	vmx_setup_uret_msrs(vmx);
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (46 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-07-05  2:43   ` Maxim Levitsky
  2024-05-17 17:39 ` [PATCH v2 49/49] *** DO NOT APPLY *** KVM: x86: Verify KVM initializes all consumed guest caps Sean Christopherson
  2024-05-17 17:54 ` [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Paolo Bonzini
  49 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Add yet another CPUID macro, this time for features that the host kernel
synthesizes into boot_cpu_data, i.e. that the kernel force-sets even in
situations where the feature isn't reported by CPUID.  Thanks to the
macro shenanigans of kvm_cpu_cap_init(), such features can now be handled
in the core CPUID framework, i.e. no longer need to be handled out-of-band,
where there are fewer guardrails.

Adding a dedicated macro also helps document what's going on, e.g. the
calls to kvm_cpu_cap_check_and_set() are very confusing unless the reader
knows exactly how kvm_cpu_cap_init() generates kvm_cpu_caps (and even
then, it's far from obvious).

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
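Note for readers (not part of the changelog): the mask arithmetic that the
kvm_cpu_cap_init() hunk below performs can be sketched as a plain function.
This is a simplified model, not the kernel macro: cap_init_model() and its
parameters are hypothetical names, and the real code accumulates the
synthesized and emulated masks as side effects of the SYN_F()/EMUL_F()
statement expressions rather than taking them as arguments.

```c
#include <stdint.h>

/*
 * Model of one 32-bit capability word:
 *  - boot_caps:   what the host kernel reports (may include force-set bits)
 *  - mask:        features KVM is willing to advertise
 *  - raw_cpuid:   what the CPUID instruction reports
 *  - synthesized: bits allowed to survive even if absent from raw CPUID
 *  - emulated:    bits KVM emulates regardless of hardware support
 */
static uint32_t cap_init_model(uint32_t boot_caps, uint32_t mask,
			       uint32_t raw_cpuid, uint32_t synthesized,
			       uint32_t emulated)
{
	uint32_t caps = boot_caps & mask;

	/* Without "| synthesized", force-set bits would be dropped here. */
	caps &= (raw_cpuid | synthesized);
	caps |= emulated;
	return caps;
}
```

With a bit that is set in boot_caps but clear in raw_cpuid, the result keeps
the bit only when it is also present in the synthesized mask, which is the
behavior the old kvm_cpu_cap_check_and_set() calls provided out-of-band.
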
 arch/x86/kvm/cpuid.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0130e0677387..0e64a6332052 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -106,6 +106,17 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	F(name);						\
 })
 
+/*
+ * Synthesized Feature - For features that are synthesized into boot_cpu_data,
+ * i.e. may not be present in the raw CPUID, but can still be advertised to
+ * userspace.  Primarily used for mitigation related feature flags.
+ */
+#define SYN_F(name)						\
+({								\
+	kvm_cpu_cap_synthesized |= F(name);			\
+	F(name);						\
+})
+
 /*
  * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
  * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
@@ -727,13 +738,15 @@ do {									\
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
 	u32 kvm_cpu_cap_emulated = 0;					\
+	u32 kvm_cpu_cap_synthesized = 0;				\
 									\
 	if (leaf < NCAPINTS)						\
 		kvm_cpu_caps[leaf] &= (mask);				\
 	else								\
 		kvm_cpu_caps[leaf] = (mask);				\
 									\
-	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
+	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
+			       kvm_cpu_cap_synthesized);		\
 	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
 } while (0)
 
@@ -913,13 +926,10 @@ void kvm_set_cpu_caps(void)
 	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
 		F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
 		F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ |
-		F(WRMSR_XX_BASE_NS)
+		F(WRMSR_XX_BASE_NS) | SYN_F(SBPB) | SYN_F(IBPB_BRTYPE) |
+		SYN_F(SRSO_NO)
 	);
 
-	kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB);
-	kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
-	kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);
-
 	kvm_cpu_cap_init(CPUID_8000_0022_EAX,
 		F(PERFMON_V2)
 	);
-- 
2.45.0.215.g3402c0e53f-goog



* [PATCH v2 49/49] *** DO NOT APPLY *** KVM: x86: Verify KVM initializes all consumed guest caps
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (47 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data Sean Christopherson
@ 2024-05-17 17:39 ` Sean Christopherson
  2024-05-17 17:54 ` [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Paolo Bonzini
  49 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-17 17:39 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

Assert that all features queried via guest_cpu_cap_has() are known to KVM,
i.e. that KVM doesn't check for a feature that can never actually be set.

This is for demonstration purposes only, as the proper way to enforce this
is to do post-processing at build time (and there are other shortcomings
of this PoC, e.g. it requires all KVM modules to be built-in).

Not-signed-off-by: Sean Christopherson <seanjc@google.com>
---
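Note for readers (not part of the changelog): the check this PoC performs
boils down to comparing a list of queried features against a bitmap of
features KVM knows about.  The sketch below is a standalone model under
stated assumptions: NR_WORDS, mark_known() and count_unknown() are
hypothetical names, and a plain array stands in for the __kvm_features
linker section used by the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_WORDS 4	/* hypothetical number of 32-bit capability words */

/* Bitmap of features that were named during capability initialization. */
static uint32_t known_caps[NR_WORDS];

/* Record a feature (word * 32 + bit) as known. */
static void mark_known(unsigned int feature)
{
	known_caps[feature / 32] |= 1u << (feature & 31);
}

/* Count queried features that were never recorded as known. */
static size_t count_unknown(const unsigned int *queried, size_t n)
{
	size_t unknown = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int leaf = queried[i] / 32;
		uint32_t bit = 1u << (queried[i] & 31);

		/* Out-of-range words are treated as unknown in this model. */
		if (leaf >= NR_WORDS || !(known_caps[leaf] & bit))
			unknown++;
	}
	return unknown;
}
```

The patch itself warns per feature instead of counting, but the lookup is
the same shape: word index from feature / 32, bit from feature & 31.
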
 arch/x86/kvm/cpuid.c              | 81 +++++++++++++++++++++++--------
 arch/x86/kvm/cpuid.h              | 16 +++++-
 arch/x86/kvm/x86.c                |  2 +
 include/asm-generic/vmlinux.lds.h |  4 ++
 4 files changed, 81 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0e64a6332052..18ded0e682f2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -37,6 +37,7 @@ u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_cpu_caps);
 
 static u32 kvm_vmm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
+static u32 kvm_known_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 
 u32 xstate_required_size(u64 xstate_bv, bool compacted)
 {
@@ -143,6 +144,26 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 	0;									\
 })
 
+/*
+ * Vendor Features - For features that KVM supports, but that are added later
+ * because they require additional vendor enabling.
+ */
+#define VEND_F(name)						\
+({								\
+	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
+	0;							\
+})
+
+/*
+ * Operating System Features - For features that KVM dynamically sets/clears at
+ * runtime, e.g. when CR4 changes, but are never advertised to userspace.
+ */
+#define OS_F(name)						\
+({								\
+	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
+	0;							\
+})
+
 /*
  * Magic value used by KVM when querying userspace-provided CPUID entries and
  * doesn't care about the CPIUD index because the index of the function in
@@ -727,6 +748,7 @@ do {									\
 	u32 __leaf = __feature_leaf(X86_FEATURE_##name);		\
 									\
 	BUILD_BUG_ON(__leaf != kvm_cpu_cap_init_in_progress);		\
+	kvm_known_cpu_caps[__leaf] |= feature_bit(name);		\
 } while (0)
 
 /*
@@ -771,14 +793,14 @@ void kvm_set_cpu_caps(void)
 		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
 		 * advertised to guests via CPUID!
 		 */
-		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64 */ | VMM_F(MWAIT) |
-		0 /* DS-CPL, VMX, SMX, EST */ |
+		F(XMM3) | F(PCLMULQDQ) | VEND_F(DTES64) | VMM_F(MWAIT) |
+		VEND_F(VMX) | 0 /* DS-CPL, SMX, EST */ |
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
 		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
 		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
 		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
 		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
-		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND) |
+		OS_F(OSXSAVE) | F(AVX) | F(F16C) | F(RDRAND) |
 		EMUL_F(HYPERVISOR)
 	);
 
@@ -788,7 +810,7 @@ void kvm_set_cpu_caps(void)
 		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
 		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
 		F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLUSH) |
-		0 /* Reserved, DS, ACPI */ | F(MMX) |
+		0 /* Reserved */ | F(DS) | 0 /* ACPI */ | F(MMX) |
 		F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
 		0 /* HTT, TM, Reserved, PBE */
 	);
@@ -796,17 +818,17 @@ void kvm_set_cpu_caps(void)
 	kvm_cpu_cap_init(CPUID_7_0_EBX,
 		F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
 		F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
-		F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
+		F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | VEND_F(MPX) |
 		F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
-		F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
+		F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | VEND_F(INTEL_PT) |
 		F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
 		F(AVX512BW) | F(AVX512VL));
 
 	kvm_cpu_cap_init(CPUID_7_ECX,
-		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
+		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | OS_F(OSPKE) | F(RDPID) |
 		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
 		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
-		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
+		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | VEND_F(WAITPKG) |
 		F(SGX_LC) | F(BUS_LOCK_DETECT)
 	);
 
@@ -858,11 +880,11 @@ void kvm_set_cpu_caps(void)
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0001_ECX,
-		F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
+		F(LAHF_LM) | F(CMP_LEGACY) | VEND_F(SVM) | 0 /* ExtApicSpace */ |
 		F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
 		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
 		0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM) |
-		F(TOPOEXT) | 0 /* PERFCTR_CORE */
+		F(TOPOEXT) | VEND_F(PERFCTR_CORE)
 	);
 
 	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
@@ -905,23 +927,22 @@ void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_AMD_SSBD);
 	if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS))
 		kvm_cpu_cap_set(X86_FEATURE_AMD_SSB_NO);
-	/*
-	 * The preference is to use SPEC CTRL MSR instead of the
-	 * VIRT_SPEC MSR.
-	 */
-	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) &&
-	    !boot_cpu_has(X86_FEATURE_AMD_SSBD))
-		kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
 
 	/*
 	 * Hide all SVM features by default, SVM will set the cap bits for
 	 * features it emulates and/or exposes for L1.
 	 */
-	kvm_cpu_cap_init(CPUID_8000_000A_EDX, 0);
+	kvm_cpu_cap_init(CPUID_8000_000A_EDX,
+		VEND_F(VMCBCLEAN) | VEND_F(FLUSHBYASID) | VEND_F(NRIPS) |
+		VEND_F(TSCRATEMSR) | VEND_F(V_VMSAVE_VMLOAD) | VEND_F(LBRV) |
+		VEND_F(PAUSEFILTER) | VEND_F(PFTHRESHOLD) | VEND_F(VGIF) |
+		VEND_F(VNMI) | VEND_F(SVME_ADDR_CHK)
+	);
 
 	kvm_cpu_cap_init(CPUID_8000_001F_EAX,
-		0 /* SME */ | 0 /* SEV */ | 0 /* VM_PAGE_FLUSH */ | 0 /* SEV_ES */ |
-		F(SME_COHERENT));
+		VEND_F(SME) | VEND_F(SEV) | 0 /* VM_PAGE_FLUSH */ | VEND_F(SEV_ES) |
+		F(SME_COHERENT)
+	);
 
 	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
 		F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
@@ -977,6 +998,26 @@ EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
 #undef KVM_VALIDATE_CPU_CAP_USAGE
 #define KVM_VALIDATE_CPU_CAP_USAGE(name)
 
+
+extern unsigned int __start___kvm_features[];
+extern unsigned int __stop___kvm_features[];
+
+void kvm_validate_cpu_caps(void)
+{
+	int i;
+
+	for (i = 0; i < __stop___kvm_features - __start___kvm_features; i++) {
+		u32 feature = __feature_translate(__start___kvm_features[i]);
+		u32 leaf = feature / 32;
+
+		if (kvm_known_cpu_caps[leaf] & BIT(feature & 31))
+			continue;
+
+		pr_warn("Word %u, bit %u (%lx) checked but not supported\n",
+			leaf, feature & 31, BIT(feature & 31));
+	}
+
+}
 struct kvm_cpuid_array {
 	struct kvm_cpuid_entry2 *entries;
 	int maxnent;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 0bf3bddd0e29..32a86de980c7 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -10,6 +10,7 @@
 
 extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 void kvm_set_cpu_caps(void);
+void kvm_validate_cpu_caps(void);
 
 void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
@@ -245,8 +246,8 @@ static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
 		guest_cpu_cap_clear(vcpu, x86_feature);
 }
 
-static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
-					      unsigned int x86_feature)
+static __always_inline bool __guest_cpu_cap_has(struct kvm_vcpu *vcpu,
+					        unsigned int x86_feature)
 {
 	unsigned int x86_leaf = __feature_leaf(x86_feature);
 
@@ -254,6 +255,17 @@ static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
 	return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
 }
 
+#define guest_cpu_cap_has(vcpu, x86_feature)			\
+({								\
+	asm volatile(						\
+		" .pushsection \"__kvm_features\",\"a\"\n"	\
+		" .balign 4\n"					\
+		" .long " __stringify(x86_feature) " \n"	\
+		" .popsection\n"				\
+	);							\
+	__guest_cpu_cap_has(vcpu, x86_feature);			\
+})
+
 static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 {
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5aa7581802f7..f6b7c5c862fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9790,6 +9790,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r != 0)
 		goto out_mmu_exit;
 
+	kvm_validate_cpu_caps();
+
 	kvm_ops_update(ops);
 
 	for_each_online_cpu(cpu) {
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index f7749d0f2562..102fc2a39083 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -533,6 +533,10 @@
 		BOUNDED_SECTION_BY(__modver, ___modver)			\
 	}								\
 									\
+	__kvm_features : AT(ADDR(__kvm_features) - LOAD_OFFSET) {	\
+		BOUNDED_SECTION_BY(__kvm_features, ___kvm_features)	\
+	}								\
+									\
 	KCFI_TRAPS							\
 									\
 	RO_EXCEPTION_TABLE						\
-- 
2.45.0.215.g3402c0e53f-goog
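
For readers unfamiliar with the trick in the patch above: every
guest_cpu_cap_has() call site records its feature number in a dedicated
section, and kvm_validate_cpu_caps() walks that section and complains about
any entry whose bit is not in the known set. Below is a minimal userspace
sketch of the same record-then-validate idea. It is not the patch's
implementation: it uses a section attribute instead of inline asm plus a
linker script, and it assumes an ELF target with a GNU-compatible linker that
defines __start_<section>/__stop_<section> for sections whose names are valid
C identifiers. The names demo_features, RECORD_FEATURE, known_caps,
count_unknown_features and demo_validate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounds of the section, provided by the linker (assumption: GNU ld/lld, ELF). */
extern const uint32_t __start_demo_features[];
extern const uint32_t __stop_demo_features[];

/* Record a feature number (word * 32 + bit) in the dedicated section. */
#define RECORD_FEATURE(ident, value)					\
	static const uint32_t ident					\
	__attribute__((used, section("demo_features"))) = (value)

RECORD_FEATURE(feat_a, 0 * 32 + 5);	/* word 0, bit 5 */
RECORD_FEATURE(feat_b, 1 * 32 + 3);	/* word 1, bit 3 */
RECORD_FEATURE(feat_c, 1 * 32 + 9);	/* word 1, bit 9: deliberately unknown */

/* Hypothetical "known" bits, one 32-bit word per leaf. */
static const uint32_t known_caps[2] = { 1u << 5, 1u << 3 };

/* Count recorded features whose bit is missing from known_caps. */
static size_t count_unknown_features(void)
{
	size_t n = (size_t)(__stop_demo_features - __start_demo_features);
	size_t unknown = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t feature = __start_demo_features[i];
		uint32_t word = feature / 32;

		assert(word < 2);
		if (!(known_caps[word] & (1u << (feature & 31))))
			unknown++;
	}
	return unknown;
}

void demo_validate(void)
{
	/* Only feat_c (word 1, bit 9) is absent from known_caps. */
	assert(count_unknown_features() == 1);
}
```

The appeal of this approach is that the list of consumed features is built by
the toolchain rather than maintained by hand, so a new query site cannot be
forgotten by the validation pass.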



* Re: [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching
  2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
                   ` (48 preceding siblings ...)
  2024-05-17 17:39 ` [PATCH v2 49/49] *** DO NOT APPLY *** KVM: x86: Verify KVM initializes all consumed guest caps Sean Christopherson
@ 2024-05-17 17:54 ` Paolo Bonzini
  49 siblings, 0 replies; 185+ messages in thread
From: Paolo Bonzini @ 2024-05-17 17:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong, Kechen Lu,
	Oliver Upton, Maxim Levitsky, Binbin Wu, Yang Weijiang,
	Robert Hoo

On Fri, May 17, 2024 at 7:39 PM Sean Christopherson <seanjc@google.com> wrote:
>  * Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
>  * Reject disabling of MWAIT/HLT interception when not allowed
>  * Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID.

This is technically a breaking change, and it's even documented in
api.rst under "KVM_GET_SUPPORTED_CPUID issues":

---
CPU[EAX=1]:ECX[21] (X2APIC) is reported by
``KVM_GET_SUPPORTED_CPUID``, but it can only be enabled if
``KVM_CREATE_IRQCHIP`` or ``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)``
are used to enable in-kernel emulation of the local APIC.

The same is true for the ``KVM_FEATURE_PV_UNHALT`` paravirtualized feature.

CPU[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by
``KVM_GET_SUPPORTED_CPUID``. It can be enabled if
``KVM_CAP_TSC_DEADLINE_TIMER`` is present and the kernel has enabled
in-kernel emulation of the local APIC.
---

However, I think we can get away with it. On the one hand, QEMU's source
code does:

        /* tsc-deadline flag is not returned by GET_SUPPORTED_CPUID, but it
         * can be enabled if the kernel has KVM_CAP_TSC_DEADLINE_TIMER,
         * and the irqchip is in the kernel.
         */
        if (kvm_irqchip_in_kernel() &&
                kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
            ret |= CPUID_EXT_TSC_DEADLINE_TIMER;
        }

        /* x2apic is reported by GET_SUPPORTED_CPUID, but it can't be enabled
         * without the in-kernel irqchip
         */
        if (!kvm_irqchip_in_kernel()) {
            ret &= ~CPUID_EXT_X2APIC;
        }

so it has to cope with the existing mess, but it isn't expecting the
opposite mess, i.e. TSC_DEADLINE being reported without an in-kernel
irqchip (understandable).
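
To make the asymmetry concrete, here is a small standalone sketch modeled on
the QEMU snippet quoted above. It is not QEMU code: fixup_cpuid_1_ecx() and
demo_fixup() are hypothetical names, and the two booleans stand in for
kvm_irqchip_in_kernel() and the KVM_CAP_TSC_DEADLINE_TIMER check. The bit
positions are the CPUID.01H:ECX ones cited in the documentation excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit positions, as cited in the api.rst excerpt. */
#define CPUID_1_ECX_X2APIC		(1u << 21)
#define CPUID_1_ECX_TSC_DEADLINE	(1u << 24)

/*
 * Hypothetical helper: fix up the CPUID.01H:ECX word that was returned by
 * KVM_GET_SUPPORTED_CPUID, following the same two rules as the snippet.
 */
static uint32_t fixup_cpuid_1_ecx(uint32_t supported, bool irqchip_in_kernel,
				  bool has_tsc_deadline_cap)
{
	uint32_t ret = supported;

	/* Add TSC deadline only when the in-kernel local APIC can back it. */
	if (irqchip_in_kernel && has_tsc_deadline_cap)
		ret |= CPUID_1_ECX_TSC_DEADLINE;

	/* x2APIC can't be enabled without the in-kernel irqchip, strip it. */
	if (!irqchip_in_kernel)
		ret &= ~CPUID_1_ECX_X2APIC;

	return ret;
}

void demo_fixup(void)
{
	/* KVM doesn't report TSC deadline, in-kernel irqchip: bit is added. */
	assert(fixup_cpuid_1_ecx(CPUID_1_ECX_X2APIC, true, true) ==
	       (CPUID_1_ECX_X2APIC | CPUID_1_ECX_TSC_DEADLINE));

	/* KVM reports TSC deadline, userspace APIC: nothing clears the bit. */
	assert(fixup_cpuid_1_ecx(CPUID_1_ECX_X2APIC | CPUID_1_ECX_TSC_DEADLINE,
				 false, true) == CPUID_1_ECX_TSC_DEADLINE);
}
```

The second case is the one that changes with this series: the fixup only ever
adds TSC_DEADLINE, so a bit that KVM already reports survives even without an
in-kernel irqchip.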

In practice, though, the userspace APIC has always been utterly broken, and
is even deprecated in QEMU, so we might get away with it. I don't see why
one would forgo the in-kernel APIC unless the guest has no APIC whatsoever.

And no guest that fails to find an APIC is going to use the TSC
deadline timer (sure, the MSR is outside the x2APIC space, but how in the
world would you configure LVTT?). Likewise for X2APIC, since you need to
turn it on at 0xFEE0_0000 first.

Paolo



* Re: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  2024-05-17 17:38 ` [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed Sean Christopherson
@ 2024-05-22  5:09   ` Binbin Wu
  2024-05-28 18:56     ` Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
  2024-07-12  7:51   ` Xiaoyao Li
  2 siblings, 1 reply; 185+ messages in thread
From: Binbin Wu @ 2024-05-22  5:09 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Yang Weijiang, Robert Hoo



On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or
> HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that
> disabling the exit(s) is not allowed.  E.g. because MWAIT isn't supported
> or the CPU doesn't have an aways-running APIC timer, or because KVM is

aways-running -> always-running

> configured to mitigate cross-thread vulnerabilities.
>
> Cc: Kechen Lu <kechenl@nvidia.com>
> Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
> Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/x86.c | 54 ++++++++++++++++++++++++----------------------
>   1 file changed, 28 insertions(+), 26 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4cb0c150a2f8..c729227c6501 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4590,6 +4590,20 @@ static inline bool kvm_can_mwait_in_guest(void)
>   		boot_cpu_has(X86_FEATURE_ARAT);
>   }
>   
> +static u64 kvm_get_allowed_disable_exits(void)
> +{
> +	u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
> +
> +	if (!mitigate_smt_rsb) {
> +		r |= KVM_X86_DISABLE_EXITS_HLT |
> +			KVM_X86_DISABLE_EXITS_CSTATE;
> +
> +		if (kvm_can_mwait_in_guest())
> +			r |= KVM_X86_DISABLE_EXITS_MWAIT;
> +	}
> +	return r;
> +}
> +
>   #ifdef CONFIG_KVM_HYPERV
>   static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
>   					    struct kvm_cpuid2 __user *cpuid_arg)
> @@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   		r = KVM_CLOCK_VALID_FLAGS;
>   		break;
>   	case KVM_CAP_X86_DISABLE_EXITS:
> -		r = KVM_X86_DISABLE_EXITS_PAUSE;
> -
> -		if (!mitigate_smt_rsb) {
> -			r |= KVM_X86_DISABLE_EXITS_HLT |
> -			     KVM_X86_DISABLE_EXITS_CSTATE;
> -
> -			if (kvm_can_mwait_in_guest())
> -				r |= KVM_X86_DISABLE_EXITS_MWAIT;
> -		}
> +		r |= kvm_get_allowed_disable_exits();

Nit: Just use "=".

>   		break;
>   	case KVM_CAP_X86_SMM:
>   		if (!IS_ENABLED(CONFIG_KVM_SMM))
> @@ -6565,33 +6571,29 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		break;
>   	case KVM_CAP_X86_DISABLE_EXITS:
>   		r = -EINVAL;
> -		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
> +		if (cap->args[0] & ~kvm_get_allowed_disable_exits())
>   			break;
>   
>   		mutex_lock(&kvm->lock);
>   		if (kvm->created_vcpus)
>   			goto disable_exits_unlock;
>   
> -		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> -			kvm->arch.pause_in_guest = true;
> -
>   #define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \
>   		    "KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests."
>   
> -		if (!mitigate_smt_rsb) {
> -			if (boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible() &&
> -			    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> -				pr_warn_once(SMT_RSB_MSG);
> -
> -			if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
> -			    kvm_can_mwait_in_guest())
> -				kvm->arch.mwait_in_guest = true;
> -			if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> -				kvm->arch.hlt_in_guest = true;
> -			if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> -				kvm->arch.cstate_in_guest = true;
> -		}
> +		if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
> +		    cpu_smt_possible() &&
> +		    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> +			pr_warn_once(SMT_RSB_MSG);
>   
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> +			kvm->arch.pause_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
> +			kvm->arch.mwait_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> +			kvm->arch.hlt_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> +			kvm->arch.cstate_in_guest = true;
>   		r = 0;
>   disable_exits_unlock:
>   		mutex_unlock(&kvm->lock);

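The resulting check boils down to "reject any requested bit outside the
allowed mask". A standalone sketch of that logic is below, with the two
runtime conditions passed in as parameters. The helper names
(allowed_disable_exits, check_disable_exits, demo_disable_exits) are
hypothetical, and the flag values are assumed to mirror the
KVM_X86_DISABLE_EXITS_* definitions in the uapi header; treat them as
illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror KVM_X86_DISABLE_EXITS_*; illustrative only. */
#define DISABLE_EXITS_MWAIT	(1u << 0)
#define DISABLE_EXITS_HLT	(1u << 1)
#define DISABLE_EXITS_PAUSE	(1u << 2)
#define DISABLE_EXITS_CSTATE	(1u << 3)

/* Same shape as kvm_get_allowed_disable_exits() in the quoted patch. */
static uint64_t allowed_disable_exits(bool mitigate_smt_rsb,
				      bool can_mwait_in_guest)
{
	uint64_t r = DISABLE_EXITS_PAUSE;

	if (!mitigate_smt_rsb) {
		r |= DISABLE_EXITS_HLT | DISABLE_EXITS_CSTATE;

		if (can_mwait_in_guest)
			r |= DISABLE_EXITS_MWAIT;
	}
	return r;
}

/* Reject the request if it contains any bit outside the allowed mask. */
static int check_disable_exits(uint64_t requested, bool mitigate_smt_rsb,
			       bool can_mwait_in_guest)
{
	if (requested & ~allowed_disable_exits(mitigate_smt_rsb,
					       can_mwait_in_guest))
		return -EINVAL;
	return 0;
}

void demo_disable_exits(void)
{
	/* MWAIT exits can't be disabled when MWAIT in guest isn't possible. */
	assert(check_disable_exits(DISABLE_EXITS_MWAIT, false, false) == -EINVAL);
	/* With the SMT RSB mitigation active, only PAUSE may be disabled. */
	assert(check_disable_exits(DISABLE_EXITS_HLT, true, true) == -EINVAL);
	assert(check_disable_exits(DISABLE_EXITS_PAUSE, true, false) == 0);
}
```

Because the check extension path and the enable path share one helper, what
userspace is told and what it is allowed to enable cannot drift apart.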


* Re: [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()
  2024-05-17 17:38 ` [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() Sean Christopherson
@ 2024-05-22  6:23   ` Binbin Wu
  2024-05-28 18:54     ` Sean Christopherson
  2024-07-05  1:24   ` Maxim Levitsky
  1 sibling, 1 reply; 185+ messages in thread
From: Binbin Wu @ 2024-05-22  6:23 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Yang Weijiang, Robert Hoo



On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging
> it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_
> bits in the helper (a future commit will play macro games to set emulated
> feature flags via kvm_cpu_cap_init()).
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++------------------
>   1 file changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index a802c09b50ab..5a4d6138c4f1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -74,7 +74,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>    * Raw Feature - For features that KVM supports based purely on raw host CPUID,
>    * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
>    * Simply force set the feature in KVM's capabilities, raw CPUID support will
> - * be factored in by kvm_cpu_cap_mask().
> + * be factored in by __kvm_cpu_cap_mask().

kvm_cpu_cap_init()?


>    */
>   #define RAW_F(name)						\
>   ({								\
> @@ -619,7 +619,7 @@ static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
>   static __always_inline
>   void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
>   {
> -	/* Use kvm_cpu_cap_mask for leafs that aren't KVM-only. */
> +	/* Use kvm_cpu_cap_init for leafs that aren't KVM-only. */
>   	BUILD_BUG_ON(leaf < NCAPINTS);
>   
>   	kvm_cpu_caps[leaf] = mask;
> @@ -627,7 +627,7 @@ void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
>   	__kvm_cpu_cap_mask(leaf);
>   }
>   
> -static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
> +static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
>   {
>   	/* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
>   	BUILD_BUG_ON(leaf >= NCAPINTS);
> @@ -656,7 +656,7 @@ void kvm_set_cpu_caps(void)
>   	memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
>   	       sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));
>   
> -	kvm_cpu_cap_mask(CPUID_1_ECX,
> +	kvm_cpu_cap_init(CPUID_1_ECX,
>   		/*
>   		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
>   		 * advertised to guests via CPUID!
> @@ -673,7 +673,7 @@ void kvm_set_cpu_caps(void)
>   	/* KVM emulates x2apic in software irrespective of host support. */
>   	kvm_cpu_cap_set(X86_FEATURE_X2APIC);
>   
> -	kvm_cpu_cap_mask(CPUID_1_EDX,
> +	kvm_cpu_cap_init(CPUID_1_EDX,
>   		F(FPU) | F(VME) | F(DE) | F(PSE) |
>   		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
>   		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
> @@ -684,7 +684,7 @@ void kvm_set_cpu_caps(void)
>   		0 /* HTT, TM, Reserved, PBE */
>   	);
>   
> -	kvm_cpu_cap_mask(CPUID_7_0_EBX,
> +	kvm_cpu_cap_init(CPUID_7_0_EBX,
>   		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
>   		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
>   		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
> @@ -693,7 +693,7 @@ void kvm_set_cpu_caps(void)
>   		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
>   		F(AVX512VL));
>   
> -	kvm_cpu_cap_mask(CPUID_7_ECX,
> +	kvm_cpu_cap_init(CPUID_7_ECX,
>   		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
>   		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
>   		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> @@ -708,7 +708,7 @@ void kvm_set_cpu_caps(void)
>   	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
>   		kvm_cpu_cap_clear(X86_FEATURE_PKU);
>   
> -	kvm_cpu_cap_mask(CPUID_7_EDX,
> +	kvm_cpu_cap_init(CPUID_7_EDX,
>   		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>   		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
>   		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
> @@ -727,7 +727,7 @@ void kvm_set_cpu_caps(void)
>   	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
>   		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
>   
> -	kvm_cpu_cap_mask(CPUID_7_1_EAX,
> +	kvm_cpu_cap_init(CPUID_7_1_EAX,
>   		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
>   		F(FZRM) | F(FSRS) | F(FSRC) |
>   		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
> @@ -743,7 +743,7 @@ void kvm_set_cpu_caps(void)
>   		F(BHI_CTRL) | F(MCDT_NO)
>   	);
>   
> -	kvm_cpu_cap_mask(CPUID_D_1_EAX,
> +	kvm_cpu_cap_init(CPUID_D_1_EAX,
>   		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
>   	);
>   
> @@ -751,7 +751,7 @@ void kvm_set_cpu_caps(void)
>   		SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
>   	);
>   
> -	kvm_cpu_cap_mask(CPUID_8000_0001_ECX,
> +	kvm_cpu_cap_init(CPUID_8000_0001_ECX,
>   		F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
>   		F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
>   		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
> @@ -759,7 +759,7 @@ void kvm_set_cpu_caps(void)
>   		F(TOPOEXT) | 0 /* PERFCTR_CORE */
>   	);
>   
> -	kvm_cpu_cap_mask(CPUID_8000_0001_EDX,
> +	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
>   		F(FPU) | F(VME) | F(DE) | F(PSE) |
>   		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
>   		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> @@ -777,7 +777,7 @@ void kvm_set_cpu_caps(void)
>   		SF(CONSTANT_TSC)
>   	);
>   
> -	kvm_cpu_cap_mask(CPUID_8000_0008_EBX,
> +	kvm_cpu_cap_init(CPUID_8000_0008_EBX,
>   		F(CLZERO) | F(XSAVEERPTR) |
>   		F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
>   		F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) |
> @@ -811,13 +811,13 @@ void kvm_set_cpu_caps(void)
>   	 * Hide all SVM features by default, SVM will set the cap bits for
>   	 * features it emulates and/or exposes for L1.
>   	 */
> -	kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);
> +	kvm_cpu_cap_init(CPUID_8000_000A_EDX, 0);
>   
> -	kvm_cpu_cap_mask(CPUID_8000_001F_EAX,
> +	kvm_cpu_cap_init(CPUID_8000_001F_EAX,
>   		0 /* SME */ | 0 /* SEV */ | 0 /* VM_PAGE_FLUSH */ | 0 /* SEV_ES */ |
>   		F(SME_COHERENT));
>   
> -	kvm_cpu_cap_mask(CPUID_8000_0021_EAX,
> +	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
>   		F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
>   		F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ |
>   		F(WRMSR_XX_BASE_NS)
> @@ -837,7 +837,7 @@ void kvm_set_cpu_caps(void)
>   	 * kernel.  LFENCE_RDTSC was a Linux-defined synthetic feature long
>   	 * before AMD joined the bandwagon, e.g. LFENCE is serializing on most
>   	 * CPUs that support SSE2.  On CPUs that don't support AMD's leaf,
> -	 * kvm_cpu_cap_mask() will unfortunately drop the flag due to ANDing
> +	 * kvm_cpu_cap_init() will unfortunately drop the flag due to ANDing
>   	 * the mask with the raw host CPUID, and reporting support in AMD's
>   	 * leaf can make it easier for userspace to detect the feature.
>   	 */
> @@ -847,7 +847,7 @@ void kvm_set_cpu_caps(void)
>   		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE);
>   	kvm_cpu_cap_set(X86_FEATURE_NO_SMM_CTL_MSR);
>   
> -	kvm_cpu_cap_mask(CPUID_C000_0001_EDX,
> +	kvm_cpu_cap_init(CPUID_C000_0001_EDX,
>   		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
>   		F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
>   		F(PMM) | F(PMM_EN)


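As background for why "mask" was a reasonable name and why "init" fits better
once bits can also be set: the helper starts from the kernel's capabilities,
keeps only the bits in the supplied mask, and then ANDs the result with raw
host CPUID, which is also why a synthetic flag gets dropped and has to be set
again afterwards (see the LFENCE_RDTSC comment in the diff). A toy model of
that flow is sketched below. struct cap_state, cap_init(), cap_set() and
demo_caps() are hypothetical stand-ins, not the kernel's data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NR_WORDS 2

/* Hypothetical stand-ins for KVM's caps and the raw host CPUID words. */
struct cap_state {
	uint32_t caps[NR_WORDS];	/* starts as a copy of the kernel's caps */
	uint32_t raw_cpuid[NR_WORDS];	/* what the CPUID instruction reports */
};

/* Keep only bits that are in the mask AND reported by raw host CPUID. */
static void cap_init(struct cap_state *s, unsigned int word, uint32_t mask)
{
	assert(word < NR_WORDS);
	s->caps[word] &= mask;
	s->caps[word] &= s->raw_cpuid[word];
}

/* Force a bit on after init, e.g. for a synthetic or emulated feature. */
static void cap_set(struct cap_state *s, unsigned int word, unsigned int bit)
{
	assert(word < NR_WORDS && bit < 32);
	s->caps[word] |= 1u << bit;
}

void demo_caps(void)
{
	struct cap_state s = {
		.caps      = { 0x7, 0 },	/* kernel: bits 0-2, bit 2 is synthetic */
		.raw_cpuid = { 0xb, 0 },	/* hardware: bits 0, 1 and 3 */
	};

	/* Mask allows bits 0 and 2; raw CPUID then drops synthetic bit 2. */
	cap_init(&s, 0, 0x5);
	assert(s.caps[0] == 0x1);

	/* Restore the synthetic bit explicitly, after the masking step. */
	cap_set(&s, 0, 2);
	assert(s.caps[0] == 0x5);
}
```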

* Re: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
  2024-05-17 17:39 ` [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID Sean Christopherson
@ 2024-05-22  9:11   ` Binbin Wu
  2024-05-28 15:21     ` Sean Christopherson
  2024-07-05  2:04   ` Maxim Levitsky
  1 sibling, 1 reply; 185+ messages in thread
From: Binbin Wu @ 2024-05-22  9:11 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Yang Weijiang, Robert Hoo



On 5/18/2024 1:39 AM, Sean Christopherson wrote:
> Advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID when it's
> supported in hardware,

But it's using EMUL_F(TSC_DEADLINE_TIMER) below?

>   as the odds of a VMM emulating the local APIC in
> userspace, not emulating the TSC deadline timer, _and_ reflecting
> KVM_GET_SUPPORTED_CPUID back into KVM_SET_CPUID2 are extremely low.
>
> KVM has _unconditionally_ advertised X2APIC via CPUID since commit
> 0d1de2d901f4 ("KVM: Always report x2apic as supported feature"), and it
> is completely impossible for userspace to emulate X2APIC as KVM doesn't
> support forwarding the MSR accesses to userspace.  I.e. KVM has relied on
> userspace VMMs to not misreport local APIC capabilities for nearly 13
> years.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   Documentation/virt/kvm/api.rst | 9 ++++++---
>   arch/x86/kvm/cpuid.c           | 4 ++--
>   2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 884846282d06..cb744a646de6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1804,15 +1804,18 @@ emulate them efficiently. The fields in each entry are defined as follows:
>            the values returned by the cpuid instruction for
>            this function/index combination
>   
> -The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
> -as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
> -support.  Instead it is reported via::
> +x2APIC (CPUID leaf 1, ecx[21]) and TSC deadline timer (CPUID leaf 1, ecx[24])
> +may be returned as true, but they depend on KVM_CREATE_IRQCHIP for in-kernel
> +emulation of the local APIC.  TSC deadline timer support is also reported via::
>   
>     ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
>   
>   if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
>   feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
>   
> +Enabling x2APIC in KVM_SET_CPUID2 requires KVM_CREATE_IRQCHIP as KVM doesn't
> +support forwarding x2APIC MSR accesses to userspace, i.e. KVM does not support
> +emulating x2APIC in userspace.
>   
>   4.47 KVM_PPC_GET_PVINFO
>   -----------------------
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 699ce4261e9c..d1f427284ccc 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -680,8 +680,8 @@ void kvm_set_cpu_caps(void)
>   		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
>   		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
>   		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
> -		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> -		F(F16C) | F(RDRAND)
> +		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
> +		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
>   	);
>   
>   	kvm_cpu_cap_init(CPUID_1_EDX,



* Re: [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap"
  2024-05-17 17:39 ` [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap" Sean Christopherson
@ 2024-05-22 14:23   ` Binbin Wu
  0 siblings, 0 replies; 185+ messages in thread
From: Binbin Wu @ 2024-05-22 14:23 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Yang Weijiang, Robert Hoo



On 5/18/2024 1:39 AM, Sean Christopherson wrote:
> As the first step toward replacing KVM's so-called "governed features"
> framework with a more comprehensive, less poorly named implementation,
> replace the "kvm_governed_feature" function prefix with "guest_cpu_cap"
> and rename guest_can_use() to guest_cpu_cap_has().
>
> The "guest_cpu_cap" naming scheme mirrors that of "kvm_cpu_cap", and
> provides a more clear distinction between guest capabilities, which are
> KVM controlled (heh, or one might say "governed"), and guest CPUID, which
> with few exceptions is fully userspace controlled.
>
> Opportunistically rewrite the comment about XSS passthrough for SEV-ES
> guests to avoid referencing so many functions, as such comments are prone
> to becoming stale (case in point...).
>
> No functional change intended.

Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

>
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/cpuid.c      |  2 +-
>   arch/x86/kvm/cpuid.h      | 16 ++++++++--------
>   arch/x86/kvm/mmu.h        |  2 +-
>   arch/x86/kvm/mmu/mmu.c    |  4 ++--
>   arch/x86/kvm/svm/nested.c | 22 +++++++++++-----------
>   arch/x86/kvm/svm/sev.c    | 17 ++++++++---------
>   arch/x86/kvm/svm/svm.c    | 26 +++++++++++++-------------
>   arch/x86/kvm/svm/svm.h    |  4 ++--
>   arch/x86/kvm/vmx/nested.c |  6 +++---
>   arch/x86/kvm/vmx/vmx.c    | 16 ++++++++--------
>   arch/x86/kvm/x86.c        |  4 ++--
>   11 files changed, 59 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 16bb873188d6..286abefc93d5 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -407,7 +407,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
>   				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
>   	if (allow_gbpages)
> -		kvm_governed_feature_set(vcpu, X86_FEATURE_GBPAGES);
> +		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
>   
>   	best = kvm_find_cpuid_entry(vcpu, 1);
>   	if (best && apic) {
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index d68b7d879820..e021681f34ac 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -256,8 +256,8 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
>   	return kvm_governed_feature_index(x86_feature) >= 0;
>   }
>   
> -static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
> -						     unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
> +					      unsigned int x86_feature)
>   {
>   	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
>   
> @@ -265,15 +265,15 @@ static __always_inline void kvm_governed_feature_set(struct kvm_vcpu *vcpu,
>   		  vcpu->arch.governed_features.enabled);
>   }
>   
> -static __always_inline void kvm_governed_feature_check_and_set(struct kvm_vcpu *vcpu,
> -							       unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> +							unsigned int x86_feature)
>   {
>   	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> -		kvm_governed_feature_set(vcpu, x86_feature);
> +		guest_cpu_cap_set(vcpu, x86_feature);
>   }
>   
> -static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
> -					  unsigned int x86_feature)
> +static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> +					      unsigned int x86_feature)
>   {
>   	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
>   
> @@ -283,7 +283,7 @@ static __always_inline bool guest_can_use(struct kvm_vcpu *vcpu,
>   
>   static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
>   {
> -	if (guest_can_use(vcpu, X86_FEATURE_LAM))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
>   		cr3 &= ~(X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
>   
>   	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index dc80e72e4848..cf95ea5fe29d 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -150,7 +150,7 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu)
>   
>   static inline unsigned long kvm_get_active_cr3_lam_bits(struct kvm_vcpu *vcpu)
>   {
> -	if (!guest_can_use(vcpu, X86_FEATURE_LAM))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LAM))
>   		return 0;
>   
>   	return kvm_read_cr3(vcpu) & (X86_CR3_LAM_U48 | X86_CR3_LAM_U57);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5095fb46713e..e18a10c59431 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4966,7 +4966,7 @@ static void reset_guest_rsvds_bits_mask(struct kvm_vcpu *vcpu,
>   	__reset_rsvds_bits_mask(&context->guest_rsvd_check,
>   				vcpu->arch.reserved_gpa_bits,
>   				context->cpu_role.base.level, is_efer_nx(context),
> -				guest_can_use(vcpu, X86_FEATURE_GBPAGES),
> +				guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
>   				is_cr4_pse(context),
>   				guest_cpuid_is_amd_compatible(vcpu));
>   }
> @@ -5043,7 +5043,7 @@ static void reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
>   	__reset_rsvds_bits_mask(shadow_zero_check, reserved_hpa_bits(),
>   				context->root_role.level,
>   				context->root_role.efer_nx,
> -				guest_can_use(vcpu, X86_FEATURE_GBPAGES),
> +				guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES),
>   				is_pse, is_amd);
>   
>   	if (!shadow_me_mask)
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 55b9a6d96bcf..2900a8e21257 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -107,7 +107,7 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)
>   
>   static bool nested_vmcb_needs_vls_intercept(struct vcpu_svm *svm)
>   {
> -	if (!guest_can_use(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
> +	if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_V_VMSAVE_VMLOAD))
>   		return true;
>   
>   	if (!nested_npt_enabled(svm))
> @@ -590,7 +590,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
>   		vmcb_mark_dirty(vmcb02, VMCB_DR);
>   	}
>   
> -	if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
> +	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
>   		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
>   		/*
>   		 * Reserved bits of DEBUGCTL are ignored.  Be consistent with
> @@ -647,7 +647,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>   	 * exit_int_info, exit_int_info_err, next_rip, insn_len, insn_bytes.
>   	 */
>   
> -	if (guest_can_use(vcpu, X86_FEATURE_VGIF) &&
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VGIF) &&
>   	    (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK))
>   		int_ctl_vmcb12_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
>   	else
> @@ -685,7 +685,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>   
>   	vmcb02->control.tsc_offset = vcpu->arch.tsc_offset;
>   
> -	if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
>   	    svm->tsc_ratio_msr != kvm_caps.default_tsc_scaling_ratio)
>   		nested_svm_update_tsc_ratio_msr(vcpu);
>   
> @@ -706,7 +706,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>   	 * what a nrips=0 CPU would do (L1 is responsible for advancing RIP
>   	 * prior to injecting the event).
>   	 */
> -	if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
>   		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
>   	else if (boot_cpu_has(X86_FEATURE_NRIPS))
>   		vmcb02->control.next_rip    = vmcb12_rip;
> @@ -716,7 +716,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>   		svm->soft_int_injected = true;
>   		svm->soft_int_csbase = vmcb12_csbase;
>   		svm->soft_int_old_rip = vmcb12_rip;
> -		if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
>   			svm->soft_int_next_rip = svm->nested.ctl.next_rip;
>   		else
>   			svm->soft_int_next_rip = vmcb12_rip;
> @@ -724,18 +724,18 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>   
>   	vmcb02->control.virt_ext            = vmcb01->control.virt_ext &
>   					      LBR_CTL_ENABLE_MASK;
> -	if (guest_can_use(vcpu, X86_FEATURE_LBRV))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV))
>   		vmcb02->control.virt_ext  |=
>   			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
>   
>   	if (!nested_vmcb_needs_vls_intercept(svm))
>   		vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
>   
> -	if (guest_can_use(vcpu, X86_FEATURE_PAUSEFILTER))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_PAUSEFILTER))
>   		pause_count12 = svm->nested.ctl.pause_filter_count;
>   	else
>   		pause_count12 = 0;
> -	if (guest_can_use(vcpu, X86_FEATURE_PFTHRESHOLD))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_PFTHRESHOLD))
>   		pause_thresh12 = svm->nested.ctl.pause_filter_thresh;
>   	else
>   		pause_thresh12 = 0;
> @@ -1022,7 +1022,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>   	if (vmcb12->control.exit_code != SVM_EXIT_ERR)
>   		nested_save_pending_event_to_vmcb12(svm, vmcb12);
>   
> -	if (guest_can_use(vcpu, X86_FEATURE_NRIPS))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_NRIPS))
>   		vmcb12->control.next_rip  = vmcb02->control.next_rip;
>   
>   	vmcb12->control.int_ctl           = svm->nested.ctl.int_ctl;
> @@ -1061,7 +1061,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>   	if (!nested_exit_on_intr(svm))
>   		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
>   
> -	if (unlikely(guest_can_use(vcpu, X86_FEATURE_LBRV) &&
> +	if (unlikely(guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
>   		     (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK))) {
>   		svm_copy_lbrs(vmcb12, vmcb02);
>   		svm_update_lbrv(vcpu);
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 57c2c8025547..7640dedc2ddc 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4409,16 +4409,15 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
>   	 * For SEV-ES, accesses to MSR_IA32_XSS should not be intercepted if
>   	 * the host/guest supports its use.
>   	 *
> -	 * guest_can_use() checks a number of requirements on the host/guest to
> -	 * ensure that MSR_IA32_XSS is available, but it might report true even
> -	 * if X86_FEATURE_XSAVES isn't configured in the guest to ensure host
> -	 * MSR_IA32_XSS is always properly restored. For SEV-ES, it is better
> -	 * to further check that the guest CPUID actually supports
> -	 * X86_FEATURE_XSAVES so that accesses to MSR_IA32_XSS by misbehaved
> -	 * guests will still get intercepted and caught in the normal
> -	 * kvm_emulate_rdmsr()/kvm_emulated_wrmsr() paths.
> +	 * KVM treats the guest as being capable of using XSAVES even if XSAVES
> +	 * isn't enabled in guest CPUID as there is no intercept for XSAVES,
> +	 * i.e. the guest can use XSAVES/XRSTOR to read/write XSS if XSAVE is
> +	 * exposed to the guest and XSAVES is supported in hardware.  Condition
> +	 * full XSS passthrough on the guest being able to use XSAVES *and*
> +	 * XSAVES being exposed to the guest so that KVM can at least honor
> +	 * guest CPUID for RDMSR and WRMSR.
>   	 */
> -	if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
>   	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
>   		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_XSS, 1, 1);
>   	else
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3d0549ca246f..2acd2e3bb1b0 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1039,7 +1039,7 @@ void svm_update_lbrv(struct kvm_vcpu *vcpu)
>   	struct vcpu_svm *svm = to_svm(vcpu);
>   	bool current_enable_lbrv = svm->vmcb->control.virt_ext & LBR_CTL_ENABLE_MASK;
>   	bool enable_lbrv = (svm_get_lbr_vmcb(svm)->save.dbgctl & DEBUGCTLMSR_LBR) ||
> -			    (is_guest_mode(vcpu) && guest_can_use(vcpu, X86_FEATURE_LBRV) &&
> +			    (is_guest_mode(vcpu) && guest_cpu_cap_has(vcpu, X86_FEATURE_LBRV) &&
>   			    (svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK));
>   
>   	if (enable_lbrv == current_enable_lbrv)
> @@ -2841,7 +2841,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	switch (msr_info->index) {
>   	case MSR_AMD64_TSC_RATIO:
>   		if (!msr_info->host_initiated &&
> -		    !guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR))
>   			return 1;
>   		msr_info->data = svm->tsc_ratio_msr;
>   		break;
> @@ -2991,7 +2991,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>   	switch (ecx) {
>   	case MSR_AMD64_TSC_RATIO:
>   
> -		if (!guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR)) {
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR)) {
>   
>   			if (!msr->host_initiated)
>   				return 1;
> @@ -3013,7 +3013,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>   
>   		svm->tsc_ratio_msr = data;
>   
> -		if (guest_can_use(vcpu, X86_FEATURE_TSCRATEMSR) &&
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSCRATEMSR) &&
>   		    is_guest_mode(vcpu))
>   			nested_svm_update_tsc_ratio_msr(vcpu);
>   
> @@ -4342,11 +4342,11 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
>   	    boot_cpu_has(X86_FEATURE_XSAVES) &&
>   	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		kvm_governed_feature_set(vcpu, X86_FEATURE_XSAVES);
> +		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
>   
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_NRIPS);
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LBRV);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
>   
>   	/*
>   	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
> @@ -4354,12 +4354,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   	 * SVM on Intel is bonkers and extremely unlikely to work).
>   	 */
>   	if (!guest_cpuid_is_intel(vcpu))
> -		kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> +		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
>   
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF);
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
>   
>   	svm_recalc_instruction_intercepts(vcpu, svm);
>   
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 97b3683ea324..08fd788d08df 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -487,7 +487,7 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
>   
>   static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
>   {
> -	return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) &&
> +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VGIF) &&
>   	       (svm->nested.ctl.int_ctl & V_GIF_ENABLE_MASK);
>   }
>   
> @@ -539,7 +539,7 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
>   
>   static inline bool nested_vnmi_enabled(struct vcpu_svm *svm)
>   {
> -	return guest_can_use(&svm->vcpu, X86_FEATURE_VNMI) &&
> +	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VNMI) &&
>   	       (svm->nested.ctl.int_ctl & V_NMI_ENABLE_MASK);
>   }
>   
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d5b832126e34..fb7eec29681d 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6488,7 +6488,7 @@ static int vmx_get_nested_state(struct kvm_vcpu *vcpu,
>   	vmx = to_vmx(vcpu);
>   	vmcs12 = get_vmcs12(vcpu);
>   
> -	if (guest_can_use(vcpu, X86_FEATURE_VMX) &&
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) &&
>   	    (vmx->nested.vmxon || vmx->nested.smm.vmxon)) {
>   		kvm_state.hdr.vmx.vmxon_pa = vmx->nested.vmxon_ptr;
>   		kvm_state.hdr.vmx.vmcs12_pa = vmx->nested.current_vmptr;
> @@ -6629,7 +6629,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>   		if (kvm_state->flags & ~KVM_STATE_NESTED_EVMCS)
>   			return -EINVAL;
>   	} else {
> -		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
>   			return -EINVAL;
>   
>   		if (!page_address_valid(vcpu, kvm_state->hdr.vmx.vmxon_pa))
> @@ -6663,7 +6663,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>   		return -EINVAL;
>   
>   	if ((kvm_state->flags & KVM_STATE_NESTED_EVMCS) &&
> -	    (!guest_can_use(vcpu, X86_FEATURE_VMX) ||
> +	    (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX) ||
>   	     !vmx->nested.enlightened_vmcs_enabled))
>   			return -EINVAL;
>   
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 51b2cd13250a..1bc56596d653 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2050,7 +2050,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
>   		break;
>   	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
> -		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
>   			return 1;
>   		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
>   				    &msr_info->data))
> @@ -2360,7 +2360,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
>   		if (!msr_info->host_initiated)
>   			return 1; /* they are read-only */
> -		if (!guest_can_use(vcpu, X86_FEATURE_VMX))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
>   			return 1;
>   		return vmx_set_vmx_msr(vcpu, msr_index, data);
>   	case MSR_IA32_RTIT_CTL:
> @@ -4571,7 +4571,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
>   												\
>   	if (cpu_has_vmx_##name()) {								\
>   		if (kvm_is_governed_feature(X86_FEATURE_##feat_name))				\
> -			__enabled = guest_can_use(__vcpu, X86_FEATURE_##feat_name);		\
> +			__enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name);		\
>   		else										\
>   			__enabled = guest_cpuid_has(__vcpu, X86_FEATURE_##feat_name);		\
>   		vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\
> @@ -7838,10 +7838,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   	 */
>   	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
>   	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_XSAVES);
> +		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
>   
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VMX);
> -	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_LAM);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
> +	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LAM);
>   
>   	vmx_setup_uret_msrs(vmx);
>   
> @@ -7849,7 +7849,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		vmcs_set_secondary_exec_control(vmx,
>   						vmx_secondary_exec_control(vmx));
>   
> -	if (guest_can_use(vcpu, X86_FEATURE_VMX))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
>   		vmx->msr_ia32_feature_control_valid_bits |=
>   			FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
>   			FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
> @@ -7858,7 +7858,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>   			~(FEAT_CTL_VMX_ENABLED_INSIDE_SMX |
>   			  FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);
>   
> -	if (guest_can_use(vcpu, X86_FEATURE_VMX))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
>   		nested_vmx_cr_fixed1_bits_update(vcpu);
>   
>   	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7160c5ab8e3e..4ca9651b3f43 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1026,7 +1026,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
>   		if (vcpu->arch.xcr0 != host_xcr0)
>   			xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
>   
> -		if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
>   		    vcpu->arch.ia32_xss != host_xss)
>   			wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
>   	}
> @@ -1057,7 +1057,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
>   		if (vcpu->arch.xcr0 != host_xcr0)
>   			xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0);
>   
> -		if (guest_can_use(vcpu, X86_FEATURE_XSAVES) &&
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVES) &&
>   		    vcpu->arch.ia32_xss != host_xss)
>   			wrmsrl(MSR_IA32_XSS, host_xss);
>   	}



* Re: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
  2024-05-22  9:11   ` Binbin Wu
@ 2024-05-28 15:21     ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-28 15:21 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Yang Weijiang,
	Robert Hoo

On Wed, May 22, 2024, Binbin Wu wrote:
> 
> 
> On 5/18/2024 1:39 AM, Sean Christopherson wrote:
> > Advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID when it's
> > supported in hardware,
> 
> But it's using EMUL_F(TSC_DEADLINE_TIMER) below?

Doh, yeah, the changelog is wrong.  KVM always emulates TSC_DEADLINE_TIMER.

Thanks!


* Re: [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()
  2024-05-22  6:23   ` Binbin Wu
@ 2024-05-28 18:54     ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-28 18:54 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Yang Weijiang,
	Robert Hoo

On Wed, May 22, 2024, Binbin Wu wrote:
> On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> > Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging
> > it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_
> > bits in the helper (a future commit will play macro games to set emulated
> > feature flags via kvm_cpu_cap_init()).
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++------------------
> >   1 file changed, 18 insertions(+), 18 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index a802c09b50ab..5a4d6138c4f1 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -74,7 +74,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> >    * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> >    * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> >    * Simply force set the feature in KVM's capabilities, raw CPUID support will
> > - * be factored in by kvm_cpu_cap_mask().
> > + * be factored in by __kvm_cpu_cap_mask().
> 
> kvm_cpu_cap_init()?

Drat, yes.  IIRC, I tried to get clever to avoid having to update this comment a
second time, but then I ended up removing __kvm_cpu_cap_mask() entirely.


* Re: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  2024-05-22  5:09   ` Binbin Wu
@ 2024-05-28 18:56     ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-05-28 18:56 UTC (permalink / raw)
  To: Binbin Wu
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Yang Weijiang,
	Robert Hoo

On Wed, May 22, 2024, Binbin Wu wrote:
> On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> > @@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >   		r = KVM_CLOCK_VALID_FLAGS;
> >   		break;
> >   	case KVM_CAP_X86_DISABLE_EXITS:
> > -		r = KVM_X86_DISABLE_EXITS_PAUSE;
> > -
> > -		if (!mitigate_smt_rsb) {
> > -			r |= KVM_X86_DISABLE_EXITS_HLT |
> > -			     KVM_X86_DISABLE_EXITS_CSTATE;
> > -
> > -			if (kvm_can_mwait_in_guest())
> > -				r |= KVM_X86_DISABLE_EXITS_MWAIT;
> > -		}
> > +		r |= kvm_get_allowed_disable_exits();
> 
> Nit: Just use "=".

Yowsers, that's more than a nit, that's downright bad code, it just happens to be
functionally ok.  Thanks again for the reviews!


* Re: [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID
  2024-05-17 17:38 ` [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID Sean Christopherson
@ 2024-06-19  6:17   ` Yang, Weijiang
  2024-06-19  8:07     ` Yang, Weijiang
  2024-07-05  1:17   ` Maxim Levitsky
  1 sibling, 1 reply; 185+ messages in thread
From: Yang, Weijiang @ 2024-06-19  6:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Paolo Bonzini, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Binbin Wu, Robert Hoo

On 5/18/2024 1:38 AM, Sean Christopherson wrote:

[...]

>   /* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
>   static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
>   {
>   	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
> -	struct kvm_cpuid_entry2 entry;
>   
>   	reverse_cpuid_check(leaf);

IIUC, this reverse_cpuid_check() is redundant since it's already enforced in x86_feature_cpuid() via __feature_leaf() as previous patch(17) shows.
>   
> -	cpuid_count(cpuid.function, cpuid.index,
> -		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
> -
> -	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
> +	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
>   }
>   
>   static __always_inline



* Re: [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID
  2024-06-19  6:17   ` Yang, Weijiang
@ 2024-06-19  8:07     ` Yang, Weijiang
  0 siblings, 0 replies; 185+ messages in thread
From: Yang, Weijiang @ 2024-06-19  8:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Paolo Bonzini, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Binbin Wu, Robert Hoo

On 6/19/2024 2:17 PM, Yang, Weijiang wrote:
> On 5/18/2024 1:38 AM, Sean Christopherson wrote:
>
> [...]
>
>>   /* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
>>   static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
>>   {
>>       const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
>> -    struct kvm_cpuid_entry2 entry;
>>         reverse_cpuid_check(leaf);
>
> IIUC, this reverse_cpuid_check() is redundant since it's already enforced in x86_feature_cpuid() via __feature_leaf() as previous patch(17) shows.

Aha, I saw the function is removed in patch(23). Sorry for the noise.

>>   -    cpuid_count(cpuid.function, cpuid.index,
>> -            &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
>> -
>> -    kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
>> +    kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
>>   }
>>     static __always_inline
>
>



* Re: [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps
  2024-05-17 17:39 ` [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps Sean Christopherson
@ 2024-06-20  2:20   ` Yang, Weijiang
  2024-07-05  2:10   ` Maxim Levitsky
  1 sibling, 0 replies; 185+ messages in thread
From: Yang, Weijiang @ 2024-06-20  2:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Paolo Bonzini, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Binbin Wu, Robert Hoo

On 5/18/2024 1:39 AM, Sean Christopherson wrote:

[...]

> index e021681f34ac..ad0168d3aec5 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -259,10 +259,10 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
>   static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
>   					      unsigned int x86_feature)
>   {
> -	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
>   
> -	__set_bit(kvm_governed_feature_index(x86_feature),
> -		  vcpu->arch.governed_features.enabled);
> +	reverse_cpuid_check(x86_leaf);

This reverse_cpuid_check() seems unnecessary, since patch (17) already moved the check into
__feature_leaf(). But I don't have the full source code to double-check that right now.

> +	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
>   }
>   
>   static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> @@ -275,10 +275,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
>   static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
>   					      unsigned int x86_feature)
>   {
> -	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
>   
> -	return test_bit(kvm_governed_feature_index(x86_feature),
> -			vcpu->arch.governed_features.enabled);
> +	reverse_cpuid_check(x86_leaf);

Ditto.

> +	return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
>   }
>   
>
[...]
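
The leaf/bit arithmetic in the helpers quoted above can be sketched in standalone C. This is a minimal illustration, not the kernel code: `struct vcpu_caps`, `NR_CAP_LEAVES`, and the `feature_*`/`cap_*` names are hypothetical, and a runtime assert() stands in for the kernel's compile-time check. It assumes, as the review comment suggests, that the leaf is validated once inside the leaf-lookup helper, which is why the callers carry no second check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CAP_LEAVES 4	/* hypothetical; the real number of leaves differs */

/* Hypothetical stand-in for the per-vCPU capability words. */
struct vcpu_caps {
	uint32_t cpu_caps[NR_CAP_LEAVES];
};

/* A feature is encoded as (leaf * 32 + bit). */
static unsigned int feature_leaf(unsigned int feature)
{
	unsigned int leaf = feature / 32;

	/* The single place where the leaf is validated in this sketch. */
	assert(leaf < NR_CAP_LEAVES);
	return leaf;
}

static uint32_t feature_bit(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

/* Callers rely on feature_leaf() for validation; no second check needed. */
static void cap_set(struct vcpu_caps *c, unsigned int feature)
{
	c->cpu_caps[feature_leaf(feature)] |= feature_bit(feature);
}

static void cap_clear(struct vcpu_caps *c, unsigned int feature)
{
	c->cpu_caps[feature_leaf(feature)] &= ~feature_bit(feature);
}

static bool cap_has(const struct vcpu_caps *c, unsigned int feature)
{
	return c->cpu_caps[feature_leaf(feature)] & feature_bit(feature);
}
```

Keeping the validation in one helper means every accessor gets it for free, at the cost of hiding it from a reader who only looks at the accessor.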



* Re: [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID
  2024-05-17 17:39 ` [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID Sean Christopherson
@ 2024-06-20  2:24   ` Yang, Weijiang
  2024-07-05  2:13   ` Maxim Levitsky
  1 sibling, 0 replies; 185+ messages in thread
From: Yang, Weijiang @ 2024-06-20  2:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Paolo Bonzini, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Binbin Wu, Robert Hoo

On 5/18/2024 1:39 AM, Sean Christopherson wrote:

[...]

>   
> -static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> -							unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
> +						unsigned int x86_feature)
>   {
> -	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
> +
> +	reverse_cpuid_check(x86_leaf);

Unnecessary reverse_cpuid_check(), same as in the previous patch.

> +	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> +}
> +
>
[...]



* Re: [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation
  2024-05-17 17:38 ` [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation Sean Christopherson
@ 2024-07-05  0:48   ` Maxim Levitsky
  2024-07-08 18:46     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  0:48 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> During vCPU creation, process KVM's default, empty CPUID as if userspace
> set an empty CPUID to ensure consistent and correct behavior with respect
> to guest CPUID.  E.g. if userspace never sets guest CPUID, KVM will never
> configure cr4_guest_rsvd_bits, and thus create divergent, incorrect, guest-
> visible behavior due to letting the guest set any KVM-supported CR4 bits
> despite the features not being allowed per guest CPUID.
> 
> Note!  This changes KVM's ABI, as lack of full CPUID processing allowed
> userspace to stuff garbage vCPU state, e.g. userspace could set CR4 to a
> guest-unsupported value via KVM_SET_SREGS.  But it's extremely unlikely
> that this is a breaking change, as KVM already has many flows that require
> userspace to set guest CPUID before loading vCPU state.  E.g. multiple MSR
> flows consult guest CPUID on host writes, and KVM_SET_SREGS itself already
> relies on guest CPUID being up-to-date, as KVM's validity check on CR3
> consumes CPUID.0x7.1 (for LAM) and CPUID.0x80000008 (for MAXPHYADDR).
> 
> Furthermore, the plan is to commit to enforcing guest CPUID for userspace
> writes to MSRs, at which point bypassing sregs CPUID checks is even more
> nonsensical.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 2 +-
>  arch/x86/kvm/cpuid.h | 1 +
>  arch/x86/kvm/x86.c   | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f2f2be5d1141..2b19ff991ceb 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -335,7 +335,7 @@ static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
>  #endif
>  }
>  
> -static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> +void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	struct kvm_cpuid_entry2 *best;
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 23dbb9eb277c..0a8b561b5434 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -11,6 +11,7 @@
>  extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
>  void kvm_set_cpu_caps(void);
>  
> +void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
>  void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
>  void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
>  struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d750546ec934..7adcf56bd45d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12234,6 +12234,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  	kvm_xen_init_vcpu(vcpu);
>  	kvm_vcpu_mtrr_init(vcpu);
>  	vcpu_load(vcpu);
> +	kvm_vcpu_after_set_cpuid(vcpu);

This makes me a bit nervous. At this point vcpu->arch.cpuid_entries is NULL,
but vcpu->arch.cpuid_nent is also zero, so it sort of works but is one mistake away from a crash.

Maybe we should add some protection for this, e.g. an empty, zero-length CPUID or something like that.

Best regards,
	Maxim Levitsky


>  	kvm_set_tsc_khz(vcpu, vcpu->kvm->arch.default_tsc_khz);
>  	kvm_vcpu_reset(vcpu, false);
>  	kvm_init_mmu(vcpu);
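
The concern above (a NULL entries array paired with a zero count) can be illustrated with a minimal standalone sketch. It is not KVM's lookup code: `struct cpuid_entry` and `find_entry()` are hypothetical, trimmed-down stand-ins, and the assert() is only one possible form of the "protection" suggested in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a CPUID entry. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Linear lookup that never dereferences @entries when @nent is zero, so a
 * NULL array paired with a zero count is safe.
 */
static const struct cpuid_entry *find_entry(const struct cpuid_entry *entries,
					    int nent, uint32_t function,
					    uint32_t index)
{
	int i;

	/* Optional guard that makes the "empty CPUID" invariant explicit. */
	assert(entries || nent == 0);

	for (i = 0; i < nent; i++) {
		if (entries[i].function == function &&
		    entries[i].index == index)
			return &entries[i];
	}
	return NULL;
}
```

The loop is safe only as long as every consumer is bounded by the count; any code path that touches the array without checking the count first would be the "one mistake" mentioned above.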






* Re: [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup
  2024-05-17 17:38 ` [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup Sean Christopherson
@ 2024-07-05  0:51   ` Maxim Levitsky
  2024-07-09 19:46     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  0:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Explicitly perform runtime CPUID adjustments as part of the "after set
> CPUID" flow to guard against bugs where KVM consumes stale vCPU/CPUID
> state during kvm_update_cpuid_runtime().  E.g. see commit 4736d85f0d18
> ("KVM: x86: Use actual kvm_cpuid.base for clearing KVM_FEATURE_PV_UNHALT").
> 
> Whacking each mole individually is not sustainable or robust, e.g. while
> the aforementioned commit fixed KVM's PV features, the same issue lurks for
> Xen and Hyper-V features, Xen and Hyper-V simply don't have any runtime
> features (though spoiler alert, neither should KVM).

> 
> Updating runtime features in the "full" path will also simplify adding a
> snapshot of the guest's capabilities, i.e. of caching the intersection of
> guest CPUID and kvm_cpu_caps (modulo a few edge cases).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 2b19ff991ceb..e60ffb421e4b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -345,6 +345,8 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	bitmap_zero(vcpu->arch.governed_features.enabled,
>  		    KVM_MAX_NR_GOVERNED_FEATURES);
>  
> +	kvm_update_cpuid_runtime(vcpu);
> +
>  	/*
>  	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
>  	 * hardware.  The hardware page walker doesn't let KVM disable GBPAGES,
> @@ -426,8 +428,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  {
>  	int r;
>  
> -	__kvm_update_cpuid_runtime(vcpu, e2, nent);
> -
>  	/*
>  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
>  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	 * whether the supplied CPUID data is equal to what's already set.
>  	 */
>  	if (kvm_vcpu_has_run(vcpu)) {
> +		/*
> +		 * Note, runtime CPUID updates may consume other CPUID-driven
> +		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> +		 * state before full CPUID processing is functionally correct
> +		 * only because any change in CPUID is disallowed, i.e. using
> +		 * stale data is ok because KVM will reject the change.
> +		 */

If I understand correctly, the sole reason for the __kvm_update_cpuid_runtime call below
is to ensure that kvm_cpuid_check_equal doesn't fail because the current CPUID was
also post-processed with runtime updates.

Can we have a comment stating this? Or, even better, how about moving the
call to __kvm_update_cpuid_runtime into kvm_cpuid_check_equal
to emphasize this?


> +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> +
>  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
>  		if (r)
>  			return r;



Overall I am not 100% sure what is better:

Before the patch it was roughly like this:

1. Post-process the user-given CPUID with bits of KVM runtime state (like xcr0).
At that point vcpu->arch.cpuid_entries is stale but consistent; it is just the old CPUID.

2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid)

3. kvm_check_cpuid on the user provided cpuid

4. Update the vcpu->arch.cpuid_entries with new and post processed cpuid

5. kvm_get_hypervisor_cpuid - I think this also can be cosmetically moved to kvm_vcpu_after_set_cpuid

6. kvm_vcpu_after_set_cpuid itself.


After this change it works like this:

1. kvm_hv_vcpu_init (again this belongs more to kvm_vcpu_after_set_cpuid)
2. kvm_check_cpuid on the user cpuid without post processing - in theory this can cause bugs
3. Update the vcpu->arch.cpuid_entries with new cpuid but without post-processing
4. kvm_get_hypervisor_cpuid
5. kvm_update_cpuid_runtime
6. The old kvm_vcpu_after_set_cpuid

I'm honestly not sure what is better but IMHO moving the kvm_hv_vcpu_init and kvm_get_hypervisor_cpuid into
kvm_vcpu_after_set_cpuid would clean up this mess a bit regardless of this patch.

Best regards,
	Maxim Levitsky
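
The point about the equality check can be sketched in standalone C: if the stored CPUID has already had runtime bits folded in, the incoming copy must be normalized the same way before the two are compared. This is a minimal sketch under stated assumptions, not KVM's implementation. `struct entry`, `struct vcpu`, `update_runtime()`, `raw_equal()`, and `check_equal()` are hypothetical, and OSXSAVE (CPUID.1:ECX bit 27) is used only as a representative runtime-derived bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 2
#define OSXSAVE_BIT (UINT32_C(1) << 27)	/* CPUID.1:ECX.OSXSAVE */

/* Hypothetical, trimmed-down CPUID entry and vCPU state. */
struct entry {
	uint32_t function;
	uint32_t ecx;
};

struct vcpu {
	bool osxsave;	/* runtime state mirrored into CPUID */
	int nent;
	struct entry stored[MAX_ENTRIES];	/* already post-processed */
};

/* Fold runtime state into leaf 1 ECX of the given entries. */
static void update_runtime(const struct vcpu *v, struct entry *e, int nent)
{
	for (int i = 0; i < nent; i++) {
		if (e[i].function != 1)
			continue;
		if (v->osxsave)
			e[i].ecx |= OSXSAVE_BIT;
		else
			e[i].ecx &= ~OSXSAVE_BIT;
	}
}

/* Naive comparison: ignores runtime bits, so it can fail spuriously. */
static bool raw_equal(const struct vcpu *v, const struct entry *e, int nent)
{
	return nent == v->nent &&
	       memcmp(e, v->stored, (size_t)nent * sizeof(*e)) == 0;
}

/* Normalize the incoming entries first, then compare. */
static bool check_equal(const struct vcpu *v, struct entry *e, int nent)
{
	assert(nent >= 0 && nent <= MAX_ENTRIES);
	update_runtime(v, e, nent);
	return raw_equal(v, e, nent);
}
```

Placing the normalization inside the comparison helper, as suggested above, makes the dependency between the two steps hard to miss; the trade-off is that the helper then modifies its input.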






* Re: [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX
  2024-05-17 17:38 ` [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX Sean Christopherson
@ 2024-07-05  0:55   ` Maxim Levitsky
  2024-07-09 19:58     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's
> reserved bits into the guest's reserved bits.  This fixes a bug where VMX's
> set_cr4_guest_host_mask() fails to account for KVM-reserved bits when
> deciding which bits can be passed through to the guest.  In most cases,
> letting the guest directly write reserved CR4 bits is ok, i.e. attempting
> to set the bit(s) will still #GP, but not if a feature is available in
> hardware but explicitly disabled by the host, e.g. if FSGSBASE support is
> disabled via "nofsgsbase".
> 
> Note, the extra overhead of computing host reserved bits every time
> userspace sets guest CPUID is negligible.  The feature bits that are
> queried are packed nicely into a handful of words, and so checking and
> setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the
> total cost will be in the noise even if the number of checked CR4 bits
> doubles over the next few years.  In other words, x86 will run out of CR4
> bits long before the overhead becomes problematic.

It might be just me, but IMHO this justification is confusing; it led me to believe
that the code might be on the hot path.

The right justification should simply be that this code is in kvm_vcpu_after_set_cpuid(),
which is usually (*) only called once per vCPU (twice after your patch #1).

(*) QEMU also calls it each time a vCPU is hotplugged, but this doesn't change anything
performance-wise.

> 
> Note #2, __cr4_reserved_bits() starts from CR4_RESERVED_BITS, which is
> why the existing __kvm_cpu_cap_has() processing doesn't explicitly OR in
> CR4_RESERVED_BITS (and why the new code doesn't do so either).
> 
> Fixes: 2ed41aa631fc ("KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 7 +++++--
>  arch/x86/kvm/x86.c   | 9 ---------
>  2 files changed, 5 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index e60ffb421e4b..f756a91a3f2f 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -383,8 +383,11 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
>  
>  	kvm_pmu_refresh(vcpu);
> -	vcpu->arch.cr4_guest_rsvd_bits =
> -	    __cr4_reserved_bits(guest_cpuid_has, vcpu);
> +
> +#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> +	vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) |
> +					 __cr4_reserved_bits(guest_cpuid_has, vcpu);
> +#undef __kvm_cpu_cap_has
>  
>  	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
>  						    vcpu->arch.cpuid_nent));
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7adcf56bd45d..3f20de4368a6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -116,8 +116,6 @@ u64 __read_mostly efer_reserved_bits = ~((u64)(EFER_SCE | EFER_LME | EFER_LMA));
>  static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
>  #endif
>  
> -static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
> -
>  #define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
>  
>  #define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
> @@ -1134,9 +1132,6 @@ EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
>  
>  bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
> -	if (cr4 & cr4_reserved_bits)
> -		return false;
> -
>  	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
>  		return false;
>  
> @@ -9831,10 +9826,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
>  		kvm_caps.supported_xss = 0;
>  
> -#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> -	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> -#undef __kvm_cpu_cap_has
> -
>  	if (kvm_caps.has_tsc_control) {
>  		/*
>  		 * Make sure the user can only configure tsc_khz values that


I mostly agree with this patch - caching always carries risks, and when it doesn't
add value performance-wise, it should always be removed.


However I don't think that this patch fixes a bug as it claims:

This is the code prior to this patch:

kvm_x86_vendor_init ->

	r = ops->hardware_setup();
		svm_hardware_setup
			svm_set_cpu_caps + kvm_set_cpu_caps

		-- or --

		vmx_hardware_setup ->
			vmx_set_cpu_caps + kvm_set_cpu_caps


	# read from 'kvm_cpu_caps'
	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);


AFAIK kvm cpu caps are never touched outside of svm_set_cpu_caps/vmx_hardware_setup
(they don't depend on some later post-processing, cpuid, etc).


In fact, a good refactoring would be to make kvm_cpu_caps const after this point,
using a cast, an assert or something like that.


This leads me to believe that cr4_reserved_bits is computed correctly.


I could be wrong, but then IMHO it is a very good idea to provide an explanation
on how this bug can happen.
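
To make the question concrete, here is a toy model of the pieces being
discussed. It is only a sketch under stated assumptions: all toy_* names are
hypothetical, FSGSBASE (CR4 bit 16) is the only feature modeled, and
TOY_CR4_ALWAYS_RESERVED is a made-up stand-in for CR4_RESERVED_BITS. It models
the validity check with two masks (before the patch) and one folded mask
(after), plus a pass-through computation of the kind the commit message
attributes to set_cr4_guest_host_mask().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CR4_FSGSBASE	(1ull << 16)	/* CR4.FSGSBASE */
#define TOY_CR4_ALWAYS_RESERVED	(~0ull << 32)	/* hypothetical base mask */

/* Toy version of __cr4_reserved_bits() for a single feature. */
static uint64_t toy_cr4_reserved_bits(bool has_fsgsbase)
{
	uint64_t rsvd = TOY_CR4_ALWAYS_RESERVED;

	if (!has_fsgsbase)
		rsvd |= TOY_CR4_FSGSBASE;
	return rsvd;
}

/* Before the patch: KVM's mask and the guest's mask are checked separately. */
static bool toy_is_valid_cr4_old(uint64_t cr4, uint64_t kvm_rsvd,
				 uint64_t guest_rsvd)
{
	if (cr4 & kvm_rsvd)
		return false;
	if (cr4 & guest_rsvd)
		return false;
	return true;
}

/* After the patch: a single check against the folded mask. */
static bool toy_is_valid_cr4_new(uint64_t cr4, uint64_t folded_rsvd)
{
	return !(cr4 & folded_rsvd);
}

/* Bits the guest may write directly: candidates that are not reserved. */
static uint64_t toy_guest_owned_bits(uint64_t candidates, uint64_t rsvd)
{
	return candidates & ~rsvd;
}
```

In this model the validity check gives the same answer either way. The only
place the two approaches can differ is a consumer that looks at the per-vCPU
mask alone, e.g. when KVM has FSGSBASE disabled but guest CPUID advertises it.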


Best regards,
	Maxim Levitsky








* Re: [PATCH v2 04/49] KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement
  2024-05-17 17:38 ` [PATCH v2 04/49] KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement Sean Christopherson
@ 2024-07-05  0:55   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  0:55 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Rework x86's set sregs test to verify that KVM enforces CPUID vs. CR4
> features even if userspace hasn't explicitly set guest CPUID.  KVM used to
> allow userspace to set any KVM-supported CR4 value prior to KVM_SET_CPUID2,
> and the test verified that behavior.
> 
> However, the testcase was written purely to verify KVM's existing behavior,
> i.e. was NOT written to match the needs of real world VMMs.
> 
> Opportunistically verify that KVM continues to reject unsupported features
> after KVM_SET_CPUID2 (using KVM_GET_SUPPORTED_CPUID).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../selftests/kvm/x86_64/set_sregs_test.c     | 53 +++++++++++--------
>  1 file changed, 30 insertions(+), 23 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> index c021c0795a96..96fd690d479a 100644
> --- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> @@ -41,13 +41,15 @@ do {										\
>  	TEST_ASSERT(!memcmp(&new, &orig, sizeof(new)), "KVM modified sregs");	\
>  } while (0)
>  
> +#define KVM_ALWAYS_ALLOWED_CR4 (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD |	\
> +				X86_CR4_DE | X86_CR4_PSE | X86_CR4_PAE |	\
> +				X86_CR4_MCE | X86_CR4_PGE | X86_CR4_PCE |	\
> +				X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT)
> +
>  static uint64_t calc_supported_cr4_feature_bits(void)
>  {
> -	uint64_t cr4;
> +	uint64_t cr4 = KVM_ALWAYS_ALLOWED_CR4;
>  
> -	cr4 = X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE |
> -	      X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE | X86_CR4_PGE |
> -	      X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT;
>  	if (kvm_cpu_has(X86_FEATURE_UMIP))
>  		cr4 |= X86_CR4_UMIP;
>  	if (kvm_cpu_has(X86_FEATURE_LA57))
> @@ -72,28 +74,14 @@ static uint64_t calc_supported_cr4_feature_bits(void)
>  	return cr4;
>  }
>  
> -int main(int argc, char *argv[])
> +static void test_cr_bits(struct kvm_vcpu *vcpu, uint64_t cr4)
>  {
>  	struct kvm_sregs sregs;
> -	struct kvm_vcpu *vcpu;
> -	struct kvm_vm *vm;
> -	uint64_t cr4;
>  	int rc, i;
>  
> -	/*
> -	 * Create a dummy VM, specifically to avoid doing KVM_SET_CPUID2, and
> -	 * use it to verify all supported CR4 bits can be set prior to defining
> -	 * the vCPU model, i.e. without doing KVM_SET_CPUID2.
> -	 */
> -	vm = vm_create_barebones();
> -	vcpu = __vm_vcpu_add(vm, 0);
> -
>  	vcpu_sregs_get(vcpu, &sregs);
> -
> -	sregs.cr0 = 0;
> -	sregs.cr4 |= calc_supported_cr4_feature_bits();
> -	cr4 = sregs.cr4;
> -
> +	sregs.cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
> +	sregs.cr4 |= cr4;
>  	rc = _vcpu_sregs_set(vcpu, &sregs);
>  	TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);
>  
> @@ -101,7 +89,6 @@ int main(int argc, char *argv[])
>  	TEST_ASSERT(sregs.cr4 == cr4, "sregs.CR4 (0x%llx) != CR4 (0x%lx)",
>  		    sregs.cr4, cr4);
>  
> -	/* Verify all unsupported features are rejected by KVM. */
>  	TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_UMIP);
>  	TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_LA57);
>  	TEST_INVALID_CR_BIT(vcpu, cr4, sregs, X86_CR4_VMXE);
> @@ -119,10 +106,28 @@ int main(int argc, char *argv[])
>  	/* NW without CD is illegal, as is PG without PE. */
>  	TEST_INVALID_CR_BIT(vcpu, cr0, sregs, X86_CR0_NW);
>  	TEST_INVALID_CR_BIT(vcpu, cr0, sregs, X86_CR0_PG);
> +}
>  
> +int main(int argc, char *argv[])
> +{
> +	struct kvm_sregs sregs;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +	int rc;
> +
> +	/*
> +	 * Create a dummy VM, specifically to avoid doing KVM_SET_CPUID2, and
> +	 * use it to verify KVM enforces guest CPUID even if *userspace* never
> +	 * sets CPUID.
> +	 */
> +	vm = vm_create_barebones();
> +	vcpu = __vm_vcpu_add(vm, 0);
> +	test_cr_bits(vcpu, KVM_ALWAYS_ALLOWED_CR4);
>  	kvm_vm_free(vm);
>  
> -	/* Create a "real" VM and verify APIC_BASE can be set. */
> +	/* Create a "real" VM with a fully populated guest CPUID and verify
> +	 * APIC_BASE and all supported CR4 can be set.
> +	 */
>  	vm = vm_create_with_one_vcpu(&vcpu, NULL);
>  
>  	vcpu_sregs_get(vcpu, &sregs);
> @@ -135,6 +140,8 @@ int main(int argc, char *argv[])
>  	TEST_ASSERT(!rc, "Couldn't set IA32_APIC_BASE to %llx (valid)",
>  		    sregs.apic_base);
>  
> +	test_cr_bits(vcpu, calc_supported_cr4_feature_bits());
> +
>  	kvm_vm_free(vm);
>  
>  	return 0;


Makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL
  2024-05-17 17:38 ` [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL Sean Christopherson
@ 2024-07-05  0:58   ` Maxim Levitsky
  2024-07-08 19:33     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  0:58 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Add a sanity check in get_cpuid_entry() to provide a friendlier error than
> a segfault when a test developer tries to use a vCPU CPUID helper on a
> barebones vCPU.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index c664e446136b..f0f3434d767e 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -1141,6 +1141,8 @@ const struct kvm_cpuid_entry2 *get_cpuid_entry(const struct kvm_cpuid2 *cpuid,
>  {
>  	int i;
>  
> +	TEST_ASSERT(cpuid, "Must do vcpu_init_cpuid() first (or equivalent)");
> +
>  	for (i = 0; i < cpuid->nent; i++) {
>  		if (cpuid->entries[i].function == function &&
>  		    cpuid->entries[i].index == index)

Hi,

Maybe it would be better to do this assert in __vcpu_get_cpuid_entry(), because here the assert
might confuse the reader: it only tests for NULL, but when it fails it complains that you need to call some function.

There is also another call to get_cpuid_entry() in kvm_cpu_fms but this call can't have this issue.

Besides this nitpick, looks good to me.

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 06/49] KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry()
  2024-05-17 17:38 ` [PATCH v2 06/49] KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry() Sean Christopherson
@ 2024-07-05  0:59   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  0:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Refresh selftests' CPUID cache in the vCPU structure when querying a CPUID
> entry so that tests don't consume stale data when KVM modifies CPUID as a
> side effect to a completely unrelated change.  E.g. KVM adjusts OSXSAVE in
> response to CR4.OSXSAVE changes.
> 
> Unnecessarily invoking KVM_GET_CPUID is suboptimal, but vcpu->cpuid exists
> to simplify selftests development, not for performance reasons.  And,
> unfortunately, trying to handle the side effects in tests or other flows
> is unpleasant, e.g. selftests could manually refresh if KVM_SET_SREGS is
> successful, but that would still leave a gap with respect to guest CR4
> changes.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../testing/selftests/kvm/include/x86_64/processor.h  | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 8eb57de0b587..99aa3dfca16c 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -992,10 +992,17 @@ static inline struct kvm_cpuid2 *allocate_kvm_cpuid2(int nr_entries)
>  void vcpu_init_cpuid(struct kvm_vcpu *vcpu, const struct kvm_cpuid2 *cpuid);
>  void vcpu_set_hv_cpuid(struct kvm_vcpu *vcpu);
>  
> +static inline void vcpu_get_cpuid(struct kvm_vcpu *vcpu)
> +{
> +	vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
> +}
> +
>  static inline struct kvm_cpuid_entry2 *__vcpu_get_cpuid_entry(struct kvm_vcpu *vcpu,
>  							      uint32_t function,
>  							      uint32_t index)
>  {
> +	vcpu_get_cpuid(vcpu);
> +
>  	return (struct kvm_cpuid_entry2 *)get_cpuid_entry(vcpu->cpuid,
>  							  function, index);
>  }
> @@ -1016,7 +1023,7 @@ static inline int __vcpu_set_cpuid(struct kvm_vcpu *vcpu)
>  		return r;
>  
>  	/* On success, refresh the cache to pick up adjustments made by KVM. */
> -	vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
> +	vcpu_get_cpuid(vcpu);
>  	return 0;
>  }
>  
> @@ -1026,7 +1033,7 @@ static inline void vcpu_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu_ioctl(vcpu, KVM_SET_CPUID2, vcpu->cpuid);
>  
>  	/* Refresh the cache to pick up adjustments made by KVM. */
> -	vcpu_ioctl(vcpu, KVM_GET_CPUID2, vcpu->cpuid);
> +	vcpu_get_cpuid(vcpu);
>  }
>  
>  void vcpu_set_cpuid_property(struct kvm_vcpu *vcpu,

Hi,

Indeed - fully agree with this, and again very sad that Intel
made their CPUID dynamic - what a hack :(

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes
  2024-05-17 17:38 ` [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes Sean Christopherson
@ 2024-07-05  1:02   ` Maxim Levitsky
  2024-07-08 19:39     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:02 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Extend x86's set sregs test to verify that KVM sets/clears OSXSAVE and
> OSPKE according to CR4.OSXSAVE and CR4.PKE respectively.  For performance
> reasons, KVM is responsible for emulating the architectural behavior of
> the OS CPUID bits tracking CR4.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  tools/testing/selftests/kvm/x86_64/set_sregs_test.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> index 96fd690d479a..f4095a3d1278 100644
> --- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> @@ -85,6 +85,16 @@ static void test_cr_bits(struct kvm_vcpu *vcpu, uint64_t cr4)
>  	rc = _vcpu_sregs_set(vcpu, &sregs);
>  	TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);
>  
> +	TEST_ASSERT(!!(sregs.cr4 & X86_CR4_OSXSAVE) ==
> +		    (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSXSAVE)),
> +		    "KVM didn't %s OSXSAVE in CPUID as expected",
> +		    (sregs.cr4 & X86_CR4_OSXSAVE) ? "set" : "clear");
> +
> +	TEST_ASSERT(!!(sregs.cr4 & X86_CR4_PKE) ==
> +		    (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSPKE)),
> +		    "KVM didn't %s OSPKE in CPUID as expected",
> +		    (sregs.cr4 & X86_CR4_PKE) ? "set" : "clear");
> +

Hi,

Just for fun, why not have a test function that toggles a CR4 bit and then
checks that the corresponding CPUID bit toggles as well? This would both improve
coverage and remove the code duplication above.
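
Something along these lines, as a rough standalone sketch rather than selftest
code. The toy_* names and the toy vCPU are hypothetical; a real version would
go through vcpu_sregs_set() and vcpu_cpuid_has() instead of poking a struct.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CR4_OSXSAVE		(1ull << 18)	/* CR4.OSXSAVE */
#define TOY_CR4_PKE		(1ull << 22)	/* CR4.PKE */
#define TOY_CPUID_OSXSAVE	(1u << 27)	/* CPUID.1:ECX.OSXSAVE */
#define TOY_CPUID_OSPKE		(1u << 4)	/* CPUID.7.0:ECX.OSPKE */

/* Hypothetical toy vCPU holding only the state this sketch needs. */
struct toy_vcpu {
	uint64_t cr4;
	uint32_t cpuid_1_ecx;
	uint32_t cpuid_7_0_ecx;
};

/* Toy model of a CR4 write: the OS bits in CPUID follow CR4. */
static void toy_set_cr4(struct toy_vcpu *vcpu, uint64_t cr4)
{
	vcpu->cr4 = cr4;

	if (cr4 & TOY_CR4_OSXSAVE)
		vcpu->cpuid_1_ecx |= TOY_CPUID_OSXSAVE;
	else
		vcpu->cpuid_1_ecx &= ~TOY_CPUID_OSXSAVE;

	if (cr4 & TOY_CR4_PKE)
		vcpu->cpuid_7_0_ecx |= TOY_CPUID_OSPKE;
	else
		vcpu->cpuid_7_0_ecx &= ~TOY_CPUID_OSPKE;
}

/*
 * Set the CR4 bit and expect the CPUID bit to be set, then clear the CR4 bit
 * and expect the CPUID bit to be clear.  Restores the original CR4 on exit.
 */
static bool toy_cr4_tracks_cpuid(struct toy_vcpu *vcpu, uint64_t cr4_bit,
				 const uint32_t *cpuid_reg, uint32_t cpuid_bit)
{
	uint64_t orig = vcpu->cr4;
	bool ok;

	toy_set_cr4(vcpu, orig | cr4_bit);
	ok = (*cpuid_reg & cpuid_bit) != 0;

	toy_set_cr4(vcpu, orig & ~cr4_bit);
	ok = ok && (*cpuid_reg & cpuid_bit) == 0;

	toy_set_cr4(vcpu, orig);
	return ok;
}
```

The test would then call the helper once per (CR4 bit, CPUID bit) pair, so
each pair is covered in both directions without repeating the assert.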

Best regards,
	Maxim Levitsky


>  	vcpu_sregs_get(vcpu, &sregs);
>  	TEST_ASSERT(sregs.cr4 == cr4, "sregs.CR4 (0x%llx) != CR4 (0x%lx)",
>  		    sregs.cr4, cr4);




* Re: [PATCH v2 08/49] KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h
  2024-05-17 17:38 ` [PATCH v2 08/49] KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h Sean Christopherson
@ 2024-07-05  1:02   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:02 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Let vendor code inline __kvm_is_valid_cr4() now that x86.c's cr4_reserved_bits
> no longer exists, as keeping cr4_reserved_bits local to x86.c was the only
> reason for "hiding" the definition of __kvm_is_valid_cr4().
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 9 ---------
>  arch/x86/kvm/x86.h | 6 +++++-
>  2 files changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3f20de4368a6..2f6dda723005 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1130,15 +1130,6 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
>  
> -bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> -{
> -	if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
> -		return false;
> -
> -	return true;
> -}
> -EXPORT_SYMBOL_GPL(__kvm_is_valid_cr4);
> -
>  static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
>  	return __kvm_is_valid_cr4(vcpu, cr4) &&
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index d80a4c6b5a38..4a723705a139 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -491,7 +491,6 @@ static inline void kvm_machine_check(void)
>  void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
>  void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
>  int kvm_spec_ctrl_test_value(u64 value);
> -bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
>  int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
>  			      struct x86_exception *e);
>  int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva);
> @@ -505,6 +504,11 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type);
>  #define  KVM_MSR_RET_INVALID	2	/* in-kernel MSR emulation #GP condition */
>  #define  KVM_MSR_RET_FILTERED	3	/* #GP due to userspace MSR filter */
>  
> +static inline bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> +{
> +	return !(cr4 & vcpu->arch.cr4_guest_rsvd_bits);
> +}
> +
>  #define __cr4_reserved_bits(__cpu_has, __c)             \
>  ({                                                      \
>  	u64 __reserved_bits = CR4_RESERVED_BITS;        \


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 09/49] KVM: x86/pmu: Drop now-redundant refresh() during init()
  2024-05-17 17:38 ` [PATCH v2 09/49] KVM: x86/pmu: Drop now-redundant refresh() during init() Sean Christopherson
@ 2024-07-05  1:02   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:02 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Drop the manual kvm_pmu_refresh() from kvm_pmu_init() now that
> kvm_arch_vcpu_create() performs the refresh via kvm_vcpu_after_set_cpuid().
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/pmu.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index a593b03c9aed..31920dd1aa83 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -797,7 +797,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
>  
>  	memset(pmu, 0, sizeof(*pmu));
>  	static_call(kvm_x86_pmu_init)(vcpu);
> -	kvm_pmu_refresh(vcpu);
>  }
>  
>  /* Release perf_events for vPMCs that have been unused for a full time slice.  */

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation
  2024-05-17 17:38 ` [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation Sean Christopherson
@ 2024-07-05  1:13   ` Maxim Levitsky
  2024-07-08 19:53     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:13 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Drop the manual initialization of maxphyaddr and reserved_gpa_bits during
> vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes
> kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching.
> 
> None of the helpers between the existing code in kvm_arch_vcpu_create()
> and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or
> reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create()
> isn't exactly easy).  And even if that weren't the case, KVM _must_
> refresh any affected state during kvm_vcpu_after_set_cpuid(), e.g. to
> correctly handle KVM_SET_CPUID2.  In other words, this can't introduce a
> new bug, only expose an existing bug (of which there don't appear to be
> any).


IMHO the change is not as bulletproof as claimed:

If some code does access the uninitialized state in between these calls
(e.g. vcpu->arch.maxphyaddr, which will be zero, I assume), then even though
the correct CPUID will be set later and should override the incorrect state,
the problem *is* that the code in question will have to deal with a value
that is not architecturally possible (e.g. maxphyaddr == 0), which might
cause a bug in it.

Of course, no such code exists currently, so it works, but
it can fail in the future.

How about we move the call to kvm_vcpu_after_set_cpuid() upward?
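
As a rough illustration of what maxphyaddr == 0 would mean for a consumer,
here is a small standalone sketch. The toy_* helpers are hypothetical and only
loosely modeled on how a reserved-GPA mask can be derived from MAXPHYADDR
(every bit from MAXPHYADDR up to bit 63 is reserved); this is not the KVM
implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits s..e (inclusive) set, or 0 if the range is empty. */
static uint64_t toy_rsvd_bits(int s, int e)
{
	e &= 63;
	if (e < s)
		return 0;
	return ((2ULL << (e - s)) - 1) << s;
}

/* All physical address bits at or above maxphyaddr are reserved. */
static uint64_t toy_reserved_gpa_bits(int maxphyaddr)
{
	return toy_rsvd_bits(maxphyaddr, 63);
}

/* A GPA is legal only if it sets no reserved bits. */
static bool toy_gpa_is_legal(uint64_t gpa, int maxphyaddr)
{
	return !(gpa & toy_reserved_gpa_bits(maxphyaddr));
}
```

In this model, maxphyaddr == 0 makes every address bit reserved, so any
non-zero GPA looks illegal to a caller that runs before the real value has
been filled in.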

Best regards,
	Maxim Levitsky

> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2f6dda723005..bb34891d2f0a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12190,9 +12190,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  		goto free_emulate_ctxt;
>  	}
>  
> -	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> -	vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu);
> -
>  	vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
>  
>  	kvm_async_pf_hash_reset(vcpu);




* Re: [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
  2024-05-17 17:38 ` [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after " Sean Christopherson
@ 2024-07-05  1:17   ` Maxim Levitsky
  2024-07-08 19:43     ` Sean Christopherson
  2024-07-12  7:42   ` Xiaoyao Li
  1 sibling, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling
> PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless,
> e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after
> vCPU creation.  vCPUs may also end up with an inconsistent configuration
> if exits are disabled between creation of multiple vCPUs.

Hi,

I am not sure that PAUSE intercepts are updated either; I wasn't able to find any code
that does this.

I agree with this change, but note that there was some talk on the mailing
list about allowing VM exits (e.g. PAUSE, MWAIT, ...) to be selectively disabled only on some vCPUs,
based on the claim that some vCPUs might run RT tasks, while others might be housekeeping.
I haven't followed those discussions closely.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky

> 
> Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
> Link: https://lore.kernel.org/all/9227068821b275ac547eb2ede09ec65d2281fe07.1680179693.git.houwenlong.hwl@antgroup.com
> Link: https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 1 +
>  arch/x86/kvm/x86.c             | 6 ++++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 6ab8b5b7c64e..884846282d06 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7645,6 +7645,7 @@ branch to guests' 0x200 interrupt vector.
>  :Architectures: x86
>  :Parameters: args[0] defines which exits are disabled
>  :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
> +          or if any vCPUs have already been created
>  
>  Valid bits in args[0] are::
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bb34891d2f0a..4cb0c150a2f8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6568,6 +6568,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
>  			break;
>  
> +		mutex_lock(&kvm->lock);
> +		if (kvm->created_vcpus)
> +			goto disable_exits_unlock;
> +
>  		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
>  			kvm->arch.pause_in_guest = true;
>  
> @@ -6589,6 +6593,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		}
>  
>  		r = 0;
> +disable_exits_unlock:
> +		mutex_unlock(&kvm->lock);
>  		break;
>  	case KVM_CAP_MSR_PLATFORM_INFO:
>  		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];




* Re: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  2024-05-17 17:38 ` [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed Sean Christopherson
  2024-05-22  5:09   ` Binbin Wu
@ 2024-07-05  1:17   ` Maxim Levitsky
  2024-07-12  7:51   ` Xiaoyao Li
  2 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or
> HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that
> disabling the exit(s) is not allowed.  E.g. because MWAIT isn't supported
> or the CPU doesn't have an always-running APIC timer, or because KVM is
> configured to mitigate cross-thread vulnerabilities.
> 
> Cc: Kechen Lu <kechenl@nvidia.com>
> Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
> Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/x86.c | 54 ++++++++++++++++++++++++----------------------
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4cb0c150a2f8..c729227c6501 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4590,6 +4590,20 @@ static inline bool kvm_can_mwait_in_guest(void)
>  		boot_cpu_has(X86_FEATURE_ARAT);
>  }
>  
> +static u64 kvm_get_allowed_disable_exits(void)
> +{
> +	u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
> +
> +	if (!mitigate_smt_rsb) {
> +		r |= KVM_X86_DISABLE_EXITS_HLT |
> +			KVM_X86_DISABLE_EXITS_CSTATE;
> +
> +		if (kvm_can_mwait_in_guest())
> +			r |= KVM_X86_DISABLE_EXITS_MWAIT;
> +	}
> +	return r;
> +}
> +
>  #ifdef CONFIG_KVM_HYPERV
>  static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
>  					    struct kvm_cpuid2 __user *cpuid_arg)
> @@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = KVM_CLOCK_VALID_FLAGS;
>  		break;
>  	case KVM_CAP_X86_DISABLE_EXITS:
> -		r = KVM_X86_DISABLE_EXITS_PAUSE;
> -
> -		if (!mitigate_smt_rsb) {
> -			r |= KVM_X86_DISABLE_EXITS_HLT |
> -			     KVM_X86_DISABLE_EXITS_CSTATE;
> -
> -			if (kvm_can_mwait_in_guest())
> -				r |= KVM_X86_DISABLE_EXITS_MWAIT;
> -		}
> +		r |= kvm_get_allowed_disable_exits();
>  		break;
>  	case KVM_CAP_X86_SMM:
>  		if (!IS_ENABLED(CONFIG_KVM_SMM))
> @@ -6565,33 +6571,29 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		break;
>  	case KVM_CAP_X86_DISABLE_EXITS:
>  		r = -EINVAL;
> -		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
> +		if (cap->args[0] & ~kvm_get_allowed_disable_exits())
>  			break;
>  
>  		mutex_lock(&kvm->lock);
>  		if (kvm->created_vcpus)
>  			goto disable_exits_unlock;
>  
> -		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> -			kvm->arch.pause_in_guest = true;
> -
>  #define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \
>  		    "KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests."
>  
> -		if (!mitigate_smt_rsb) {
> -			if (boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible() &&
> -			    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> -				pr_warn_once(SMT_RSB_MSG);
> -
> -			if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
> -			    kvm_can_mwait_in_guest())
> -				kvm->arch.mwait_in_guest = true;
> -			if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> -				kvm->arch.hlt_in_guest = true;
> -			if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> -				kvm->arch.cstate_in_guest = true;
> -		}
> +		if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
> +		    cpu_smt_possible() &&
> +		    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> +			pr_warn_once(SMT_RSB_MSG);
>  
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> +			kvm->arch.pause_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
> +			kvm->arch.mwait_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> +			kvm->arch.hlt_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> +			kvm->arch.cstate_in_guest = true;
>  		r = 0;
>  disable_exits_unlock:
>  		mutex_unlock(&kvm->lock);
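
For anyone following along, here is a minimal userspace sketch of the idea in
the hunks above: compute the allowed mask in one helper and reject any request
that contains a bit outside it.  The struct and function names are hypothetical
stand-ins (for mitigate_smt_rsb and kvm_can_mwait_in_guest()), and the bit
values are meant to mirror the KVM_X86_DISABLE_EXITS_* flags; this is a model,
not the kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the KVM_X86_DISABLE_EXITS_* bits. */
#define DISABLE_EXITS_MWAIT	(1u << 0)
#define DISABLE_EXITS_HLT	(1u << 1)
#define DISABLE_EXITS_PAUSE	(1u << 2)
#define DISABLE_EXITS_CSTATE	(1u << 3)

/* Hypothetical stand-ins for the module param and the MWAIT helper. */
struct host_state {
	bool mitigate_smt_rsb;
	bool can_mwait_in_guest;
};

/* Single source of truth for which exits may be disabled. */
static uint64_t allowed_disable_exits(const struct host_state *host)
{
	uint64_t r = DISABLE_EXITS_PAUSE;

	if (!host->mitigate_smt_rsb) {
		r |= DISABLE_EXITS_HLT | DISABLE_EXITS_CSTATE;

		if (host->can_mwait_in_guest)
			r |= DISABLE_EXITS_MWAIT;
	}
	return r;
}

/* Returns 0 if every requested bit is allowed, -1 (think -EINVAL) otherwise. */
static int check_disable_exits(const struct host_state *host, uint64_t requested)
{
	return (requested & ~allowed_disable_exits(host)) ? -1 : 0;
}
```

With the check and the advertised mask sharing one helper, a request for a bit
that was not advertised fails outright rather than being silently ignored.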


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 13/49] KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test
  2024-05-17 17:38 ` [PATCH v2 13/49] KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test Sean Christopherson
@ 2024-07-05  1:17   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Actually check for KVM support for disabling HLT-exiting instead of
> effectively checking that KVM_CAP_X86_DISABLE_EXITS is #defined to a
> non-zero value, and convert the TEST_REQUIRE() to a simple return so
> that only the sub-test is skipped if HLT-exiting is mandatory.
> 
> The goof has likely gone unnoticed because all x86 CPUs support disabling
> HLT-exiting, only systems with the opt-in mitigate_smt_rsb KVM module
> param disallow HLT-exiting.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  tools/testing/selftests/kvm/x86_64/kvm_pv_test.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
> index 78878b3a2725..2aee93108a54 100644
> --- a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
> @@ -140,10 +140,11 @@ static void test_pv_unhalt(void)
>  	struct kvm_cpuid_entry2 *ent;
>  	u32 kvm_sig_old;
>  
> +	if (!(kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) & KVM_X86_DISABLE_EXITS_HLT))
> +		return;
> +
>  	pr_info("testing KVM_FEATURE_PV_UNHALT\n");
>  
> -	TEST_REQUIRE(KVM_CAP_X86_DISABLE_EXITS);
> -
>  	/* KVM_PV_UNHALT test */
>  	vm = vm_create_with_one_vcpu(&vcpu, guest_main);
>  	vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
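
To spell out the goof for readers: the old TEST_REQUIRE() evaluated the
capability *number*, which is a non-zero constant, so it could never fail.  A
minimal sketch of the difference, with hypothetical helper names and an
illustrative capability number (nothing here is selftest code):

```c
#include <stdbool.h>

/* Illustrative non-zero capability number and HLT flag bit. */
#define CAP_X86_DISABLE_EXITS	143
#define DISABLE_EXITS_HLT	(1u << 1)

/* Old check: tests a compile-time constant, so it is always true. */
static bool old_check(void)
{
	return CAP_X86_DISABLE_EXITS != 0;
}

/* Fixed check: tests the HLT bit in the mask reported for the capability. */
static bool new_check(unsigned int reported_mask)
{
	return (reported_mask & DISABLE_EXITS_HLT) != 0;
}
```

Here reported_mask plays the role of the value returned by
kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) in the patch.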

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 14/49] KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior
  2024-05-17 17:38 ` [PATCH v2 14/49] KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior Sean Christopherson
@ 2024-07-05  1:17   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Rework x86's KVM PV features test to align with KVM's new, fixed behavior
> of not allowing userspace to disable HLT-exiting after vCPUs have been
> created.  Rework the core testcase to disable HLT-exiting before creating
> a vCPU, and opportunistically keep the paired VM+vCPU creation to
> verify that KVM rejects KVM_CAP_X86_DISABLE_EXITS as expected.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  .../selftests/kvm/x86_64/kvm_pv_test.c        | 33 +++++++++++++++++--
>  1 file changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
> index 2aee93108a54..1b805cbdb47b 100644
> --- a/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/kvm_pv_test.c
> @@ -139,6 +139,7 @@ static void test_pv_unhalt(void)
>  	struct kvm_vm *vm;
>  	struct kvm_cpuid_entry2 *ent;
>  	u32 kvm_sig_old;
> +	int r;
>  
>  	if (!(kvm_check_cap(KVM_CAP_X86_DISABLE_EXITS) & KVM_X86_DISABLE_EXITS_HLT))
>  		return;
> @@ -152,19 +153,45 @@ static void test_pv_unhalt(void)
>  	TEST_ASSERT(vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
>  		    "Enabling X86_FEATURE_KVM_PV_UNHALT had no effect");
>  
> -	/* Make sure KVM clears vcpu->arch.kvm_cpuid */
> +	/* Verify KVM disallows disabling exits after vCPU creation. */
> +	r = __vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
> +	TEST_ASSERT(r && errno == EINVAL,
> +		    "Disabling exits after vCPU creation didn't fail as expected");
> +
> +	kvm_vm_free(vm);
> +
> +	/* Verify that KVM clears PV_UNHALT from guest CPUID. */
> +	vm = vm_create(1);
> +	vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
> +
> +	vcpu = vm_vcpu_add(vm, 0, NULL);
> +	TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
> +		    "vCPU created with PV_UNHALT set by default");
> +
> +	vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
> +	TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
> +		    "PV_UNHALT set in guest CPUID when HLT-exiting is disabled");
> +
> +	/*
> +	 * Clobber the KVM PV signature and verify KVM does NOT clear PV_UNHALT
> +	 * when KVM PV is not present, and DOES clear PV_UNHALT when switching
> +	 * back to the correct signature.
> +	 */
>  	ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE);
>  	kvm_sig_old = ent->ebx;
>  	ent->ebx = 0xdeadbeef;
>  	vcpu_set_cpuid(vcpu);
>  
> -	vm_enable_cap(vm, KVM_CAP_X86_DISABLE_EXITS, KVM_X86_DISABLE_EXITS_HLT);
> +	vcpu_set_cpuid_feature(vcpu, X86_FEATURE_KVM_PV_UNHALT);
> +	TEST_ASSERT(vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
> +		    "PV_UNHALT cleared when using bogus KVM PV signature");
> +
>  	ent = vcpu_get_cpuid_entry(vcpu, KVM_CPUID_SIGNATURE);
>  	ent->ebx = kvm_sig_old;
>  	vcpu_set_cpuid(vcpu);
>  
>  	TEST_ASSERT(!vcpu_cpuid_has(vcpu, X86_FEATURE_KVM_PV_UNHALT),
> -		    "KVM_FEATURE_PV_UNHALT is set with KVM_CAP_X86_DISABLE_EXITS");
> +		    "PV_UNHALT set in guest CPUID when HLT-exiting is disabled");
>  
>  	/* FIXME: actually test KVM_FEATURE_PV_UNHALT feature */
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 15/49] KVM: x86: Zero out PV features cache when the CPUID leaf is not present
  2024-05-17 17:38 ` [PATCH v2 15/49] KVM: x86: Zero out PV features cache when the CPUID leaf is not present Sean Christopherson
@ 2024-07-05  1:17   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Clear KVM's PV feature cache prior to processing a new guest CPUID so
> that KVM doesn't keep a stale cache entry if userspace does KVM_SET_CPUID2
> multiple times, once with a PV features entry, and a second time without.
> 
> Fixes: 66570e966dd9 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f756a91a3f2f..be1c8f43e090 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -246,6 +246,8 @@ void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
>  
> +	vcpu->arch.pv_cpuid.features = 0;
> +
>  	/*
>  	 * save the feature bitmap to avoid cpuid lookup for every PV
>  	 * operation
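
A minimal model of the stale-cache scenario, for illustration only.  The
struct and function names are hypothetical, and a NULL pointer stands in for
"guest CPUID has no PV features leaf":

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vcpu->arch.pv_cpuid. */
struct pv_cache {
	uint32_t features;
};

/*
 * Refresh the cache from the PV features leaf.  pv_leaf_eax is NULL when the
 * new guest CPUID does not contain the leaf at all.
 */
static void update_pv_cache(struct pv_cache *cache, const uint32_t *pv_leaf_eax)
{
	/* The fix: always start from zero so an absent leaf clears the cache. */
	cache->features = 0;

	if (pv_leaf_eax)
		cache->features = *pv_leaf_eax;
}
```

Without the unconditional zeroing, a second update with the leaf absent would
leave the features from the first update in place.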

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 16/49] KVM: x86: Don't update PV features caches when enabling enforcement capability
  2024-05-17 17:38 ` [PATCH v2 16/49] KVM: x86: Don't update PV features caches when enabling enforcement capability Sean Christopherson
@ 2024-07-05  1:17   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Revert the chunk of commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features
> is initialized when enabling cap") that forced a PV features cache refresh
> during KVM_CAP_ENFORCE_PV_FEATURE_CPUID, as whatever ioctl() ordering
> issue it alleged to have fixed never existed upstream, and likely never
> existed in any kernel.
> 
> At the time of the commit, there was a tangentially related ioctl()
> ordering issue, as toggling KVM_X86_DISABLE_EXITS_HLT after KVM_SET_CPUID2
> would have resulted in KVM potentially leaving KVM_FEATURE_PV_UNHALT set.
> But (a) that bug affected the entire guest CPUID, not just the cache, (b)
> commit 01b4f510b9f4 didn't address that bug, it only refreshed the cache
> (with the bad CPUID), and (c) setting KVM_X86_DISABLE_EXITS_HLT after vCPU
> creation is completely broken as KVM configures HLT-exiting only during
> vCPU creation, which is why KVM_CAP_X86_DISABLE_EXITS is now disallowed if
> vCPUs have been created.
> 
> Another tangentially related bug was KVM's failure to clear the cache when
> handling KVM_SET_CPUID2, but again commit 01b4f510b9f4 did nothing to fix
> that bug.
> 
> The most plausible explanation for what commit 01b4f510b9f4 was trying
> to fix is a bug that existed in Google's internal kernel that was the
> source of commit 01b4f510b9f4.  At the time, Google's internal kernel had
> not yet picked up commit 0d3b2ba16ba68 ("KVM: X86: Go on updating other
> CPUID leaves when leaf 1 is absent"), i.e. KVM would not initialize the
> PV features cache if KVM_SET_CPUID2 was called without a CPUID.0x1 entry.
> 
> Of course, no sane real world VMM would omit CPUID.0x1, including the KVM
> selftest added by commit ac4a4d6de22e ("selftests: kvm: test enforcement
> of paravirtual cpuid features").  And the test didn't actually try to
> verify multiple orderings, nor did the selftest enter the guest without
> doing KVM_SET_CPUID2, so who knows what motivated the change.
> 
> Regardless of why commit 01b4f510b9f4 ("kvm: x86: ensure pv_cpuid.features
> is initialized when enabling cap") was added, refreshing the cache during
> KVM_CAP_ENFORCE_PV_FEATURE_CPUID isn't necessary.
> 
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 2 +-
>  arch/x86/kvm/cpuid.h | 1 -
>  arch/x86/kvm/x86.c   | 3 ---
>  3 files changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index be1c8f43e090..a51e48663f53 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -242,7 +242,7 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
>  					     vcpu->arch.cpuid_nent, base);
>  }
>  
> -void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
> +static void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
>  
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 0a8b561b5434..7eb3d7318fc4 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -13,7 +13,6 @@ void kvm_set_cpu_caps(void);
>  
>  void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
>  void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
> -void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
>  struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
>  						    u32 function, u32 index);
>  struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c729227c6501..7160c5ab8e3e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5849,9 +5849,6 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  
>  	case KVM_CAP_ENFORCE_PV_FEATURE_CPUID:
>  		vcpu->arch.pv_cpuid.enforce = cap->args[0];
> -		if (vcpu->arch.pv_cpuid.enforce)
> -			kvm_update_pv_runtime(vcpu);
> -
>  		return 0;
>  	default:
>  		return -EINVAL;

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 17/49] KVM: x86: Do reverse CPUID sanity checks in __feature_leaf()
  2024-05-17 17:38 ` [PATCH v2 17/49] KVM: x86: Do reverse CPUID sanity checks in __feature_leaf() Sean Christopherson
@ 2024-07-05  1:17   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Do the compile-time sanity checks on reverse_cpuid in __feature_leaf() so
> that higher level APIs don't need to "manually" perform the sanity checks.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.h         | 3 ---
>  arch/x86/kvm/reverse_cpuid.h | 6 ++++--
>  2 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 7eb3d7318fc4..d68b7d879820 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -198,7 +198,6 @@ static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
>  {
>  	unsigned int x86_leaf = __feature_leaf(x86_feature);
>  
> -	reverse_cpuid_check(x86_leaf);
>  	kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
>  }
>  
> @@ -206,7 +205,6 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
>  {
>  	unsigned int x86_leaf = __feature_leaf(x86_feature);
>  
> -	reverse_cpuid_check(x86_leaf);
>  	kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
>  }
>  
> @@ -214,7 +212,6 @@ static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
>  {
>  	unsigned int x86_leaf = __feature_leaf(x86_feature);
>  
> -	reverse_cpuid_check(x86_leaf);
>  	return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
>  }
>  
> diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
> index 2f4e155080ba..245f71c16272 100644
> --- a/arch/x86/kvm/reverse_cpuid.h
> +++ b/arch/x86/kvm/reverse_cpuid.h
> @@ -136,7 +136,10 @@ static __always_inline u32 __feature_translate(int x86_feature)
>  
>  static __always_inline u32 __feature_leaf(int x86_feature)
>  {
> -	return __feature_translate(x86_feature) / 32;
> +	u32 x86_leaf = __feature_translate(x86_feature) / 32;
> +
> +	reverse_cpuid_check(x86_leaf);
> +	return x86_leaf;
>  }
>  
>  /*
> @@ -159,7 +162,6 @@ static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned int x86_featu
>  {
>  	unsigned int x86_leaf = __feature_leaf(x86_feature);
>  
> -	reverse_cpuid_check(x86_leaf);
>  	return reverse_cpuid[x86_leaf];
>  }
>  

Makes sense.
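
For reference, a small sketch of the pattern, assuming a feature number is
encoded as (word * 32 + bit).  All names are hypothetical, and the runtime
abort() is only a stand-in for the compile-time reverse_cpuid_check():

```c
#include <stdint.h>
#include <stdlib.h>

#define NR_LEAVES	4u		/* hypothetical number of capability words */
#define FEATURE_EXAMPLE	(2 * 32 + 5)	/* hypothetical feature: word 2, bit 5 */

static uint32_t caps[NR_LEAVES];

/* Sanity check lives here, so no caller needs to repeat it. */
static uint32_t feature_leaf(uint32_t feature)
{
	uint32_t leaf = feature / 32;

	if (leaf >= NR_LEAVES)
		abort();
	return leaf;
}

static uint32_t feature_bit(uint32_t feature)
{
	return 1u << (feature & 31);
}

static void cap_set(uint32_t feature)
{
	caps[feature_leaf(feature)] |= feature_bit(feature);
}

static void cap_clear(uint32_t feature)
{
	caps[feature_leaf(feature)] &= ~feature_bit(feature);
}

static uint32_t cap_get(uint32_t feature)
{
	return caps[feature_leaf(feature)] & feature_bit(feature);
}
```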

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID
  2024-05-17 17:38 ` [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID Sean Christopherson
  2024-06-19  6:17   ` Yang, Weijiang
@ 2024-07-05  1:17   ` Maxim Levitsky
  1 sibling, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:17 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Explicitly zero out the feature word in kvm_cpu_caps if the word's
> associated CPUID function is greater than the max leaf supported by the
> CPU.  For such unsupported functions, Intel CPUs return the output from
> the last supported leaf, not all zeros.
> 
> Practically speaking, this is likely a benign bug, as KVM uses the raw
> host CPUID to mask the kernel's computed capabilities, and the kernel does
> perform max leaf checks when populating boot_cpu_data.  The only way KVM's
> goof could be problematic is if the kernel force-set a feature in a leaf
> that is completely unsupported, _and_ the max supported leaf happened to
> return a value with '1' in the same bit position.  Which is theoretically
> possible, but extremely unlikely.  And even if that did happen, it's
> entirely possible that KVM would still provide the correct functionality;
> the kernel did set the capability after all.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 29 ++++++++++++++++++++++++-----
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index a51e48663f53..77625a5477b1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -571,18 +571,37 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> +static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
> +{
> +	struct kvm_cpuid_entry2 entry;
> +	u32 base;
> +
> +	/*
> +	 * KVM only supports features defined by Intel (0x0), AMD (0x80000000),
> +	 * and Centaur (0xc0000000).  WARN if a feature for new vendor base is
> +	 * defined, as this and other code would need to be updated.
> +	 */
> +	base = cpuid.function & 0xffff0000;
> +	if (WARN_ON_ONCE(base && base != 0x80000000 && base != 0xc0000000))
> +		return 0;
> +
> +	if (cpuid_eax(base) < cpuid.function)
> +		return 0;
> +
> +	cpuid_count(cpuid.function, cpuid.index,
> +		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
> +
> +	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
> +}
> +
>  /* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
>  static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
>  {
>  	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
> -	struct kvm_cpuid_entry2 entry;
>  
>  	reverse_cpuid_check(leaf);
>  
> -	cpuid_count(cpuid.function, cpuid.index,
> -		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
> -
> -	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
> +	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
>  }
>  
>  static __always_inline
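
To illustrate why the max-leaf check matters, here is a sketch against a fake
CPU that mimics the Intel behavior described in the changelog (out-of-range
basic leaves return the data of the highest supported leaf).  The fake_cpu
type and both helpers are hypothetical; real code would execute CPUID.

```c
#include <stdint.h>

#define NR_FAKE_LEAVES	8u

/* Hypothetical fake CPU that answers basic-leaf queries from a table. */
struct fake_cpu {
	uint32_t max_basic_leaf;
	uint32_t leaf_ecx[NR_FAKE_LEAVES];	/* ECX output for leaves 0..7 */
};

/* Unchecked read: an unsupported leaf aliases to the max supported leaf. */
static uint32_t fake_cpuid_ecx(const struct fake_cpu *cpu, uint32_t function)
{
	if (function > cpu->max_basic_leaf)
		function = cpu->max_basic_leaf;
	if (function >= NR_FAKE_LEAVES)
		return 0;
	return cpu->leaf_ecx[function];
}

/* Checked read: report all zeros for leaves beyond the max supported leaf. */
static uint32_t raw_ecx_checked(const struct fake_cpu *cpu, uint32_t function)
{
	if (cpu->max_basic_leaf < function)
		return 0;
	return fake_cpuid_ecx(cpu, function);
}
```

In this model, a CPU whose max leaf is 4 and whose leaf 4 ECX has bit 16 set
would appear to have the leaf 7 ECX bit 16 feature unless the checked variant
is used.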

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support
  2024-05-17 17:38 ` [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support Sean Christopherson
@ 2024-07-05  1:21   ` Maxim Levitsky
  2024-07-08 20:53     ` Sean Christopherson
  2024-07-08 22:36     ` Sean Christopherson
  0 siblings, 2 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:21 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Add a macro for use in kvm_set_cpu_caps() to automagically initialize
> features that KVM wants to support based solely on the CPU's capabilities,
> e.g. KVM advertises LA57 support if it's available in hardware, even if
> the host kernel isn't utilizing 57-bit virtual addresses.
> 
> Take advantage of the fact that kvm_cpu_cap_mask() adjusts kvm_cpu_caps
> based on raw CPUID, i.e. will clear features bits that aren't supported in
> hardware, and simply force-set the capability before applying the mask.
> 
> Abusing kvm_cpu_cap_set() is a borderline evil shenanigan, but doing so
> avoids extra CPUID lookups, and a future commit will harden the entire
> family of *F() macros to assert (at compile time) that every feature being
> allowed is part of the capability word being processed, i.e. using a macro
> will bring more advantages in the future.

Could you explain what you mean by "extra CPUID lookups"?


> 
> Avoiding CPUID also fixes a largely benign bug where KVM could incorrectly
> report LA57 support on Intel CPUs whose max supported CPUID is less than 7,
> i.e. if the max supported leaf (<7) happened to have bit 16 set.  In
> practice, barring a funky virtual machine setup, the bug is benign as all
> known CPUs that support VMX also support leaf 7.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 77625a5477b1..a802c09b50ab 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -70,6 +70,18 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);	\
>  })
>  
> +/*
> + * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> + * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> + * Simply force set the feature in KVM's capabilities, raw CPUID support will
> + * be factored in by kvm_cpu_cap_mask().
> + */
> +#define RAW_F(name)						\
> +({								\
> +	kvm_cpu_cap_set(X86_FEATURE_##name);			\
> +	F(name);						\
> +})
> +
>  /*
>   * Magic value used by KVM when querying userspace-provided CPUID entries and
>   * doesn't care about the CPUID index because the index of the function in
> @@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
>  		F(AVX512VL));
>  
>  	kvm_cpu_cap_mask(CPUID_7_ECX,
> -		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> +		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
>  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
>  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
>  		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
>  		F(SGX_LC) | F(BUS_LOCK_DETECT)
>  	);
> -	/* Set LA57 based on hardware capability. */
> -	if (cpuid_ecx(7) & F(LA57))
> -		kvm_cpu_cap_set(X86_FEATURE_LA57);
>  
>  	/*
>  	 * PKU not yet implemented for shadow paging and requires OSPKE

Putting a function call into a macro which evaluates into a bitmask is somewhat misleading,
but let it be...

IMHO in the long term, it might be better to rip out the whole huge 'or'ed mess and
replace it with a list of statements, along with comments for all unusual cases.
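
To make the side effect concrete, here is a rough model of the "force-set, then
mask" ordering.  It uses a plain function in place of the statement-expression
macro, a single capability word, and hypothetical names throughout; the two
bit positions are the CPUID.7.0:ECX bits for PKU and LA57.

```c
#include <stdint.h>

#define BIT_PKU		(1u << 3)
#define BIT_LA57	(1u << 16)

/* One capability word, seeded from the kernel's view of the CPU. */
static uint32_t word_caps;

/* Evaluates to the bit, with the side effect of force-setting it. */
static uint32_t raw_bit(uint32_t bit)
{
	word_caps |= bit;
	return bit;
}

/* AND the word with KVM's mask, then with the raw hardware CPUID. */
static void cap_init(uint32_t kvm_mask, uint32_t raw_cpuid)
{
	word_caps &= kvm_mask;
	word_caps &= raw_cpuid;
}
```

The argument expression is evaluated before cap_init() runs, so a call such as
cap_init(BIT_PKU | raw_bit(BIT_LA57), raw) sets the bit first and lets the raw
CPUID mask clear it again if hardware lacks the feature.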


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init()
  2024-05-17 17:38 ` [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() Sean Christopherson
  2024-05-22  6:23   ` Binbin Wu
@ 2024-07-05  1:24   ` Maxim Levitsky
  1 sibling, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() in anticipation of merging
> it with kvm_cpu_cap_init_kvm_defined(), and in anticipation of _setting_
> bits in the helper (a future commit will play macro games to set emulated
> feature flags via kvm_cpu_cap_init()).
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 36 ++++++++++++++++++------------------
>  1 file changed, 18 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index a802c09b50ab..5a4d6138c4f1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -74,7 +74,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>   * Raw Feature - For features that KVM supports based purely on raw host CPUID,
>   * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
>   * Simply force set the feature in KVM's capabilities, raw CPUID support will
> - * be factored in by kvm_cpu_cap_mask().
> + * be factored in by __kvm_cpu_cap_mask().
>   */
>  #define RAW_F(name)						\
>  ({								\
> @@ -619,7 +619,7 @@ static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
>  static __always_inline
>  void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
>  {
> -	/* Use kvm_cpu_cap_mask for leafs that aren't KVM-only. */
> +	/* Use kvm_cpu_cap_init for leafs that aren't KVM-only. */
>  	BUILD_BUG_ON(leaf < NCAPINTS);
>  
>  	kvm_cpu_caps[leaf] = mask;
> @@ -627,7 +627,7 @@ void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
>  	__kvm_cpu_cap_mask(leaf);
>  }
>  
> -static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
> +static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
>  {
>  	/* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
>  	BUILD_BUG_ON(leaf >= NCAPINTS);
> @@ -656,7 +656,7 @@ void kvm_set_cpu_caps(void)
>  	memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
>  	       sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));
>  
> -	kvm_cpu_cap_mask(CPUID_1_ECX,
> +	kvm_cpu_cap_init(CPUID_1_ECX,
>  		/*
>  		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
>  		 * advertised to guests via CPUID!
> @@ -673,7 +673,7 @@ void kvm_set_cpu_caps(void)
>  	/* KVM emulates x2apic in software irrespective of host support. */
>  	kvm_cpu_cap_set(X86_FEATURE_X2APIC);
>  
> -	kvm_cpu_cap_mask(CPUID_1_EDX,
> +	kvm_cpu_cap_init(CPUID_1_EDX,
>  		F(FPU) | F(VME) | F(DE) | F(PSE) |
>  		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
>  		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
> @@ -684,7 +684,7 @@ void kvm_set_cpu_caps(void)
>  		0 /* HTT, TM, Reserved, PBE */
>  	);
>  
> -	kvm_cpu_cap_mask(CPUID_7_0_EBX,
> +	kvm_cpu_cap_init(CPUID_7_0_EBX,
>  		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
>  		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
>  		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
> @@ -693,7 +693,7 @@ void kvm_set_cpu_caps(void)
>  		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
>  		F(AVX512VL));
>  
> -	kvm_cpu_cap_mask(CPUID_7_ECX,
> +	kvm_cpu_cap_init(CPUID_7_ECX,
>  		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
>  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
>  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> @@ -708,7 +708,7 @@ void kvm_set_cpu_caps(void)
>  	if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE))
>  		kvm_cpu_cap_clear(X86_FEATURE_PKU);
>  
> -	kvm_cpu_cap_mask(CPUID_7_EDX,
> +	kvm_cpu_cap_init(CPUID_7_EDX,
>  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>  		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
>  		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
> @@ -727,7 +727,7 @@ void kvm_set_cpu_caps(void)
>  	if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
>  		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
>  
> -	kvm_cpu_cap_mask(CPUID_7_1_EAX,
> +	kvm_cpu_cap_init(CPUID_7_1_EAX,
>  		F(AVX_VNNI) | F(AVX512_BF16) | F(CMPCCXADD) |
>  		F(FZRM) | F(FSRS) | F(FSRC) |
>  		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
> @@ -743,7 +743,7 @@ void kvm_set_cpu_caps(void)
>  		F(BHI_CTRL) | F(MCDT_NO)
>  	);
>  
> -	kvm_cpu_cap_mask(CPUID_D_1_EAX,
> +	kvm_cpu_cap_init(CPUID_D_1_EAX,
>  		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
>  	);
>  
> @@ -751,7 +751,7 @@ void kvm_set_cpu_caps(void)
>  		SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
>  	);
>  
> -	kvm_cpu_cap_mask(CPUID_8000_0001_ECX,
> +	kvm_cpu_cap_init(CPUID_8000_0001_ECX,
>  		F(LAHF_LM) | F(CMP_LEGACY) | 0 /*SVM*/ | 0 /* ExtApicSpace */ |
>  		F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
>  		F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) |
> @@ -759,7 +759,7 @@ void kvm_set_cpu_caps(void)
>  		F(TOPOEXT) | 0 /* PERFCTR_CORE */
>  	);
>  
> -	kvm_cpu_cap_mask(CPUID_8000_0001_EDX,
> +	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
>  		F(FPU) | F(VME) | F(DE) | F(PSE) |
>  		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
>  		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> @@ -777,7 +777,7 @@ void kvm_set_cpu_caps(void)
>  		SF(CONSTANT_TSC)
>  	);
>  
> -	kvm_cpu_cap_mask(CPUID_8000_0008_EBX,
> +	kvm_cpu_cap_init(CPUID_8000_0008_EBX,
>  		F(CLZERO) | F(XSAVEERPTR) |
>  		F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
>  		F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) |
> @@ -811,13 +811,13 @@ void kvm_set_cpu_caps(void)
>  	 * Hide all SVM features by default, SVM will set the cap bits for
>  	 * features it emulates and/or exposes for L1.
>  	 */
> -	kvm_cpu_cap_mask(CPUID_8000_000A_EDX, 0);
> +	kvm_cpu_cap_init(CPUID_8000_000A_EDX, 0);
>  
> -	kvm_cpu_cap_mask(CPUID_8000_001F_EAX,
> +	kvm_cpu_cap_init(CPUID_8000_001F_EAX,
>  		0 /* SME */ | 0 /* SEV */ | 0 /* VM_PAGE_FLUSH */ | 0 /* SEV_ES */ |
>  		F(SME_COHERENT));
>  
> -	kvm_cpu_cap_mask(CPUID_8000_0021_EAX,
> +	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
>  		F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
>  		F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ |
>  		F(WRMSR_XX_BASE_NS)
> @@ -837,7 +837,7 @@ void kvm_set_cpu_caps(void)
>  	 * kernel.  LFENCE_RDTSC was a Linux-defined synthetic feature long
>  	 * before AMD joined the bandwagon, e.g. LFENCE is serializing on most
>  	 * CPUs that support SSE2.  On CPUs that don't support AMD's leaf,
> -	 * kvm_cpu_cap_mask() will unfortunately drop the flag due to ANDing
> +	 * kvm_cpu_cap_init() will unfortunately drop the flag due to ANDing
>  	 * the mask with the raw host CPUID, and reporting support in AMD's
>  	 * leaf can make it easier for userspace to detect the feature.
>  	 */
> @@ -847,7 +847,7 @@ void kvm_set_cpu_caps(void)
>  		kvm_cpu_cap_set(X86_FEATURE_NULL_SEL_CLR_BASE);
>  	kvm_cpu_cap_set(X86_FEATURE_NO_SMM_CTL_MSR);
>  
> -	kvm_cpu_cap_mask(CPUID_C000_0001_EDX,
> +	kvm_cpu_cap_init(CPUID_C000_0001_EDX,
>  		F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) |
>  		F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
>  		F(PMM) | F(PMM_EN)

Hi,

I'm not really sure we need this as a separate patch.  I see that it helps
with renaming things, but IMHO it could be squashed into the relevant patches.

But anyway,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only
  2024-05-17 17:38 ` [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only Sean Christopherson
@ 2024-07-05  1:24   ` Maxim Levitsky
  2024-07-17 13:31   ` Xiaoyao Li
  1 sibling, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:24 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Add a macro to mask-in feature flags that are supported only on 64-bit
> kernels/KVM.  In addition to reducing overall #ifdeffery, using a macro
> will allow hardening the kvm_cpu_cap initialization sequences to assert
> that the features being advertised are indeed included in the word being
> initialized.  And arguably using *F() macros throughout is more readable.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 22 ++++++++++------------
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 5a4d6138c4f1..5e3b97d06374 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -70,6 +70,12 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);	\
>  })
>  
> +/* Features that KVM supports only on 64-bit kernels. */
> +#define X86_64_F(name)						\
> +({								\
> +	(IS_ENABLED(CONFIG_X86_64) ? F(name) : 0);		\
> +})
> +
>  /*
>   * Raw Feature - For features that KVM supports based purely on raw host CPUID,
>   * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> @@ -639,15 +645,6 @@ static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
>  
>  void kvm_set_cpu_caps(void)
>  {
> -#ifdef CONFIG_X86_64
> -	unsigned int f_gbpages = F(GBPAGES);
> -	unsigned int f_lm = F(LM);
> -	unsigned int f_xfd = F(XFD);
> -#else
> -	unsigned int f_gbpages = 0;
> -	unsigned int f_lm = 0;
> -	unsigned int f_xfd = 0;
> -#endif
>  	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
>  
>  	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
> @@ -744,7 +741,8 @@ void kvm_set_cpu_caps(void)
>  	);
>  
>  	kvm_cpu_cap_init(CPUID_D_1_EAX,
> -		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
> +		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) |
> +		X86_64_F(XFD)
>  	);
>  
>  	kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
> @@ -766,8 +764,8 @@ void kvm_set_cpu_caps(void)
>  		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
>  		F(PAT) | F(PSE36) | 0 /* Reserved */ |
>  		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> -		F(FXSR) | F(FXSR_OPT) | f_gbpages | F(RDTSCP) |
> -		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW)
> +		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> +		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
>  	);
>  
>  	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))

This is a good cleanup.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
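
For readers following along outside the kernel tree, the effect of the
conversion can be sketched in plain C.  This is only an illustration under
stated assumptions: DEMO_X86_64 is a hypothetical stand-in for
IS_ENABLED(CONFIG_X86_64), and the DEMO_* names and bit positions are made
up for the example rather than taken from cpufeatures.h.

```c
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_X86_64); 0 models a 32-bit build. */
#ifndef DEMO_X86_64
#define DEMO_X86_64 1
#endif

/* Illustrative bit positions only; the kernel takes them from cpufeatures.h. */
enum demo_feature { DEMO_XSAVEOPT = 0, DEMO_XSAVEC = 1, DEMO_XFD = 4 };

#define DEMO_F(name)		(UINT32_C(1) << DEMO_##name)

/* Contributes the bit only on "64-bit" builds, with no #ifdef at the use site. */
#define DEMO_X86_64_F(name)	(DEMO_X86_64 ? DEMO_F(name) : 0u)

/* Builds one mask the way kvm_set_cpu_caps() builds each CPUID word. */
static uint32_t demo_d_1_eax_mask(void)
{
	return DEMO_F(XSAVEOPT) | DEMO_F(XSAVEC) | DEMO_X86_64_F(XFD);
}
```

Because the condition is a compile-time constant, the compiler can fold the
ternary away, so the sketch should cost nothing compared to the old #ifdef
block.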



* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-05-17 17:38 ` [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features Sean Christopherson
@ 2024-07-05  1:25   ` Maxim Levitsky
  2024-07-08 21:08     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:25 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> Add a macro to precisely handle CPUID features that AMD duplicated from
> CPUID.0x1.EDX into CPUID.0x8000_0001.EDX.  This will allow adding an
> assert that all features passed to kvm_cpu_cap_init() match the word being
> processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1.
> 
> Because the kernel simply reuses the X86_FEATURE_* definitions from
> CPUID.0x1.EDX, KVM's use of the aliased features would result in false
> positives from such an assert.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 5e3b97d06374..f2bd2f5c4ea3 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -88,6 +88,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	F(name);						\
>  })
>  
> +/*
> + * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
> + * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> + */
> +#define AF(name)								\
> +({										\
> +	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
> +	feature_bit(name);							\
> +})
> +
>  /*
>   * Magic value used by KVM when querying userspace-provided CPUID entries and
>   * doesn't care about the CPIUD index because the index of the function in
> @@ -758,13 +768,13 @@ void kvm_set_cpu_caps(void)
>  	);
>  
>  	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
> -		F(FPU) | F(VME) | F(DE) | F(PSE) |
> -		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> -		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> -		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> -		F(PAT) | F(PSE36) | 0 /* Reserved */ |
> -		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> -		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> +		AF(FPU) | AF(VME) | AF(DE) | AF(PSE) |
> +		AF(TSC) | AF(MSR) | AF(PAE) | AF(MCE) |
> +		AF(CX8) | AF(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> +		AF(MTRR) | AF(PGE) | AF(MCA) | AF(CMOV) |
> +		AF(PAT) | AF(PSE36) | 0 /* Reserved */ |
> +		F(NX) | 0 /* Reserved */ | F(MMXEXT) | AF(MMX) |
> +		AF(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
>  		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
>  	);
>  

Hi,

What if we defined the aliased features instead?
Something like this:

#define __X86_FEATURE_8000_0001_ALIAS(feature) \
	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)

#define KVM_X86_FEATURE_FPU_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_FPU)
#define KVM_X86_FEATURE_VME_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_VME)

And then just use, for example, 'F(FPU_ALIAS)' in the CPUID_8000_0001_EDX word.


Best regards,
	Maxim Levitsky
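
The arithmetic behind the alias suggestion above can be checked with a
small standalone sketch.  It assumes features are encoded as
word * 32 + bit and that the two words have indices 0 and 1, as I believe
they do in the kernel's enum cpuid_leafs; all DEMO_* names are hypothetical
and exist only for this example.

```c
#include <stdint.h>

/* Word indices, assumed to match the kernel's enum cpuid_leafs (0 and 1). */
enum { DEMO_CPUID_1_EDX = 0, DEMO_CPUID_8000_0001_EDX = 1 };

/* A feature is encoded as word * 32 + bit. */
#define DEMO_FEATURE(word, bit)	((word) * 32 + (bit))
#define DEMO_FEATURE_FPU	DEMO_FEATURE(DEMO_CPUID_1_EDX, 0)
#define DEMO_FEATURE_VME	DEMO_FEATURE(DEMO_CPUID_1_EDX, 1)

/* Move a 0x1.EDX feature into the 0x8000_0001.EDX word, keeping its bit. */
#define DEMO_ALIAS(feature) \
	((feature) + (DEMO_CPUID_8000_0001_EDX - DEMO_CPUID_1_EDX) * 32)

#define DEMO_FEATURE_FPU_ALIAS	DEMO_ALIAS(DEMO_FEATURE_FPU)
#define DEMO_FEATURE_VME_ALIAS	DEMO_ALIAS(DEMO_FEATURE_VME)

/* Decode helpers: which word a feature lives in, and its mask in that word. */
static unsigned int demo_leaf(unsigned int feature) { return feature / 32; }
static uint32_t demo_bit(unsigned int feature) { return UINT32_C(1) << (feature % 32); }

_Static_assert(DEMO_FEATURE_FPU_ALIAS / 32 == DEMO_CPUID_8000_0001_EDX,
	       "alias must land in the extended word");
```

With aliases defined this way, a per-word scope check would see the alias
as belonging to 0x8000_0001.EDX, so no special-case macro would be needed
at the use site.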



* Re: [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper
  2024-05-17 17:39 ` [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper Sean Christopherson
@ 2024-07-05  1:28   ` Maxim Levitsky
  2024-07-08 21:18     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:28 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single
> helper.  The only advantage of separating the two was to make it somewhat
> obvious that KVM directly initializes the KVM-defined words, whereas using
> a common helper will allow for hardening both kernel- and KVM-defined
> CPUID words without needing copy+paste.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 44 +++++++++++++++-----------------------------
>  1 file changed, 15 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index f2bd2f5c4ea3..8efffd48cdf1 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -622,37 +622,23 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
>  	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
>  }
>  
> -/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
> -static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
> +static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
>  {
>  	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
>  
> -	reverse_cpuid_check(leaf);
> +	/*
> +	 * For kernel-defined leafs, mask the boot CPU's pre-populated value.
> +	 * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
> +	 * and only authority.
> +	 */
> +	if (leaf < NCAPINTS)
> +		kvm_cpu_caps[leaf] &= mask;
> +	else
> +		kvm_cpu_caps[leaf] = mask;

Hi,

I have an idea: how about we just initialize the KVM-only leafs to 0xFFFFFFFF
and then treat them exactly the same way as the regular kernel leafs?

Then the reader won't have to figure out (assuming they don't read the comment,
and who does?) why we use the mask as the init value.

But if you prefer to leave it this way, I won't object either.

Best regards,
	Maxim Levitsky
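
To make the comparison concrete, here is a minimal sketch of both
variants side by side.  It is not KVM code: the three-word demo_caps
array, the DEMO_* constants and the 'raw' parameter (standing in for
raw_cpuid_get()) are all assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

enum {
	DEMO_NCAPINTS = 2,	/* kernel-defined words */
	DEMO_NKVMCAPINTS = 1,	/* KVM-only words */
	DEMO_NR_WORDS = DEMO_NCAPINTS + DEMO_NKVMCAPINTS,
};

static uint32_t demo_caps[DEMO_NR_WORDS];

/* Variant from the patch: branch on kernel- vs. KVM-defined word. */
static void demo_init_branching(unsigned int leaf, uint32_t mask, uint32_t raw)
{
	if (leaf < DEMO_NCAPINTS)
		demo_caps[leaf] &= mask;
	else
		demo_caps[leaf] = mask;
	demo_caps[leaf] &= raw;
}

/* Variant from the review: KVM-only words start as ~0, so AND is enough. */
static void demo_init_uniform(unsigned int leaf, uint32_t mask, uint32_t raw)
{
	demo_caps[leaf] &= mask & raw;
}

/* Seed kernel words from a pretend boot-CPU snapshot, KVM-only words with 'kvm_fill'. */
static void demo_seed(const uint32_t *boot, uint32_t kvm_fill)
{
	memcpy(demo_caps, boot, DEMO_NCAPINTS * sizeof(*boot));
	for (unsigned int i = DEMO_NCAPINTS; i < DEMO_NR_WORDS; i++)
		demo_caps[i] = kvm_fill;
}
```

Seeding with 0 and calling demo_init_branching() should give the same
words as seeding the KVM-only words with 0xFFFFFFFF and calling
demo_init_uniform(); the trade-off is an extra fill loop at setup time
versus a branch in the helper.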



>  
>  	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
>  }
>  
> -static __always_inline
> -void kvm_cpu_cap_init_kvm_defined(enum kvm_only_cpuid_leafs leaf, u32 mask)
> -{
> -	/* Use kvm_cpu_cap_init for leafs that aren't KVM-only. */
> -	BUILD_BUG_ON(leaf < NCAPINTS);
> -
> -	kvm_cpu_caps[leaf] = mask;
> -
> -	__kvm_cpu_cap_mask(leaf);
> -}
> -
> -static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
> -{
> -	/* Use kvm_cpu_cap_init_kvm_defined for KVM-only leafs. */
> -	BUILD_BUG_ON(leaf >= NCAPINTS);
> -
> -	kvm_cpu_caps[leaf] &= mask;
> -
> -	__kvm_cpu_cap_mask(leaf);
> -}
> -
>  void kvm_set_cpu_caps(void)
>  {
>  	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
> @@ -740,12 +726,12 @@ void kvm_set_cpu_caps(void)
>  		F(AMX_FP16) | F(AVX_IFMA) | F(LAM)
>  	);
>  
> -	kvm_cpu_cap_init_kvm_defined(CPUID_7_1_EDX,
> +	kvm_cpu_cap_init(CPUID_7_1_EDX,
>  		F(AVX_VNNI_INT8) | F(AVX_NE_CONVERT) | F(PREFETCHITI) |
>  		F(AMX_COMPLEX)
>  	);
>  
> -	kvm_cpu_cap_init_kvm_defined(CPUID_7_2_EDX,
> +	kvm_cpu_cap_init(CPUID_7_2_EDX,
>  		F(INTEL_PSFD) | F(IPRED_CTRL) | F(RRSBA_CTRL) | F(DDPD_U) |
>  		F(BHI_CTRL) | F(MCDT_NO)
>  	);
> @@ -755,7 +741,7 @@ void kvm_set_cpu_caps(void)
>  		X86_64_F(XFD)
>  	);
>  
> -	kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
> +	kvm_cpu_cap_init(CPUID_12_EAX,
>  		SF(SGX1) | SF(SGX2) | SF(SGX_EDECCSSA)
>  	);
>  
> @@ -781,7 +767,7 @@ void kvm_set_cpu_caps(void)
>  	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))
>  		kvm_cpu_cap_set(X86_FEATURE_GBPAGES);
>  
> -	kvm_cpu_cap_init_kvm_defined(CPUID_8000_0007_EDX,
> +	kvm_cpu_cap_init(CPUID_8000_0007_EDX,
>  		SF(CONSTANT_TSC)
>  	);
>  
> @@ -835,7 +821,7 @@ void kvm_set_cpu_caps(void)
>  	kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
>  	kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);
>  
> -	kvm_cpu_cap_init_kvm_defined(CPUID_8000_0022_EAX,
> +	kvm_cpu_cap_init(CPUID_8000_0022_EAX,
>  		F(PERFMON_V2)
>  	);
>  




* Re: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-05-17 17:39 ` [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions Sean Christopherson
@ 2024-07-05  1:30   ` Maxim Levitsky
  2024-07-08 21:29     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:30 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Undefine SPEC_CTRL_SSBD, which is #defined by msr-index.h to represent the
> enable flag in MSR_IA32_SPEC_CTRL, to avoid issues with the macro being
> unpacked into its raw value when passed to KVM's F() macro.  This will
> allow using multiple layers of macros in F() and friends, e.g. to harden
> against incorrect usage of F().
> 
> No functional change intended (cpuid.c doesn't consume SPEC_CTRL_SSBD).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 8efffd48cdf1..a16d6e070c11 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -639,6 +639,12 @@ static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
>  	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
>  }
>  
> +/*
> + * Undefine the MSR bit macro to avoid token concatenation issues when
> + * processing X86_FEATURE_SPEC_CTRL_SSBD.
> + */
> +#undef SPEC_CTRL_SSBD
> +
>  void kvm_set_cpu_caps(void)
>  {
>  	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));

Hi,

Maybe we should instead rename SPEC_CTRL_SSBD to 'MSR_IA32_SPEC_CTRL_SSBD',
and rename the other fields of this MSR along with it.
It seems that at least some MSRs in that header already do this.

Best regards,
	Maxim Levitsky
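
For anyone wondering why the collision only shows up with multiple macro
layers, the preprocessor behavior can be reproduced in isolation.  The
sketch below is an assumption-laden toy: DEMO_SSBD is a hypothetical
stand-in for the MSR bit macro (given a plain numeric value so that the
pasted token stays valid), and the DEMO_* helpers stringify the result so
it can be inspected at runtime.

```c
/* Hypothetical stand-in for an MSR bit macro such as SPEC_CTRL_SSBD. */
#define DEMO_SSBD 4

#define DEMO_STR(x)	#x
#define DEMO_XSTR(x)	DEMO_STR(x)

/* One layer: 'name' is an operand of ##, so it is pasted without expansion. */
#define DEMO_PASTE(name)	DEMO_XSTR(X86_FEATURE_##name)

/* Two layers: 'name' is fully expanded before DEMO_PASTE() ever sees it. */
#define DEMO_OUTER(name)	DEMO_PASTE(name)

/* Yields "X86_FEATURE_DEMO_SSBD". */
static const char *demo_one_layer(void)  { return DEMO_PASTE(DEMO_SSBD); }

/* Yields "X86_FEATURE_4": the macro was unpacked into its raw value. */
static const char *demo_two_layers(void) { return DEMO_OUTER(DEMO_SSBD); }

#undef DEMO_SSBD

/* With the colliding macro gone, two layers yield "X86_FEATURE_DEMO_SSBD" again. */
static const char *demo_two_layers_after_undef(void) { return DEMO_OUTER(DEMO_SSBD); }
```

Both the #undef in the patch and a rename of the MSR macro would avoid the
problem; the rename touches more code but removes the collision for every
file, not just cpuid.c.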



* Re: [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features
  2024-05-17 17:39 ` [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features Sean Christopherson
@ 2024-07-05  1:31   ` Maxim Levitsky
  2024-07-09 18:11     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:31 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Add compile-time assertions to verify that usage of F() and friends in
> kvm_set_cpu_caps() is scoped to the correct CPUID word, e.g. to detect
> bugs where KVM passes a feature bit from word X into word Y.
> 
> Add a one-off assertion in the aliased feature macro to ensure that only
> word 0x8000_0001.EDX aliased the features defined for 0x1.EDX.
> 
> To do so, convert kvm_cpu_cap_init() to a macro and have it define a
> local variable to track which CPUID word is being initialized that is
> then used to validate usage of F() (all of the inputs are compile-time
> constants and thus can be fed into BUILD_BUG_ON()).
> 
> Redefine KVM_VALIDATE_CPU_CAP_USAGE after kvm_set_cpu_caps() to be a nop
> so that F() can be used in other flows that aren't as easily hardened,
> e.g. __do_cpuid_func_emulated() and __do_cpuid_func().
> 
> Invoke KVM_VALIDATE_CPU_CAP_USAGE() in SF() and X86_64_F() to ensure the
> validation occurs, e.g. if the usage of F() is completely compiled out
> (which shouldn't happen for boot_cpu_has(), but could happen in the future,
> e.g. if KVM were to use cpu_feature_enabled()).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 55 +++++++++++++++++++++++++++++++-------------
>  1 file changed, 39 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index a16d6e070c11..1064e4d68718 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -61,18 +61,24 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	return ret;
>  }
>  
> -#define F feature_bit
> +#define F(name)							\
> +({								\
> +	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
> +	feature_bit(name);					\
> +})
>  
>  /* Scattered Flag - For features that are scattered by cpufeatures.h. */
>  #define SF(name)						\
>  ({								\
>  	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);	\
> +	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
>  	(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);	\
>  })
>  
>  /* Features that KVM supports only on 64-bit kernels. */
>  #define X86_64_F(name)						\
>  ({								\
> +	KVM_VALIDATE_CPU_CAP_USAGE(name);			\
>  	(IS_ENABLED(CONFIG_X86_64) ? F(name) : 0);		\
>  })
>  
> @@ -95,6 +101,7 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  #define AF(name)								\
>  ({										\
>  	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
> +	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
>  	feature_bit(name);							\
>  })
>  
> @@ -622,22 +629,34 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
>  	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
>  }
>  
> -static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
> -{
> -	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
> +/*
> + * Assert that the feature bit being declared, e.g. via F(), is in the CPUID
> + * word that's being initialized.  Exempt 0x8000_0001.EDX usage of 0x1.EDX
> + * features, as AMD duplicated many 0x1.EDX features into 0x8000_0001.EDX.
> + */
> +#define KVM_VALIDATE_CPU_CAP_USAGE(name)				\
> +do {									\
> +	u32 __leaf = __feature_leaf(X86_FEATURE_##name);		\
> +									\
> +	BUILD_BUG_ON(__leaf != kvm_cpu_cap_init_in_progress);		\
> +} while (0)
>  
> -	/*
> -	 * For kernel-defined leafs, mask the boot CPU's pre-populated value.
> -	 * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
> -	 * and only authority.
> -	 */
> -	if (leaf < NCAPINTS)
> -		kvm_cpu_caps[leaf] &= mask;
> -	else
> -		kvm_cpu_caps[leaf] = mask;
> -
> -	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
> -}
> +/*
> + * For kernel-defined leafs, mask the boot CPU's pre-populated value.  For KVM-
> + * defined leafs, explicitly set the leaf, as KVM is the one and only authority.
> + */
> +#define kvm_cpu_cap_init(leaf, mask)					\
> +do {									\
> +	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> +	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\

Why not #define kvm_cpu_cap_init_in_progress as well, instead of using a variable?

> +									\
> +	if (leaf < NCAPINTS)						\
> +		kvm_cpu_caps[leaf] &= (mask);				\
> +	else								\
> +		kvm_cpu_caps[leaf] = (mask);				\
> +									\
> +	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
> +} while (0)
>  
>  /*
>   * Undefine the MSR bit macro to avoid token concatenation issues when
> @@ -870,6 +889,10 @@ void kvm_set_cpu_caps(void)
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
>  
> +#undef kvm_cpu_cap_init
> +#undef KVM_VALIDATE_CPU_CAP_USAGE
> +#define KVM_VALIDATE_CPU_CAP_USAGE(name)
> +
>  struct kvm_cpuid_array {
>  	struct kvm_cpuid_entry2 *entries;
>  	int maxnent;


Best regards,
	Maxim Levitsky
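
On the question above: a macro expansion cannot itself contain a #define
directive, so the "word in progress" has to be carried by some
block-scoped name.  One possible alternative to a const local, sketched
here purely as an illustration, is a block-scoped enum constant, which is
a true integer constant expression in standard C.  The DEMO_* names, the
two-word layout and the negative-array-size assert are assumptions of the
sketch, not what the patch does.

```c
#include <stdint.h>

enum { DEMO_LEAF_1_EDX = 0, DEMO_LEAF_7_EBX = 1, DEMO_NR_LEAFS };

/* Feature id = leaf * 32 + bit, mirroring the X86_FEATURE_* encoding. */
#define DEMO_FEATURE_FPU	(DEMO_LEAF_1_EDX * 32 + 0)
#define DEMO_FEATURE_SMEP	(DEMO_LEAF_7_EBX * 32 + 7)

static uint32_t demo_caps[DEMO_NR_LEAFS];

/* Evaluates to 0, but fails to compile (negative array size) if cond is false. */
#define DEMO_ASSERT_ZERO(cond)	(0u * (unsigned int)sizeof(char[(cond) ? 1 : -1]))

/* Feature mask, valid only inside demo_cap_init() for the matching leaf. */
#define DEMO_F(name)							\
	(DEMO_ASSERT_ZERO(DEMO_FEATURE_##name / 32 == demo_leaf_in_progress) | \
	 (UINT32_C(1) << (DEMO_FEATURE_##name % 32)))

/* The enum constant is scoped to the do-block, so each call gets its own. */
#define demo_cap_init(leaf, mask)					\
do {									\
	enum { demo_leaf_in_progress = (leaf) };			\
	demo_caps[demo_leaf_in_progress] = (mask);			\
} while (0)

static void demo_set_caps(void)
{
	demo_cap_init(DEMO_LEAF_1_EDX, DEMO_F(FPU));
	demo_cap_init(DEMO_LEAF_7_EBX, DEMO_F(SMEP));
	/* demo_cap_init(DEMO_LEAF_1_EDX, DEMO_F(SMEP)) would not compile. */
}
```

The enum variant does not depend on the compiler propagating a const
local into the assertion, at the cost of being a less familiar idiom.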



* Re: [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2
  2024-05-17 17:39 ` [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2 Sean Christopherson
@ 2024-07-05  1:32   ` Maxim Levitsky
  2024-07-08 21:37     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:32 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> When handling KVM_SET_CPUID{,2}, swap the old and new CPUID arrays and
> lengths before processing the new CPUID, and simply undo the swap if
> setting the new CPUID fails for whatever reason.
> 
> To keep the diff reasonable, continue passing the entry array and length
> to most helpers, and defer the more complete cleanup to future commits.
> 
> For any sane VMM, setting "bad" CPUID state is not a hot path (or even
> something that is survivable), and setting guest CPUID before it's known
> good will allow removing all of KVM's infrastructure for processing CPUID
> entries directly (as opposed to operating on vcpu->arch.cpuid_entries).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 49 +++++++++++++++++++++++++++-----------------
>  1 file changed, 30 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 33e3e77de1b7..4ad01867cb8d 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -175,10 +175,10 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
>  	return NULL;
>  }
>  
> -static int kvm_check_cpuid(struct kvm_vcpu *vcpu,
> -			   struct kvm_cpuid_entry2 *entries,
> -			   int nent)
> +static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  {
> +	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
> +	int nent = vcpu->arch.cpuid_nent;
>  	struct kvm_cpuid_entry2 *best;
>  	u64 xfeatures;
>  
> @@ -369,9 +369,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
>  
> -static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
> +static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
>  {
>  #ifdef CONFIG_KVM_HYPERV
> +	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
> +	int nent = vcpu->arch.cpuid_nent;
>  	struct kvm_cpuid_entry2 *entry;
>  
>  	entry = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE,
> @@ -436,8 +438,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  					 __cr4_reserved_bits(guest_cpuid_has, vcpu);
>  #undef __kvm_cpu_cap_has
>  
> -	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu->arch.cpuid_entries,
> -						    vcpu->arch.cpuid_nent));
> +	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu));
>  
>  	/* Invoke the vendor callback only after the above state is updated. */
>  	static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
> @@ -478,6 +479,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  {
>  	int r;
>  
> +	/*
> +	 * Swap the existing (old) entries with the incoming (new) entries in
> +	 * order to massage the new entries, e.g. to account for dynamic bits
> +	 * that KVM controls, without clobbering the current guest CPUID, which
> +	 * KVM needs to preserve in order to unwind on failure.
> +	 */
> +	swap(vcpu->arch.cpuid_entries, e2);
> +	swap(vcpu->arch.cpuid_nent, nent);
> +
>  	/*
>  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
>  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> @@ -497,31 +507,25 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  		 * only because any change in CPUID is disallowed, i.e. using
>  		 * stale data is ok because KVM will reject the change.
>  		 */
> -		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> +		kvm_update_cpuid_runtime(vcpu);
>  
>  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
>  		if (r)
> -			return r;
> -
> -		kvfree(e2);
> -		return 0;
> +			goto err;
> +		goto success;
>  	}
>  
>  #ifdef CONFIG_KVM_HYPERV
> -	if (kvm_cpuid_has_hyperv(e2, nent)) {
> +	if (kvm_cpuid_has_hyperv(vcpu)) {
>  		r = kvm_hv_vcpu_init(vcpu);
>  		if (r)
> -			return r;
> +			goto err;
>  	}
>  #endif
>  
> -	r = kvm_check_cpuid(vcpu, e2, nent);
> +	r = kvm_check_cpuid(vcpu);
>  	if (r)
> -		return r;
> -
> -	kvfree(vcpu->arch.cpuid_entries);
> -	vcpu->arch.cpuid_entries = e2;
> -	vcpu->arch.cpuid_nent = nent;
> +		goto err;
>  
>  	vcpu->arch.kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
>  #ifdef CONFIG_KVM_XEN
> @@ -529,7 +533,14 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  #endif
>  	kvm_vcpu_after_set_cpuid(vcpu);
>  
> +success:
> +	kvfree(e2);
>  	return 0;
> +
> +err:
> +	swap(vcpu->arch.cpuid_entries, e2);
> +	swap(vcpu->arch.cpuid_nent, nent);
> +	return r;
>  }
>  
>  /* when an old userspace process fills a new kernel module */

Hi,

This IMHO is a good idea.  You might consider moving this patch to the
beginning of the series, though; it would fit better with the rest of the
patches there.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
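
The swap-then-unwind flow from the patch can be sketched in userspace C
as follows.  This is a simplified model under stated assumptions: struct
demo_vcpu, DEMO_SWAP() and the "no negative entries" rule in demo_check()
are hypothetical, and plain int arrays stand in for the CPUID entry
arrays.

```c
#include <stdlib.h>

struct demo_vcpu {
	int *entries;
	int nent;
};

/* Minimal stand-in for the kernel's swap() macro. */
#define DEMO_SWAP(a, b, type) \
	do { type tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Hypothetical validity rule: reject any negative entry. */
static int demo_check(const struct demo_vcpu *vcpu)
{
	for (int i = 0; i < vcpu->nent; i++)
		if (vcpu->entries[i] < 0)
			return -1;
	return 0;
}

/* Takes ownership of 'e2' on success; on failure the caller still owns it. */
static int demo_set(struct demo_vcpu *vcpu, int *e2, int nent)
{
	int r;

	/* Install the new array first so that all checks read vcpu state. */
	DEMO_SWAP(vcpu->entries, e2, int *);
	DEMO_SWAP(vcpu->nent, nent, int);

	r = demo_check(vcpu);
	if (r) {
		/* Undo the swap: the old array is live again. */
		DEMO_SWAP(vcpu->entries, e2, int *);
		DEMO_SWAP(vcpu->nent, nent, int);
		return r;
	}

	free(e2);	/* after the swap, e2 points at the old array */
	return 0;
}
```

The benefit in this shape is that demo_check() needs no array/length
parameters, which mirrors how the patch lets the helpers read
vcpu->arch.cpuid_entries directly.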




* Re: [PATCH v2 28/49] KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID
  2024-05-17 17:39 ` [PATCH v2 28/49] KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID Sean Christopherson
@ 2024-07-05  1:32   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:32 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Now that KVM disallows disabling HLT-exiting after vCPUs have been created,
> i.e. now that it's impossible for kvm_hlt_in_guest() to change while vCPUs
> are running, apply KVM's PV_UNHALT quirk only when userspace is setting
> guest CPUID.
> 
> Opportunistically rename the helper to make it clear that KVM's behavior
> is a quirk that should never have been added.  KVM's documentation
> explicitly states that userspace should not advertise PV_UNHALT if
> HLT-exiting is disabled, but for unknown reasons, commit caa057a2cad6
> ("KVM: X86: Provide a capability to disable HLT intercepts") didn't stop
> at documenting the requirement and also massaged the incoming guest CPUID.
> 
> Unfortunately, it's quite likely that userspace has come to rely on KVM's
> behavior, i.e. the code can't simply be deleted.  The only reason KVM
> doesn't have an "official" quirk is that there is no known use case where
> disabling the quirk would make sense, i.e. letting userspace disable the
> quirk would further increase KVM's burden without any benefit.

Makes sense overall.


> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 4ad01867cb8d..93a7399dc0db 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -287,18 +287,17 @@ static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcp
>  					     vcpu->arch.cpuid_nent, base);
>  }
>  
> -static void kvm_update_pv_runtime(struct kvm_vcpu *vcpu)
> +static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
>  
> -	vcpu->arch.pv_cpuid.features = 0;
> +	if (!best)
> +		return 0;
>  
> -	/*
> -	 * save the feature bitmap to avoid cpuid lookup for every PV
> -	 * operation
> -	 */
> -	if (best)
> -		vcpu->arch.pv_cpuid.features = best->eax;
> +	if (kvm_hlt_in_guest(vcpu->kvm))
> +		best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> +
> +	return best->eax;
>  }
>  
>  /*
> @@ -320,7 +319,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
>  				       int nent)
>  {
>  	struct kvm_cpuid_entry2 *best;
> -	struct kvm_hypervisor_cpuid kvm_cpuid;
>  
>  	best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  	if (best) {
> @@ -347,13 +345,6 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
>  		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>  
> -	kvm_cpuid = __kvm_get_hypervisor_cpuid(entries, nent, KVM_SIGNATURE);
> -	if (kvm_cpuid.base) {
> -		best = __kvm_find_kvm_cpuid_features(entries, nent, kvm_cpuid.base);
> -		if (kvm_hlt_in_guest(vcpu->kvm) && best)
> -			best->eax &= ~(1 << KVM_FEATURE_PV_UNHALT);
> -	}
> -
>  	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
>  		best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  		if (best)
> @@ -425,7 +416,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	vcpu->arch.guest_supported_xcr0 =
>  		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
>  
> -	kvm_update_pv_runtime(vcpu);
> +	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
>  
>  	vcpu->arch.is_amd_compatible = guest_cpuid_is_amd_or_hygon(vcpu);
>  	vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
> @@ -508,6 +499,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  		 * stale data is ok because KVM will reject the change.
>  		 */
>  		kvm_update_cpuid_runtime(vcpu);
> +		kvm_apply_cpuid_pv_features_quirk(vcpu);
>  
>  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
>  		if (r)

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky
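
As a reading aid, the logic of the renamed helper boils down to the
following sketch.  The function and parameter names are hypothetical, a
NULL pointer models "no KVM features leaf in guest CPUID", and the bit
number 7 for PV_UNHALT is my assumption based on the uapi header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit number of PV_UNHALT in the KVM features leaf (assumed to be 7). */
#define DEMO_FEATURE_PV_UNHALT 7

/*
 * Returns the feature word to cache.  Clears PV_UNHALT in place when HLT
 * does not exit to the host, since the guest could then never be kicked.
 */
static uint32_t demo_apply_pv_quirk(uint32_t *features_eax, bool hlt_in_guest)
{
	if (!features_eax)
		return 0;	/* no KVM features leaf was provided */

	if (hlt_in_guest)
		*features_eax &= ~(UINT32_C(1) << DEMO_FEATURE_PV_UNHALT);

	return *features_eax;
}
```

Note that the helper both mutates the entry and returns the value, which
is why the patch can call it purely for its side effect on the
already-run path.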






* Re: [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base
  2024-05-17 17:39 ` [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base Sean Christopherson
@ 2024-07-05  1:51   ` Maxim Levitsky
  2024-07-09 19:00     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Now that KVM only searches for KVM's PV CPUID base when userspace sets
> guest CPUID, drop the cache and simply do the search every time.
> 
> Practically speaking, this is a nop except for situations where userspace
> sets CPUID _after_ running the vCPU, which is anything but a hot path,
> e.g. QEMU does so only when hotplugging a vCPU.  And on the flip side,
> caching guest CPUID information, especially information that is used to
> query/modify _other_ CPUID state, is inherently dangerous as it's all too
> easy to use stale information, i.e. KVM should only cache CPUID state when
> the performance and/or programming benefits justify it.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/cpuid.c            | 34 +++++++--------------------------
>  2 files changed, 7 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index aabf1648a56a..3003e99155e7 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -858,7 +858,6 @@ struct kvm_vcpu_arch {
>  
>  	int cpuid_nent;
>  	struct kvm_cpuid_entry2 *cpuid_entries;
> -	struct kvm_hypervisor_cpuid kvm_cpuid;
>  	bool is_amd_compatible;
>  
>  	/*
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 93a7399dc0db..7290f91c422c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -269,28 +269,16 @@ static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcp
>  					  vcpu->arch.cpuid_nent, sig);
>  }
>  
> -static struct kvm_cpuid_entry2 *__kvm_find_kvm_cpuid_features(struct kvm_cpuid_entry2 *entries,
> -							      int nent, u32 kvm_cpuid_base)
> -{
> -	return cpuid_entry2_find(entries, nent, kvm_cpuid_base | KVM_CPUID_FEATURES,
> -				 KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> -}
> -
> -static struct kvm_cpuid_entry2 *kvm_find_kvm_cpuid_features(struct kvm_vcpu *vcpu)
> -{
> -	u32 base = vcpu->arch.kvm_cpuid.base;
> -
> -	if (!base)
> -		return NULL;
> -
> -	return __kvm_find_kvm_cpuid_features(vcpu->arch.cpuid_entries,
> -					     vcpu->arch.cpuid_nent, base);
> -}
> -
>  static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_cpuid_entry2 *best = kvm_find_kvm_cpuid_features(vcpu);
> +	struct kvm_hypervisor_cpuid kvm_cpuid;
> +	struct kvm_cpuid_entry2 *best;
>  
> +	kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
> +	if (!kvm_cpuid.base)
> +		return 0;
> +
> +	best = kvm_find_cpuid_entry(vcpu, kvm_cpuid.base | KVM_CPUID_FEATURES);
>  	if (!best)
>  		return 0;
>  
> @@ -491,13 +479,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	 * whether the supplied CPUID data is equal to what's already set.
>  	 */
>  	if (kvm_vcpu_has_run(vcpu)) {
> -		/*
> -		 * Note, runtime CPUID updates may consume other CPUID-driven
> -		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> -		 * state before full CPUID processing is functionally correct
> -		 * only because any change in CPUID is disallowed, i.e. using
> -		 * stale data is ok because KVM will reject the change.
> -		 */
Hi,

Any reason why this comment was removed? As I said earlier in the review,
it might make sense to replace it with a comment explaining why we need to
call kvm_update_cpuid_runtime() here at all: solely to allow the old == new
comparison to succeed.
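
To illustrate the point, here is a minimal, self-contained sketch of why the
runtime update has to run before the comparison.  The struct and helper names
are hypothetical stand-ins, not KVM's own; only the OSXSAVE bit position
(CPUID.1:ECX[27]) is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* CPUID.1:ECX[27] (OSXSAVE) mirrors CR4.OSXSAVE, i.e. it is runtime state. */
#define CPUID_1_ECX_OSXSAVE	(UINT32_C(1) << 27)

/* Fold the current guest state into the runtime-dependent CPUID bits. */
static void update_runtime_bits(struct cpuid_entry *e, size_t nent,
				bool cr4_osxsave)
{
	assert(e || !nent);

	for (size_t i = 0; i < nent; i++) {
		if (e[i].function != 1)
			continue;

		if (cr4_osxsave)
			e[i].ecx |= CPUID_1_ECX_OSXSAVE;
		else
			e[i].ecx &= ~CPUID_1_ECX_OSXSAVE;
	}
}

/*
 * The stored entries already carry the runtime bits, while the entries
 * resubmitted by userspace may not.  Apply the same runtime update to the
 * incoming copy first, otherwise an unchanged CPUID model would compare
 * as different.
 */
static bool cpuid_unchanged(const struct cpuid_entry *cur,
			    struct cpuid_entry *incoming, size_t nent,
			    bool cr4_osxsave)
{
	update_runtime_bits(incoming, nent, cr4_osxsave);
	return !memcmp(cur, incoming, nent * sizeof(*cur));
}
```

Without the update_runtime_bits() call, the two arrays would differ in the
OSXSAVE bit alone and the set would be rejected as a CPUID change.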

>  		kvm_update_cpuid_runtime(vcpu);
>  		kvm_apply_cpuid_pv_features_quirk(vcpu);
>  
> @@ -519,7 +500,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	if (r)
>  		goto err;
>  
> -	vcpu->arch.kvm_cpuid = kvm_get_hypervisor_cpuid(vcpu, KVM_SIGNATURE);
>  #ifdef CONFIG_KVM_XEN
>  	vcpu->arch.xen.cpuid = kvm_get_hypervisor_cpuid(vcpu, XEN_SIGNATURE);
>  #endif



Best regards,
	Maxim Levitsky



* Re: [PATCH v2 30/49] KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find()
  2024-05-17 17:39 ` [PATCH v2 30/49] KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find() Sean Christopherson
@ 2024-07-05  1:51   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Now that KVM sets vcpu->arch.cpuid_{entries,nent} before processing the
> incoming CPUID entries during KVM_SET_CPUID{,2}, drop the @entries and
> @nent params from cpuid_entry2_find() and unconditionally operate on the
> vCPU state.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 62 +++++++++++++++-----------------------------
>  1 file changed, 21 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7290f91c422c..0526f25a7c80 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -124,8 +124,8 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>   */
>  #define KVM_CPUID_INDEX_NOT_SIGNIFICANT -1ull
>  
> -static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
> -	struct kvm_cpuid_entry2 *entries, int nent, u32 function, u64 index)
> +static struct kvm_cpuid_entry2 *cpuid_entry2_find(struct kvm_vcpu *vcpu,
> +						  u32 function, u64 index)
>  {
>  	struct kvm_cpuid_entry2 *e;
>  	int i;
> @@ -142,8 +142,8 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
>  	 */
>  	lockdep_assert_irqs_enabled();
>  
> -	for (i = 0; i < nent; i++) {
> -		e = &entries[i];
> +	for (i = 0; i < vcpu->arch.cpuid_nent; i++) {
> +		e = &vcpu->arch.cpuid_entries[i];
>  
>  		if (e->function != function)
>  			continue;
> @@ -177,8 +177,6 @@ static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
>  
>  static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
> -	int nent = vcpu->arch.cpuid_nent;
>  	struct kvm_cpuid_entry2 *best;
>  	u64 xfeatures;
>  
> @@ -186,7 +184,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  	 * The existing code assumes virtual address is 48-bit or 57-bit in the
>  	 * canonical address checks; exit if it is ever changed.
>  	 */
> -	best = cpuid_entry2_find(entries, nent, 0x80000008,
> +	best = cpuid_entry2_find(vcpu, 0x80000008,
>  				 KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  	if (best) {
>  		int vaddr_bits = (best->eax & 0xff00) >> 8;
> @@ -199,7 +197,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  	 * Exposing dynamic xfeatures to the guest requires additional
>  	 * enabling in the FPU, e.g. to expand the guest XSAVE state size.
>  	 */
> -	best = cpuid_entry2_find(entries, nent, 0xd, 0);
> +	best = cpuid_entry2_find(vcpu, 0xd, 0);
>  	if (!best)
>  		return 0;
>  
> @@ -234,15 +232,15 @@ static int kvm_cpuid_check_equal(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2
>  	return 0;
>  }
>  
> -static struct kvm_hypervisor_cpuid __kvm_get_hypervisor_cpuid(struct kvm_cpuid_entry2 *entries,
> -							      int nent, const char *sig)
> +static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu,
> +							    const char *sig)
>  {
>  	struct kvm_hypervisor_cpuid cpuid = {};
>  	struct kvm_cpuid_entry2 *entry;
>  	u32 base;
>  
>  	for_each_possible_hypervisor_cpuid_base(base) {
> -		entry = cpuid_entry2_find(entries, nent, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +		entry = cpuid_entry2_find(vcpu, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  
>  		if (entry) {
>  			u32 signature[3];
> @@ -262,13 +260,6 @@ static struct kvm_hypervisor_cpuid __kvm_get_hypervisor_cpuid(struct kvm_cpuid_e
>  	return cpuid;
>  }
>  
> -static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcpu,
> -							    const char *sig)
> -{
> -	return __kvm_get_hypervisor_cpuid(vcpu->arch.cpuid_entries,
> -					  vcpu->arch.cpuid_nent, sig);
> -}
> -
>  static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_hypervisor_cpuid kvm_cpuid;
> @@ -292,23 +283,22 @@ static u32 kvm_apply_cpuid_pv_features_quirk(struct kvm_vcpu *vcpu)
>   * Calculate guest's supported XCR0 taking into account guest CPUID data and
>   * KVM's supported XCR0 (comprised of host's XCR0 and KVM_SUPPORTED_XCR0).
>   */
> -static u64 cpuid_get_supported_xcr0(struct kvm_cpuid_entry2 *entries, int nent)
> +static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
>  
> -	best = cpuid_entry2_find(entries, nent, 0xd, 0);
> +	best = cpuid_entry2_find(vcpu, 0xd, 0);
>  	if (!best)
>  		return 0;
>  
>  	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
>  }
>  
> -static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *entries,
> -				       int nent)
> +void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
>  
> -	best = cpuid_entry2_find(entries, nent, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +	best = cpuid_entry2_find(vcpu, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  	if (best) {
>  		/* Update OSXSAVE bit */
>  		if (boot_cpu_has(X86_FEATURE_XSAVE))
> @@ -319,43 +309,36 @@ static void __kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu, struct kvm_cpuid_e
>  			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
>  	}
>  
> -	best = cpuid_entry2_find(entries, nent, 7, 0);
> +	best = cpuid_entry2_find(vcpu, 7, 0);
>  	if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
>  		cpuid_entry_change(best, X86_FEATURE_OSPKE,
>  				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
>  
> -	best = cpuid_entry2_find(entries, nent, 0xD, 0);
> +	best = cpuid_entry2_find(vcpu, 0xD, 0);
>  	if (best)
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, false);
>  
> -	best = cpuid_entry2_find(entries, nent, 0xD, 1);
> +	best = cpuid_entry2_find(vcpu, 0xD, 1);
>  	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
>  		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>  
>  	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
> -		best = cpuid_entry2_find(entries, nent, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +		best = cpuid_entry2_find(vcpu, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  		if (best)
>  			cpuid_entry_change(best, X86_FEATURE_MWAIT,
>  					   vcpu->arch.ia32_misc_enable_msr &
>  					   MSR_IA32_MISC_ENABLE_MWAIT);
>  	}
>  }
> -
> -void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
> -{
> -	__kvm_update_cpuid_runtime(vcpu, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> -}
>  EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
>  
>  static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
>  {
>  #ifdef CONFIG_KVM_HYPERV
> -	struct kvm_cpuid_entry2 *entries = vcpu->arch.cpuid_entries;
> -	int nent = vcpu->arch.cpuid_nent;
>  	struct kvm_cpuid_entry2 *entry;
>  
> -	entry = cpuid_entry2_find(entries, nent, HYPERV_CPUID_INTERFACE,
> +	entry = cpuid_entry2_find(vcpu, HYPERV_CPUID_INTERFACE,
>  				  KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  	return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
>  #else
> @@ -401,8 +384,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		kvm_apic_set_version(vcpu);
>  	}
>  
> -	vcpu->arch.guest_supported_xcr0 =
> -		cpuid_get_supported_xcr0(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
> +	vcpu->arch.guest_supported_xcr0 = cpuid_get_supported_xcr0(vcpu);
>  
>  	vcpu->arch.pv_cpuid.features = kvm_apply_cpuid_pv_features_quirk(vcpu);
>  
> @@ -1532,16 +1514,14 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
>  struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
>  						    u32 function, u32 index)
>  {
> -	return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
> -				 function, index);
> +	return cpuid_entry2_find(vcpu, function, index);
>  }
>  EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
>  
>  struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
>  					      u32 function)
>  {
> -	return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
> -				 function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +	return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
>  }
>  EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 31/49] KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find()
  2024-05-17 17:39 ` [PATCH v2 31/49] KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find() Sean Christopherson
@ 2024-07-05  1:51   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Move kvm_find_cpuid_entry{,_index}() "up" in cpuid.c so that they are
> colocated with cpuid_entry2_find(), e.g. to make it easier to see the
> effective guts of the helpers without having to bounce around cpuid.c.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0526f25a7c80..d7390ade1c29 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -175,6 +175,20 @@ static struct kvm_cpuid_entry2 *cpuid_entry2_find(struct kvm_vcpu *vcpu,
>  	return NULL;
>  }
>  
> +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
> +						    u32 function, u32 index)
> +{
> +	return cpuid_entry2_find(vcpu, function, index);
> +}
> +EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
> +
> +struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
> +					      u32 function)
> +{
> +	return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +}
> +EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
> +
>  static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
> @@ -1511,20 +1525,6 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
>  	return r;
>  }
>  
> -struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
> -						    u32 function, u32 index)
> -{
> -	return cpuid_entry2_find(vcpu, function, index);
> -}
> -EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
> -
> -struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
> -					      u32 function)
> -{
> -	return cpuid_entry2_find(vcpu, function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> -}
> -EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
> -
>  /*
>   * Intel CPUID semantics treats any query for an out-of-range leaf as if the
>   * highest basic leaf (i.e. CPUID.0H:EAX) were requested.  AMD CPUID semantics

Makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 32/49] KVM: x86: Remove all direct usage of cpuid_entry2_find()
  2024-05-17 17:39 ` [PATCH v2 32/49] KVM: x86: Remove all direct usage of cpuid_entry2_find() Sean Christopherson
@ 2024-07-05  1:52   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:52 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Convert all use of cpuid_entry2_find() to kvm_find_cpuid_entry{,_index}()
> now that cpuid_entry2_find() operates on the vCPU state, i.e. now that
> there is no need to use cpuid_entry2_find() directly in order to pass in
> non-vCPU state.
> 
> To help prevent unwanted usage of cpuid_entry2_find(), #undef
> KVM_CPUID_INDEX_NOT_SIGNIFICANT, i.e. force KVM to use
> kvm_find_cpuid_entry().
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index d7390ade1c29..699ce4261e9c 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -189,6 +189,12 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
>  }
>  EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
>  
> +/*
> + * cpuid_entry2_find() and KVM_CPUID_INDEX_NOT_SIGNIFICANT should never be used
> + * directly outside of kvm_find_cpuid_entry() and kvm_find_cpuid_entry_index().
> + */
> +#undef KVM_CPUID_INDEX_NOT_SIGNIFICANT
> +
>  static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
> @@ -198,8 +204,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  	 * The existing code assumes virtual address is 48-bit or 57-bit in the
>  	 * canonical address checks; exit if it is ever changed.
>  	 */
> -	best = cpuid_entry2_find(vcpu, 0x80000008,
> -				 KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +	best = kvm_find_cpuid_entry(vcpu, 0x80000008);
>  	if (best) {
>  		int vaddr_bits = (best->eax & 0xff00) >> 8;
>  
> @@ -211,7 +216,7 @@ static int kvm_check_cpuid(struct kvm_vcpu *vcpu)
>  	 * Exposing dynamic xfeatures to the guest requires additional
>  	 * enabling in the FPU, e.g. to expand the guest XSAVE state size.
>  	 */
> -	best = cpuid_entry2_find(vcpu, 0xd, 0);
> +	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
>  	if (!best)
>  		return 0;
>  
> @@ -254,7 +259,7 @@ static struct kvm_hypervisor_cpuid kvm_get_hypervisor_cpuid(struct kvm_vcpu *vcp
>  	u32 base;
>  
>  	for_each_possible_hypervisor_cpuid_base(base) {
> -		entry = cpuid_entry2_find(vcpu, base, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +		entry = kvm_find_cpuid_entry(vcpu, base);
>  
>  		if (entry) {
>  			u32 signature[3];
> @@ -301,7 +306,7 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
>  
> -	best = cpuid_entry2_find(vcpu, 0xd, 0);
> +	best = kvm_find_cpuid_entry_index(vcpu, 0xd, 0);
>  	if (!best)
>  		return 0;
>  
> @@ -312,7 +317,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
>  
> -	best = cpuid_entry2_find(vcpu, 1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +	best = kvm_find_cpuid_entry(vcpu, 1);
>  	if (best) {
>  		/* Update OSXSAVE bit */
>  		if (boot_cpu_has(X86_FEATURE_XSAVE))
> @@ -323,22 +328,22 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
>  	}
>  
> -	best = cpuid_entry2_find(vcpu, 7, 0);
> +	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
>  	if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
>  		cpuid_entry_change(best, X86_FEATURE_OSPKE,
>  				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
>  
> -	best = cpuid_entry2_find(vcpu, 0xD, 0);
> +	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
>  	if (best)
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, false);
>  
> -	best = cpuid_entry2_find(vcpu, 0xD, 1);
> +	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 1);
>  	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
>  		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
>  
>  	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
> -		best = cpuid_entry2_find(vcpu, 0x1, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +		best = kvm_find_cpuid_entry(vcpu, 0x1);
>  		if (best)
>  			cpuid_entry_change(best, X86_FEATURE_MWAIT,
>  					   vcpu->arch.ia32_misc_enable_msr &
> @@ -352,8 +357,7 @@ static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
>  #ifdef CONFIG_KVM_HYPERV
>  	struct kvm_cpuid_entry2 *entry;
>  
> -	entry = cpuid_entry2_find(vcpu, HYPERV_CPUID_INTERFACE,
> -				  KVM_CPUID_INDEX_NOT_SIGNIFICANT);
> +	entry = kvm_find_cpuid_entry(vcpu, HYPERV_CPUID_INTERFACE);
>  	return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
>  #else
>  	return false;
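
The #undef trick in the patch quoted above can be sketched in isolation as
follows.  This is a simplified model under stated assumptions, not KVM's
code: the struct, the function names and the matching rule (which ignores
KVM's per-entry "significant index" flag) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CPUID entry; KVM's real struct has more fields. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax;
};

/* 64-bit sentinel, so it can never collide with a real 32-bit index. */
#define INDEX_NOT_SIGNIFICANT	UINT64_MAX

/* Low-level lookup; only the two wrappers below are meant to call it. */
static const struct cpuid_entry *entry_find(const struct cpuid_entry *e,
					    size_t nent, uint32_t function,
					    uint64_t index)
{
	assert(e || !nent);

	for (size_t i = 0; i < nent; i++) {
		if (e[i].function != function)
			continue;

		if (index == INDEX_NOT_SIGNIFICANT || e[i].index == index)
			return &e[i];
	}
	return NULL;
}

/* Lookup for leaves where the index matters. */
static const struct cpuid_entry *find_entry_index(const struct cpuid_entry *e,
						  size_t nent,
						  uint32_t function,
						  uint32_t index)
{
	return entry_find(e, nent, function, index);
}

/* Lookup for leaves where the index is ignored. */
static const struct cpuid_entry *find_entry(const struct cpuid_entry *e,
					    size_t nent, uint32_t function)
{
	return entry_find(e, nent, function, INDEX_NOT_SIGNIFICANT);
}

/*
 * From here on the sentinel no longer exists, so any later code that tries
 * to pass it to entry_find() directly fails to compile.
 */
#undef INDEX_NOT_SIGNIFICANT
```

The wrappers keep working because the macro was expanded when they were
compiled; only new uses after the #undef are rejected.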

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-05-17 17:39 ` [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software Sean Christopherson
@ 2024-07-05  1:59   ` Maxim Levitsky
  2024-07-08 22:30     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  1:59 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
> OR-in features that KVM emulates in software, i.e. that don't depend on
> the feature being available in hardware.  The contained scope
> of kvm_cpu_cap_init() allows using a local variable to track the set of
> emulated leaves, which in addition to avoiding confusing and/or
> unnecessary variables, helps prevent misuse of EMUL_F().
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
>  1 file changed, 21 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1064e4d68718..33e3e77de1b7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	F(name);						\
>  })
>  
> +/*
> + * Emulated Feature - For features that KVM emulates in software irrespective
> + * of host CPU/kernel support.
> + */
> +#define EMUL_F(name)						\
> +({								\
> +	kvm_cpu_cap_emulated |= F(name);			\
> +	F(name);						\
> +})

To me it feels more and more that this patch series isn't going in the right direction.

How about we abandon the whole concept of masks and instead have a list of statements, one per feature?

Pretty much the opposite of this patch series, I confess:


#define CAP_PASSTHOUGH		0x01
#define CAP_EMULATED		0x02
#define CAP_AMD_ALIASED		0x04 // for AMD aliased features
#define CAP_SCATTERED		0x08
#define CAP_X86_64		0x10 // supported only on 64 bit hypervisors
...


/* CPUID_1_ECX*/

				/* TM2 is not passed through because: xyz */
kvm_cpu_cap_init(TM2,           0);

kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
				/* CNXT_ID is not passed through because: xyz */
kvm_cpu_cap_init(CNXT_ID,       0);
kvm_cpu_cap_init(RESERVED,      0);
kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
...
				/* KVM always emulates TSC_ADJUST */
kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);

...

/* CPUID_D_1_EAX*/
				/* XFD is disabled on 32-bit systems because: xyz */
kvm_cpu_cap_init(XFD, 		CAP_PASSTHOUGH | CAP_X86_64);


'kvm_cpu_cap_init' can be a macro if needed to have the compile checks.

There are several advantages to this:

- more readability, plus if needed each statement can be amended with a comment.
- No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit, which is confusing.
  In fact no need to even have them at all.
- No need to verify that bitmask belongs to a feature word.
- Merge friendly - each capability has its own line.

Disadvantages:

- Longer list - IMHO not a problem, since it is very easy to read / search
  and can have as many comments as needed.
  For example, this is how the kernel lists the CPUID features, and that list IMHO
  is very manageable.

- Slower - kvm_set_cpu_caps is called exactly once per KVM module load, thus
  performance is the last thing I would care about in this function.
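
A rough, self-contained sketch of how such a per-feature list could be folded
into a capability word is below.  The flag names follow the proposal above,
but the descriptor struct, build_cap_word() and the flag values are
hypothetical, and only two of the flags are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names follow the proposal; the values are illustrative only. */
#define CAP_PASSTHOUGH	0x01	/* advertise only if the host CPU has it */
#define CAP_EMULATED	0x02	/* advertise regardless of host support */

/* Hypothetical per-feature descriptor: one entry per CPUID bit. */
struct cap_desc {
	const char *name;
	unsigned int bit;	/* bit position within the feature word */
	unsigned int flags;	/* 0 means "never advertised" */
};

/* Fold a list of per-feature entries into one 32-bit capability word. */
static uint32_t build_cap_word(const struct cap_desc *descs, size_t n,
			       uint32_t host_word)
{
	uint32_t word = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t mask;

		assert(descs[i].bit < 32);
		mask = UINT32_C(1) << descs[i].bit;

		if (descs[i].flags & CAP_EMULATED)
			word |= mask;			/* host support irrelevant */
		else if (descs[i].flags & CAP_PASSTHOUGH)
			word |= mask & host_word;	/* gated on the host CPU */
	}
	return word;
}
```

Each feature is one table row, so a comment or a new flag touches exactly one
line, which is the merge-friendliness argument made above.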


Another note about this patch: it is somewhat confusing that EMUL_F just forces a feature in the KVM caps,
regardless of CPU support, because KVM also has KVM_GET_EMULATED_CPUID, which has a different meaning.

Users can easily mistake EMUL_F for something that sets a feature bit in KVM_GET_EMULATED_CPUID.


Best regards,
	Maxim Levitsky



> +
>  /*
>   * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
>   * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> @@ -649,6 +659,7 @@ do {									\
>  do {									\
>  	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
>  	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> +	u32 kvm_cpu_cap_emulated = 0;					\
>  									\
>  	if (leaf < NCAPINTS)						\
>  		kvm_cpu_caps[leaf] &= (mask);				\
> @@ -656,6 +667,7 @@ do {									\
>  		kvm_cpu_caps[leaf] = (mask);				\
>  									\
>  	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
> +	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
>  } while (0)
>  
>  /*
> @@ -684,12 +696,10 @@ void kvm_set_cpu_caps(void)
>  		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
>  		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
>  		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
> -		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> +		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
>  		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
>  		F(F16C) | F(RDRAND)
>  	);
> -	/* KVM emulates x2apic in software irrespective of host support. */
> -	kvm_cpu_cap_set(X86_FEATURE_X2APIC);
>  
>  	kvm_cpu_cap_init(CPUID_1_EDX,
>  		F(FPU) | F(VME) | F(DE) | F(PSE) |
> @@ -703,13 +713,13 @@ void kvm_set_cpu_caps(void)
>  	);
>  
>  	kvm_cpu_cap_init(CPUID_7_0_EBX,
> -		F(FSGSBASE) | F(SGX) | F(BMI1) | F(HLE) | F(AVX2) |
> -		F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) | F(INVPCID) |
> -		F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ | F(AVX512F) |
> -		F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) | F(AVX512IFMA) |
> -		F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ | F(AVX512PF) |
> -		F(AVX512ER) | F(AVX512CD) | F(SHA_NI) | F(AVX512BW) |
> -		F(AVX512VL));
> +		F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
> +		F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
> +		F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
> +		F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
> +		F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
> +		F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
> +		F(AVX512BW) | F(AVX512VL));
>  
>  	kvm_cpu_cap_init(CPUID_7_ECX,
>  		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> @@ -728,16 +738,12 @@ void kvm_set_cpu_caps(void)
>  
>  	kvm_cpu_cap_init(CPUID_7_EDX,
>  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> -		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> +		F(SPEC_CTRL_SSBD) | EMUL_F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
>  		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
>  		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) |
>  		F(AMX_TILE) | F(AMX_INT8) | F(AMX_BF16) | F(FLUSH_L1D)
>  	);
>  
> -	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
> -	kvm_cpu_cap_set(X86_FEATURE_TSC_ADJUST);
> -	kvm_cpu_cap_set(X86_FEATURE_ARCH_CAPABILITIES);
> -
>  	if (boot_cpu_has(X86_FEATURE_IBPB) && boot_cpu_has(X86_FEATURE_IBRS))
>  		kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL);
>  	if (boot_cpu_has(X86_FEATURE_STIBP))






* Re: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
  2024-05-17 17:39 ` [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID Sean Christopherson
  2024-05-22  9:11   ` Binbin Wu
@ 2024-07-05  2:04   ` Maxim Levitsky
  2024-07-09 19:28     ` Sean Christopherson
  1 sibling, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:04 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Advertise TSC_DEADLINE_TIMER via KVM_GET_SUPPORTED_CPUID when it's
> supported in hardware, as the odds of a VMM emulating the local APIC in
> userspace, not emulating the TSC deadline timer, _and_ reflecting
> KVM_GET_SUPPORTED_CPUID back into KVM_SET_CPUID2 are extremely low.
> 
> KVM has _unconditionally_ advertised X2APIC via CPUID since commit
> 0d1de2d901f4 ("KVM: Always report x2apic as supported feature"), and it
> is completely impossible for userspace to emulate X2APIC as KVM doesn't
> support forwarding the MSR accesses to userspace.  I.e. KVM has relied on
> userspace VMMs to not misreport local APIC capabilities for nearly 13
> years.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  Documentation/virt/kvm/api.rst | 9 ++++++---
>  arch/x86/kvm/cpuid.c           | 4 ++--
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 884846282d06..cb744a646de6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1804,15 +1804,18 @@ emulate them efficiently. The fields in each entry are defined as follows:
>           the values returned by the cpuid instruction for
>           this function/index combination
>  
> -The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
> -as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
> -support.  Instead it is reported via::
> +x2APIC (CPUID leaf 1, ecx[21) and TSC deadline timer (CPUID leaf 1, ecx[24])
> +may be returned as true, but they depend on KVM_CREATE_IRQCHIP for in-kernel
> +emulation of the local APIC.  TSC deadline timer support is also reported via::
>  
>    ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)
>  
>  if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
>  feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
>  
> +Enabling x2APIC in KVM_SET_CPUID2 requires KVM_CREATE_IRQCHIP as KVM doesn't
> +support forwarding x2APIC MSR accesses to userspace, i.e. KVM does not support
> +emulating x2APIC in userspace.
>  
>  4.47 KVM_PPC_GET_PVINFO
>  -----------------------
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 699ce4261e9c..d1f427284ccc 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -680,8 +680,8 @@ void kvm_set_cpu_caps(void)
>  		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
>  		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
>  		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
> -		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> -		F(F16C) | F(RDRAND)
> +		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
> +		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
>  	);
>  
>  	kvm_cpu_cap_init(CPUID_1_EDX,

Hi,

I have mixed feelings about this.

First of all, the KVM_GET_SUPPORTED_CPUID documentation explicitly states that it returns bits
that are supported in the *default* configuration.
TSC_DEADLINE_TIMER and arguably X2APIC are only supported after enabling various caps,
i.e. not in the default configuration.

However, since X2APIC is also in KVM_GET_SUPPORTED_CPUID (also wrongly IMHO), for consistency it does make
sense to add TSC_DEADLINE_TIMER as well.


I do think that we need to at least update the documentation of KVM_GET_SUPPORTED_CPUID
and KVM_GET_EMULATED_CPUID, as I say in my review of a later patch.
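
To make the "not default configuration" concern concrete, here is a sketch of
the filtering a VMM would have to do on CPUID.1:ECX before feeding
KVM_GET_SUPPORTED_CPUID output back into KVM_SET_CPUID2.  The helper name and
its parameters are hypothetical; the bit positions (ecx[21], ecx[24]) and the
rules come from the api.rst text quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_1_ECX_X2APIC		(UINT32_C(1) << 21)
#define CPUID_1_ECX_TSC_DEADLINE_TIMER	(UINT32_C(1) << 24)

/*
 * Hypothetical VMM-side helper: adjust CPUID.1:ECX as reported by
 * KVM_GET_SUPPORTED_CPUID to the VM's actual local APIC setup.
 */
static void vmm_filter_cpuid_1_ecx(uint32_t *ecx, bool in_kernel_irqchip,
				   bool user_emulates_tsc_deadline)
{
	assert(ecx != NULL);

	/* With KVM_CREATE_IRQCHIP, KVM emulates both features itself. */
	if (in_kernel_irqchip)
		return;

	/* x2APIC MSR accesses are never forwarded to userspace. */
	*ecx &= ~CPUID_1_ECX_X2APIC;

	/* A userspace local APIC may or may not implement the timer. */
	if (!user_emulates_tsc_deadline)
		*ecx &= ~CPUID_1_ECX_TSC_DEADLINE_TIMER;
}
```

A VMM that blindly reflects the supported bits with a userspace local APIC
would skip this step, which is the misreporting the changelog argues is
extremely unlikely in practice.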

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky





* Re: [PATCH v2 34/49] KVM: x86: Advertise HYPERVISOR in KVM_GET_SUPPORTED_CPUID
  2024-05-17 17:39 ` [PATCH v2 34/49] KVM: x86: Advertise HYPERVISOR " Sean Christopherson
@ 2024-07-05  2:04   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:04 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Unconditionally advertise "support" for the HYPERVISOR feature in CPUID,
> as the flag simply communicates to the guest that it's running under a
> hypervisor.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index d1f427284ccc..de898d571faa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -681,7 +681,8 @@ void kvm_set_cpu_caps(void)
>  		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
>  		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
>  		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
> -		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
> +		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND) |
> +		EMUL_F(HYPERVISOR)
>  	);
>  
>  	kvm_cpu_cap_init(CPUID_1_EDX,

This makes sense.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




* Re: [PATCH v2 35/49] KVM: x86: Add a macro to handle features that are fully VMM controlled
  2024-05-17 17:39 ` [PATCH v2 35/49] KVM: x86: Add a macro to handle features that are fully VMM controlled Sean Christopherson
@ 2024-07-05  2:08   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:08 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Add a macro to track CPUID features for which KVM fully defers to
> userspace, i.e. that KVM honors if they are enumerated to the guest, even
> if KVM itself doesn't advertise them to userspace.
> 
> Somewhat unfortunately, this behavior only applies to MWAIT (largely
> because of KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS), and it's not all that
> likely future features will be handled in a similar way.  I.e. very
> arguably, potentially tracking every feature in kvm_vmm_cpu_caps is a
> waste of memory.
> 
> However, adding one-off handling for individual features is quite painful,
> especially when considering future hardening.  It's very doable to verify,
> at compile time, that every CPUID-based feature that KVM queries when
> emulating guest behavior is actually known to KVM, e.g. to prevent KVM
> bugs where KVM emulates some feature but fails to advertise support to
> userspace.  In other words, any features that are special cased, i.e. not
> handled generically in the CPUID framework, would also need to be special
> cased for any hardening efforts that build on said framework.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index de898d571faa..16bb873188d6 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -36,6 +36,8 @@
>  u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
>  EXPORT_SYMBOL_GPL(kvm_cpu_caps);
>  
> +static u32 kvm_vmm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
> +
>  u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  {
>  	int feature_bit = 0;
> @@ -115,6 +117,21 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	feature_bit(name);							\
>  })
>  
> +/*
> + * VMM Features - For features that KVM "supports" in some capacity, i.e. that
> + * KVM may query, but that are never advertised to userspace.  E.g. KVM allows
> + * userspace to enumerate MONITOR+MWAIT support to the guest, but the MWAIT
> + * feature flag is never advertised to userspace because MONITOR+MWAIT aren't
> + * virtualized by hardware, can't be faithfully emulated in software (KVM
> + * emulates them as NOPs), and allowing the guest to execute them natively
> + * requires enabling a per-VM capability.
> + */
> +#define VMM_F(name)								\
> +({										\
> +	kvm_vmm_cpu_caps[__feature_leaf(X86_FEATURE_##name)] |= F(name);	\
> +	0;									\
> +})
> +
>  /*
>   * Magic value used by KVM when querying userspace-provided CPUID entries and
>   * doesn't care about the CPIUD index because the index of the function in
> @@ -674,7 +691,7 @@ void kvm_set_cpu_caps(void)
>  		 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
>  		 * advertised to guests via CPUID!
>  		 */
> -		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> +		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64 */ | VMM_F(MWAIT) |
>  		0 /* DS-CPL, VMX, SMX, EST */ |
>  		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
>  		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |

Hi,

I'm not sure that this is worth it. In particular, IMHO the definition
'KVM honors if they are enumerated to the guest, even if KVM itself doesn't advertise them to userspace'
is very problematic: AFAIK KVM honours anything that userspace sets in the guest-visible CPUID. I once
crashed a guest on purpose by forcing it to see AVX3, which is not supported on my CPU.

I think that you mean features that KVM also uses for itself, e.g. to disable certain VM exits, etc.,
but this is very hard to understand.

IMHO it is better to handle this on a case-by-case basis; it is less confusing.

So far, MWAIT is the only such feature. What do you think is the probability
of Intel/AMD adding more such features?
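
To make the mechanism under discussion concrete, here is a minimal standalone
sketch of how a VMM_F()-style helper behaves. It is not the patch's code: plain
functions stand in for the macros, the two-word arrays and all names are
hypothetical, and only the bit positions (SSE3 is CPUID.01H:ECX bit 0, MONITOR/MWAIT
is bit 3) are architectural.

```c
#include <stdint.h>

#define NR_WORDS 2                        /* hypothetical, tiny stand-in */
#define FEATURE(word, bit) ((word) * 32u + (bit))
#define FEATURE_XMM3  FEATURE(0, 0)       /* CPUID.01H:ECX bit 0 (SSE3) */
#define FEATURE_MWAIT FEATURE(0, 3)       /* CPUID.01H:ECX bit 3 (MONITOR/MWAIT) */

static uint32_t advertised_caps[NR_WORDS]; /* what userspace is told */
static uint32_t vmm_caps[NR_WORDS];        /* honored, but never advertised */

static uint32_t feature_mask(unsigned int feature)
{
	return UINT32_C(1) << (feature % 32);
}

/* Like F(): contributes the feature's bit to the advertised mask. */
static uint32_t f(unsigned int feature)
{
	return feature_mask(feature);
}

/* Like VMM_F(): records the bit in the side array and contributes nothing. */
static uint32_t vmm_f(unsigned int feature)
{
	vmm_caps[feature / 32] |= feature_mask(feature);
	return 0;
}

static void init_caps(void)
{
	advertised_caps[0] = f(FEATURE_XMM3) | vmm_f(FEATURE_MWAIT);
}
```

The side array exists only to remember bits that are deliberately left out of
the advertised mask, which is the part that a one-off MWAIT check would replace.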

Best regards,	
	Maxim Levitsky



* Re: [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps
  2024-05-17 17:39 ` [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps Sean Christopherson
  2024-06-20  2:20   ` Yang, Weijiang
@ 2024-07-05  2:10   ` Maxim Levitsky
  2024-07-09 18:30     ` Sean Christopherson
  1 sibling, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:10 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Replace the internals of the governed features framework with a more
> comprehensive "guest CPU capabilities" implementation, i.e. with a guest
> version of kvm_cpu_caps.  Keep the skeleton of governed features around
> for now as vmx_adjust_sec_exec_control() relies on detecting governed
> features to do the right thing for XSAVES, and switching all guest feature
> queries to guest_cpu_cap_has() requires subtle and non-trivial changes,
> i.e. is best done as a standalone change.
> 
> Tracking *all* guest capabilities that KVM cares about will allow excising the
> poorly named "governed features" framework, and effectively optimizes all
> KVM queries of guest capabilities, i.e. doesn't require making a
> subjective decision as to whether or not a feature is worth "governing",
> and doesn't require adding the code to do so.
> 
> The cost of tracking all features is currently 92 bytes per vCPU on 64-bit
> kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features.
> That cost is well worth paying even if the only benefit was eliminating
> the "governed features" terminology.  And practically speaking, the real
> cost is zero unless those 92 bytes push the size of vcpu_vmx or vcpu_svm
> into a new order-N allocation, and if that happens there are better ways
> to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR
> state separate allocations.
> 
> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 45 +++++++++++++++++++++------------
>  arch/x86/kvm/cpuid.c            | 14 +++++++---
>  arch/x86/kvm/cpuid.h            | 12 ++++-----
>  arch/x86/kvm/reverse_cpuid.h    | 16 ------------
>  4 files changed, 46 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3003e99155e7..8840d21ee0b5 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -743,6 +743,22 @@ struct kvm_queued_exception {
>  	bool has_payload;
>  };
>  
> +/*
> + * Hardware-defined CPUID leafs that are either scattered by the kernel or are
> + * unknown to the kernel, but need to be directly used by KVM.  Note, these
> + * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
> + */
> +enum kvm_only_cpuid_leafs {
> +	CPUID_12_EAX	 = NCAPINTS,
> +	CPUID_7_1_EDX,
> +	CPUID_8000_0007_EDX,
> +	CPUID_8000_0022_EAX,
> +	CPUID_7_2_EDX,
> +	NR_KVM_CPU_CAPS,
> +
> +	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> +};
> +
>  struct kvm_vcpu_arch {
>  	/*
>  	 * rip and regs accesses must go through
> @@ -861,23 +877,20 @@ struct kvm_vcpu_arch {
>  	bool is_amd_compatible;
>  
>  	/*
> -	 * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
> -	 * when "struct kvm_vcpu_arch" is no longer defined in an
> -	 * arch/x86/include/asm header.  The max is mostly arbitrary, i.e.
> -	 * can be increased as necessary.
> +	 * cpu_caps holds the effective guest capabilities, i.e. the features
> +	 * the vCPU is allowed to use.  Typically, but not always, features can
> +	 * be used by the guest if and only if both KVM and userspace want to
> +	 * expose the feature to the guest.

Nitpick: Since even the comment mentions this, wouldn't it be better to call this
cpu_effective_caps, or at least cpu_eff_caps, to emphasize that these are indeed
effective capabilities, i.e. those that both KVM and userspace support?


> +	 *
> +	 * A common exception is for virtualization holes, i.e. when KVM can't
> +	 * prevent the guest from using a feature, in which case the vCPU "has"
> +	 * the feature regardless of what KVM or userspace desires.
> +	 *
> +	 * Note, features that don't require KVM involvement in any way are
> +	 * NOT enforced/sanitized by KVM, i.e. are taken verbatim from the
> +	 * guest CPUID provided by userspace.
>  	 */
> -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG
> -
> -	/*
> -	 * Track whether or not the guest is allowed to use features that are
> -	 * governed by KVM, where "governed" means KVM needs to manage state
> -	 * and/or explicitly enable the feature in hardware.  Typically, but
> -	 * not always, governed features can be used by the guest if and only
> -	 * if both KVM and userspace want to expose the feature to the guest.
> -	 */
> -	struct {
> -		DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES);
> -	} governed_features;
> +	u32 cpu_caps[NR_KVM_CPU_CAPS];
>  
>  	u64 reserved_gpa_bits;
>  	int maxphyaddr;
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 286abefc93d5..89c506cf649b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -387,9 +387,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	struct kvm_cpuid_entry2 *best;
>  	bool allow_gbpages;
>  
> -	BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES);
> -	bitmap_zero(vcpu->arch.governed_features.enabled,
> -		    KVM_MAX_NR_GOVERNED_FEATURES);
> +	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
>  
>  	kvm_update_cpuid_runtime(vcpu);
>  
> @@ -473,6 +471,7 @@ u64 kvm_vcpu_reserved_gpa_bits_raw(struct kvm_vcpu *vcpu)
>  static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>                          int nent)
>  {
> +	u32 vcpu_caps[NR_KVM_CPU_CAPS];
>  	int r;
>  
>  	/*
> @@ -480,10 +479,18 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	 * order to massage the new entries, e.g. to account for dynamic bits
>  	 * that KVM controls, without clobbering the current guest CPUID, which
>  	 * KVM needs to preserve in order to unwind on failure.
> +	 *
> +	 * Similarly, save the vCPU's current cpu_caps so that the capabilities
> +	 * can be updated alongside the CPUID entries when performing runtime
> +	 * updates.  Full initialization is done if and only if the vCPU hasn't
> +	 * run, i.e. only if userspace is potentially changing CPUID features.
>  	 */
>  	swap(vcpu->arch.cpuid_entries, e2);
>  	swap(vcpu->arch.cpuid_nent, nent);
>  
> +	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
> +	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
> +
>  	/*
>  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
>  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> @@ -527,6 +534,7 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
>  	return 0;
>  
>  err:
> +	memcpy(vcpu->arch.cpu_caps, vcpu_caps, sizeof(vcpu_caps));
>  	swap(vcpu->arch.cpuid_entries, e2);
>  	swap(vcpu->arch.cpuid_nent, nent);
>  	return r;
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index e021681f34ac..ad0168d3aec5 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -259,10 +259,10 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
>  static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
>  					      unsigned int x86_feature)
>  {
> -	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
>  
> -	__set_bit(kvm_governed_feature_index(x86_feature),
> -		  vcpu->arch.governed_features.enabled);
> +	reverse_cpuid_check(x86_leaf);
Indeed, no need for reverse_cpuid_check here, as already mentioned.

> +	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
>  }
>  
>  static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> @@ -275,10 +275,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
>  static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
>  					      unsigned int x86_feature)
>  {
> -	BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
>  
> -	return test_bit(kvm_governed_feature_index(x86_feature),
> -			vcpu->arch.governed_features.enabled);
> +	reverse_cpuid_check(x86_leaf);
> +	return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
>  }
>  
>  static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
> diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
> index 245f71c16272..63d5735fbc8a 100644
> --- a/arch/x86/kvm/reverse_cpuid.h
> +++ b/arch/x86/kvm/reverse_cpuid.h
> @@ -6,22 +6,6 @@
>  #include <asm/cpufeature.h>
>  #include <asm/cpufeatures.h>
>  
> -/*
> - * Hardware-defined CPUID leafs that are either scattered by the kernel or are
> - * unknown to the kernel, but need to be directly used by KVM.  Note, these
> - * word values conflict with the kernel's "bug" caps, but KVM doesn't use those.
> - */
> -enum kvm_only_cpuid_leafs {
> -	CPUID_12_EAX	 = NCAPINTS,
> -	CPUID_7_1_EDX,
> -	CPUID_8000_0007_EDX,
> -	CPUID_8000_0022_EAX,
> -	CPUID_7_2_EDX,
> -	NR_KVM_CPU_CAPS,
> -
> -	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> -};
> -
>  /*
>   * Define a KVM-only feature flag.
>   *


Overall:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
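
For reference, a minimal standalone sketch of the per-vCPU bitmap scheme that
replaces the governed-features bitmap. It assumes the usual encoding where a
feature number is word * 32 + bit; the struct, the word count, and the helper
names are hypothetical stand-ins, and the compile-time reverse_cpuid_check() is
omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CAP_WORDS 4 /* hypothetical stand-in for NR_KVM_CPU_CAPS */

struct vcpu_sketch {
	uint32_t cpu_caps[NR_CAP_WORDS];
};

/* A feature number encodes its word (leaf index) and bit as word * 32 + bit. */
static unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

static uint32_t feature_bitmask(unsigned int feature)
{
	return UINT32_C(1) << (feature & 31);
}

static void cap_set(struct vcpu_sketch *v, unsigned int feature)
{
	v->cpu_caps[feature_word(feature)] |= feature_bitmask(feature);
}

static bool cap_has(const struct vcpu_sketch *v, unsigned int feature)
{
	return (v->cpu_caps[feature_word(feature)] & feature_bitmask(feature)) != 0;
}
```

Every query is a single load and mask regardless of which feature is asked
about, so no per-feature opt-in is needed.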

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID
  2024-05-17 17:39 ` [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID Sean Christopherson
  2024-06-20  2:24   ` Yang, Weijiang
@ 2024-07-05  2:13   ` Maxim Levitsky
  1 sibling, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:13 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Initialize a vCPU's capabilities based on the guest CPUID provided by
> userspace instead of simply zeroing the entire array.  This is the first
> step toward using cpu_caps to query *all* CPUID-based guest capabilities,
> i.e. will allow converting all usage of guest_cpuid_has() to
> guest_cpu_cap_has().
> 
> Zeroing the array was the logical choice when using cpu_caps was opt-in,
> e.g. "unsupported" was generally a safer default, and the whole point of
> governed features is that KVM would need to check host and guest support,
> i.e. making everything unsupported by default didn't require more code.
> 
> But requiring KVM to manually "enable" every CPUID-based feature in
> cpu_caps would require an absurd amount of boilerplate code.
> 
> Follow existing CPUID/kvm_cpu_caps nomenclature where possible, e.g. for
> the change() and clear() APIs.  Replace check_and_set() with constrain()
> to try and capture that KVM is constraining userspace's desired guest
> feature set based on KVM's capabilities.
> 
> This is intended to be a gigantic nop, i.e. should not have any impact on
> guest or KVM functionality.
> 
> This is also an intermediate step; a future commit will also incorporate
> KVM support into the vCPU's cpu_caps before converting guest_cpuid_has()
> to guest_cpu_cap_has().
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c   | 46 ++++++++++++++++++++++++++++++++++++++++--
>  arch/x86/kvm/cpuid.h   | 25 ++++++++++++++++++++---
>  arch/x86/kvm/svm/svm.c | 28 +++++++++++++------------
>  arch/x86/kvm/vmx/vmx.c |  8 +++++---
>  4 files changed, 86 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 89c506cf649b..fd725cbbcce5 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -381,13 +381,56 @@ static bool kvm_cpuid_has_hyperv(struct kvm_vcpu *vcpu)
>  #endif
>  }
>  
> +/*
> + * This isn't truly "unsafe", but except for the cpu_caps initialization code,
> + * all register lookups should use __cpuid_entry_get_reg(), which provides
> + * compile-time validation of the input.
> + */
> +static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
> +{
> +	switch (reg) {
> +	case CPUID_EAX:
> +		return entry->eax;
> +	case CPUID_EBX:
> +		return entry->ebx;
> +	case CPUID_ECX:
> +		return entry->ecx;
> +	case CPUID_EDX:
> +		return entry->edx;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +}
> +
>  void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	struct kvm_cpuid_entry2 *best;
> +	struct kvm_cpuid_entry2 *entry;
>  	bool allow_gbpages;
> +	int i;
>  
>  	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
> +	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS);
> +
> +	/*
> +	 * Reset guest capabilities to userspace's guest CPUID definition, i.e.
> +	 * honor userspace's definition for features that don't require KVM or
> +	 * hardware management/support (or that KVM simply doesn't care about).
> +	 */
> +	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
> +		const struct cpuid_reg cpuid = reverse_cpuid[i];
> +
> +		if (!cpuid.function)
> +			continue;
> +
> +		entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
> +		if (!entry)
> +			continue;
> +
> +		vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
> +	}
>  
>  	kvm_update_cpuid_runtime(vcpu);
>  
> @@ -404,8 +447,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 */
>  	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
>  				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
> -	if (allow_gbpages)
> -		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
> +	guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);
>  
>  	best = kvm_find_cpuid_entry(vcpu, 1);
>  	if (best && apic) {
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index ad0168d3aec5..c2c2b8aa347b 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -265,11 +265,30 @@ static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
>  	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
>  }
>  
> -static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> -							unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
> +						unsigned int x86_feature)
>  {
> -	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
> +
> +	reverse_cpuid_check(x86_leaf);
> +	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> +}
> +
> +static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
> +						 unsigned int x86_feature,
> +						 bool guest_has_cap)
> +{
> +	if (guest_has_cap)
>  		guest_cpu_cap_set(vcpu, x86_feature);
> +	else
> +		guest_cpu_cap_clear(vcpu, x86_feature);
> +}

Assuming that this code is not deleted in following patches, I'd prefer
to call this 'guest_cpu_cap_change' because this is what the function does.
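
As a reading aid, a minimal standalone sketch of how the three helpers in this
hunk relate to each other. It models a single 32-bit word instead of the full
arrays, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct caps_sketch {
	uint32_t guest; /* stand-in for one word of vcpu->arch.cpu_caps */
	uint32_t kvm;   /* stand-in for the same word of kvm_cpu_caps */
};

static void cap_set(struct caps_sketch *c, unsigned int bit)
{
	c->guest |= UINT32_C(1) << bit;
}

static void cap_clear(struct caps_sketch *c, unsigned int bit)
{
	c->guest &= ~(UINT32_C(1) << bit);
}

/* Force the bit to the given value, whatever it was before. */
static void cap_change(struct caps_sketch *c, unsigned int bit, bool has_cap)
{
	if (has_cap)
		cap_set(c, bit);
	else
		cap_clear(c, bit);
}

/* Only ever removes a bit: clears it if KVM doesn't support the feature. */
static void cap_constrain(struct caps_sketch *c, unsigned int bit)
{
	if (!(c->kvm & (UINT32_C(1) << bit)))
		cap_clear(c, bit);
}
```

The difference that matters is that constrain() can only narrow what userspace
asked for, while change() can also turn a bit on.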

> +
> +static __always_inline void guest_cpu_cap_constrain(struct kvm_vcpu *vcpu,
> +						    unsigned int x86_feature)
> +{
> +	if (!kvm_cpu_cap_has(x86_feature))
> +		guest_cpu_cap_clear(vcpu, x86_feature);
>  }
>  
>  static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 2acd2e3bb1b0..1bc431a7e862 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4339,27 +4339,29 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * XSS on VM-Enter/VM-Exit.  Failure to do so would effectively give
>  	 * the guest read/write access to the host's XSS.
>  	 */
> -	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
> -	    boot_cpu_has(X86_FEATURE_XSAVES) &&
> -	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
> +	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
> +			     boot_cpu_has(X86_FEATURE_XSAVE) &&
> +			     boot_cpu_has(X86_FEATURE_XSAVES) &&
> +			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
>  
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_NRIPS);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_TSCRATEMSR);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LBRV);
>  
>  	/*
>  	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
>  	 * VMLOAD drops bits 63:32 of SYSENTER (ignoring the fact that exposing
>  	 * SVM on Intel is bonkers and extremely unlikely to work).
>  	 */
> -	if (!guest_cpuid_is_intel(vcpu))
> -		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> +	if (guest_cpuid_is_intel(vcpu))
> +		guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> +	else
> +		guest_cpu_cap_constrain(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
>  
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PAUSEFILTER);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PFTHRESHOLD);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VGIF);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VNMI);
>  
>  	svm_recalc_instruction_intercepts(vcpu, svm);
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1bc56596d653..d873386e1473 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7838,10 +7838,12 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 */
>  	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
>  	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
> +		guest_cpu_cap_constrain(vcpu, X86_FEATURE_XSAVES);
> +	else
> +		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
>  
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LAM);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VMX);
> +	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LAM);
>  
>  	vmx_setup_uret_msrs(vmx);
>  


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
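
For reference, a minimal standalone sketch of the initialization loop this
patch adds: each capability word is filled from the matching register of the
matching guest CPUID entry, and words with no entry stay zero. The types, the
three-row table, and all names are hypothetical simplifications, not the
kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum reg_sketch { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

struct cpuid_entry_sketch {
	uint32_t function, index;
	uint32_t eax, ebx, ecx, edx;
};

struct cpuid_reg_sketch {
	uint32_t function, index;
	enum reg_sketch reg;
};

/* Hypothetical, tiny stand-in for the reverse_cpuid[] table. */
static const struct cpuid_reg_sketch reverse_map[] = {
	{ 0x1, 0, REG_EDX },
	{ 0x1, 0, REG_ECX },
	{ 0x7, 0, REG_EBX },
};
#define NR_WORDS (sizeof(reverse_map) / sizeof(reverse_map[0]))

static const struct cpuid_entry_sketch *
find_entry(const struct cpuid_entry_sketch *e, size_t n, uint32_t function, uint32_t index)
{
	for (size_t i = 0; i < n; i++) {
		if (e[i].function == function && e[i].index == index)
			return &e[i];
	}
	return NULL;
}

static uint32_t get_reg(const struct cpuid_entry_sketch *e, enum reg_sketch reg)
{
	switch (reg) {
	case REG_EAX: return e->eax;
	case REG_EBX: return e->ebx;
	case REG_ECX: return e->ecx;
	case REG_EDX: return e->edx;
	default:      return 0;
	}
}

/* Reset every word, then copy in whatever userspace's CPUID provides. */
static void init_caps_from_cpuid(uint32_t *caps, const struct cpuid_entry_sketch *entries, size_t n)
{
	memset(caps, 0, NR_WORDS * sizeof(*caps));

	for (size_t i = 0; i < NR_WORDS; i++) {
		const struct cpuid_entry_sketch *entry =
			find_entry(entries, n, reverse_map[i].function, reverse_map[i].index);

		if (entry)
			caps[i] = get_reg(entry, reverse_map[i].reg);
	}
}
```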

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information
  2024-05-17 17:39 ` [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information Sean Christopherson
@ 2024-07-05  2:18   ` Maxim Levitsky
  2024-07-09  0:13     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:18 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Extract the meat of __do_cpuid_func_emulated() into a separate helper,
> cpuid_func_emulated(), so that cpuid_func_emulated() can be used with a
> single CPUID entry.  This will allow marking emulated features as fully
> supported in the guest cpu_caps without needing to hardcode the set of
> emulated features in multiple locations.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index fd725cbbcce5..d1849fe874ab 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -1007,14 +1007,10 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
>  	return entry;
>  }
>  
> -static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
> +static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
>  {
> -	struct kvm_cpuid_entry2 *entry;
> +	memset(entry, 0, sizeof(*entry));
>  
> -	if (array->nent >= array->maxnent)
> -		return -E2BIG;
> -
> -	entry = &array->entries[array->nent];
>  	entry->function = func;
>  	entry->index = 0;
>  	entry->flags = 0;
> @@ -1022,23 +1018,27 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
>  	switch (func) {
>  	case 0:
>  		entry->eax = 7;
> -		++array->nent;
> -		break;
> +		return 1;
>  	case 1:
>  		entry->ecx = F(MOVBE);
> -		++array->nent;
> -		break;
> +		return 1;
>  	case 7:
>  		entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>  		entry->eax = 0;
>  		if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP))
>  			entry->ecx = F(RDPID);
> -		++array->nent;
> -		break;
> +		return 1;
>  	default:
> -		break;
> +		return 0;
>  	}
> +}
>  
> +static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
> +{
> +	if (array->nent >= array->maxnent)
> +		return -E2BIG;
> +
> +	array->nent += cpuid_func_emulated(&array->entries[array->nent], func);
>  	return 0;
>  }
>  
Hi,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
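
For reference, a minimal standalone sketch of the shape of this refactoring:
a helper that fills exactly one entry and returns how many entries it produced
(0 or 1), plus a thin wrapper that owns the bounds check and the array
bookkeeping. The types and names are hypothetical, and only two leaves are
modeled (MOVBE is CPUID.01H:ECX bit 22).

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct entry_sketch {
	uint32_t function;
	uint32_t eax, ecx;
};

struct entry_array_sketch {
	struct entry_sketch *entries;
	int nent;
	int maxnent;
};

/* Fill a single caller-provided entry; return 1 if the leaf is emulated. */
static int fill_emulated(struct entry_sketch *entry, uint32_t func)
{
	memset(entry, 0, sizeof(*entry));
	entry->function = func;

	switch (func) {
	case 0:
		entry->eax = 7;
		return 1;
	case 1:
		entry->ecx = UINT32_C(1) << 22; /* MOVBE */
		return 1;
	default:
		return 0;
	}
}

/* Array wrapper: check capacity, then advance by however many were filled. */
static int append_emulated(struct entry_array_sketch *array, uint32_t func)
{
	if (array->nent >= array->maxnent)
		return -E2BIG;

	array->nent += fill_emulated(&array->entries[array->nent], func);
	return 0;
}
```

Because the helper no longer needs an array, a caller can point it at a single
on-stack entry.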


PS: I spoke with Paolo about the meaning of KVM_GET_EMULATED_CPUID, because the documentation
doesn't make clear what it does or what it is supposed to do, and QEMU doesn't use this
ioctl.

The ioctl is meant to return a static list of CPU features that KVM *can* emulate
when the CPU doesn't support them. Emulation has a cost, though, so these features
should not be enabled by default.

This means that if you run 'qemu -cpu host', these features (like RDPID) will only
be enabled if the host CPU supports them. However, if you explicitly ask
QEMU for such a feature, e.g. 'qemu -cpu host,+rdpid',
QEMU should not warn when the feature is not supported by the host CPU but can be emulated
(KVM states that it can emulate the feature via the KVM_GET_EMULATED_CPUID ioctl).

QEMU currently doesn't support this, but support can be added.

So I think that the two ioctls should be redefined as such:

KVM_GET_SUPPORTED_CPUID - returns all CPU features that are supported
by KVM, supported by host hardware, or that KVM can efficiently emulate.


KVM_GET_EMULATED_CPUID - returns all CPU features that KVM *can* emulate
if the host CPU lacks support, but whose emulation is not efficient, so
these features should be used with care when not supported by the host
(e.g. only when the user explicitly asks for them).


I can post a patch to fix this or you can add something like that to your
patch series if you prefer.
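
To illustrate the proposed semantics from the VMM's side, a minimal standalone
sketch, assuming each ioctl's result has been reduced to one 32-bit feature
mask. This is not QEMU code; the function names and the single-word model are
hypothetical.

```c
#include <stdint.h>

/*
 * Features the guest ends up with: everything in the "supported" set by
 * default, plus any explicitly requested feature that KVM can at least emulate.
 */
static uint32_t guest_features(uint32_t supported, uint32_t emulated, uint32_t explicitly_requested)
{
	return supported | (explicitly_requested & emulated);
}

/*
 * Explicitly requested features that deserve a warning: those that are in
 * neither the supported set nor the emulated set.
 */
static uint32_t unsatisfiable_features(uint32_t supported, uint32_t emulated, uint32_t explicitly_requested)
{
	return explicitly_requested & ~(supported | emulated);
}
```

With no explicit requests this degenerates to the '-cpu host' case, where
emulated-only features stay off.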


Best regards,
	Maxim Levitsky



* Re: [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support
  2024-05-17 17:39 ` [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support Sean Christopherson
@ 2024-07-05  2:22   ` Maxim Levitsky
  2024-07-09  0:10     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Constrain all guest cpu_caps based on KVM support instead of constraining
> only the few features that KVM _currently_ needs to verify are actually
> supported by KVM.  The intent of cpu_caps is to track what the guest is
> actually capable of using, not the raw, unfiltered CPUID values that the
> guest sees.
> 
> I.e. KVM should always consult its own support when making decisions
> based on guest CPUID, and the only reason KVM has historically made the
> checks opt-in was due to lack of centralized tracking.
> 
> Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c   | 14 +++++++++++++-
>  arch/x86/kvm/cpuid.h   |  7 -------
>  arch/x86/kvm/svm/svm.c | 11 -----------
>  arch/x86/kvm/vmx/vmx.c |  9 ++-------
>  4 files changed, 15 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index d1849fe874ab..8ada1cac8fcb 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -403,6 +403,8 @@ static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
>  	}
>  }
>  
> +static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func);
> +
>  void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
> @@ -421,6 +423,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 */
>  	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
>  		const struct cpuid_reg cpuid = reverse_cpuid[i];
> +		struct kvm_cpuid_entry2 emulated;
>  
>  		if (!cpuid.function)
>  			continue;
> @@ -429,7 +432,16 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		if (!entry)
>  			continue;
>  
> -		vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
> +		cpuid_func_emulated(&emulated, cpuid.function);
> +
> +		/*
> +		 * A vCPU has a feature if it's supported by KVM and is enabled
> +		 * in guest CPUID.  Note, this includes features that are
> +		 * supported by KVM but aren't advertised to userspace!
> +		 */
> +		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i] | kvm_vmm_cpu_caps[i] |
> +					 cpuid_get_reg_unsafe(&emulated, cpuid.reg);
> +		vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);

Hi,

I have an idea: what if we get rid of kvm_vmm_cpu_caps and instead advertise
MWAIT in KVM_GET_EMULATED_CPUID?

MWAIT is more or less emulated as a NOP after all, and features in KVM_GET_EMULATED_CPUID are
effectively 'emulated inefficiently'; you could argue that a NOP is an inefficient emulation
of MWAIT.

To me, kvm_vmm_cpu_caps feels like overkill, and its name is
somewhat confusing.

Other than that, this code looks good.
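
For reference, the per-word computation in the hunk above reduces to the
following minimal standalone sketch. The function name is hypothetical and a
single 32-bit word stands in for each array.

```c
#include <stdint.h>

/*
 * A vCPU "has" a feature if the guest CPUID enables it and KVM supports it
 * in any of three ways: advertised, VMM-controlled, or emulated.
 */
static uint32_t effective_caps(uint32_t kvm_caps, uint32_t vmm_caps,
			       uint32_t emulated_caps, uint32_t guest_cpuid)
{
	return (kvm_caps | vmm_caps | emulated_caps) & guest_cpuid;
}
```

In this model, moving a bit from vmm_caps to emulated_caps leaves the result
unchanged, since the three masks are simply ORed together.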


>  	}
>  
>  	kvm_update_cpuid_runtime(vcpu);
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index c2c2b8aa347b..60da304db4e4 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -284,13 +284,6 @@ static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
>  		guest_cpu_cap_clear(vcpu, x86_feature);
>  }
>  
> -static __always_inline void guest_cpu_cap_constrain(struct kvm_vcpu *vcpu,
> -						    unsigned int x86_feature)
> -{
> -	if (!kvm_cpu_cap_has(x86_feature))
> -		guest_cpu_cap_clear(vcpu, x86_feature);
> -}
> -
>  static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
>  					      unsigned int x86_feature)
>  {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1bc431a7e862..946a75771946 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4344,10 +4344,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  			     boot_cpu_has(X86_FEATURE_XSAVES) &&
>  			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
>  
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_NRIPS);
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_TSCRATEMSR);
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LBRV);
> -
>  	/*
>  	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
>  	 * VMLOAD drops bits 63:32 of SYSENTER (ignoring the fact that exposing
> @@ -4355,13 +4351,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 */
>  	if (guest_cpuid_is_intel(vcpu))
>  		guest_cpu_cap_clear(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> -	else
> -		guest_cpu_cap_constrain(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> -
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PAUSEFILTER);
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_PFTHRESHOLD);
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VGIF);
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VNMI);

Finally, this code is gone.

>  
>  	svm_recalc_instruction_intercepts(vcpu, svm);
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d873386e1473..653c4b68ec7f 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7836,15 +7836,10 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * to the guest.  XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
>  	 * set if and only if XSAVE is supported.
>  	 */
> -	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
> -	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		guest_cpu_cap_constrain(vcpu, X86_FEATURE_XSAVES);
> -	else
> +	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
> +	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
>  		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
>  
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_VMX);
> -	guest_cpu_cap_constrain(vcpu, X86_FEATURE_LAM);

Good riddance!

> -
>  	vmx_setup_uret_msrs(vmx);
>  
>  	if (cpu_has_secondary_exec_ctrls())


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 41/49] KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime
  2024-05-17 17:39 ` [PATCH v2 41/49] KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime Sean Christopherson
@ 2024-07-05  2:22   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Move the handling of X86_FEATURE_MWAIT during CPUID runtime updates to
> utilize the lookup done for other CPUID.0x1 features.
> 
> No functional change intended.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 8ada1cac8fcb..258c5fce87fc 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -343,6 +343,11 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  
>  		cpuid_entry_change(best, X86_FEATURE_APIC,
>  			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
> +
> +		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT))
> +			cpuid_entry_change(best, X86_FEATURE_MWAIT,
> +					   vcpu->arch.ia32_misc_enable_msr &
> +					   MSR_IA32_MISC_ENABLE_MWAIT);
>  	}
>  
>  	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
> @@ -358,14 +363,6 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  	if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) ||
>  		     cpuid_entry_has(best, X86_FEATURE_XSAVEC)))
>  		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
> -
> -	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)) {
> -		best = kvm_find_cpuid_entry(vcpu, 0x1);
> -		if (best)
> -			cpuid_entry_change(best, X86_FEATURE_MWAIT,
> -					   vcpu->arch.ia32_misc_enable_msr &
> -					   MSR_IA32_MISC_ENABLE_MWAIT);
> -	}
>  }
>  EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 42/49] KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leaf
  2024-05-17 17:39 ` [PATCH v2 42/49] KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leaf Sean Christopherson
@ 2024-07-05  2:22   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Drop an unnecessary check that kvm_find_cpuid_entry_index(), i.e.
> cpuid_entry2_find(), returns the correct leaf when getting CPUID.0x7.0x0
> to update X86_FEATURE_OSPKE.  cpuid_entry2_find() never returns an entry
> for the wrong function.  And not that it matters, but cpuid_entry2_find()
> will always return a precise match for CPUID.0x7.0x0 since the index is
> significant.
> 
> No functional change intended.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 258c5fce87fc..8256fc657c6b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -351,7 +351,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  	}
>  
>  	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
> -	if (best && boot_cpu_has(X86_FEATURE_PKU) && best->function == 0x7)
> +	if (best && boot_cpu_has(X86_FEATURE_PKU))
>  		cpuid_entry_change(best, X86_FEATURE_OSPKE,
>  				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 43/49] KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host support
  2024-05-17 17:39 ` [PATCH v2 43/49] KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host support Sean Christopherson
@ 2024-07-05  2:22   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:22 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> When making runtime CPUID updates, change OSXSAVE and OSPKE even if their
> respective base features (XSAVE, PKU) are not supported by the host.  KVM
> already incorporates host support in the vCPU's effective reserved CR4 bits.
> I.e. OSXSAVE and OSPKE can be set if and only if the host supports them.
> 
> And conversely, since KVM's ABI is that KVM owns the dynamic OS feature
> flags, clearing them when they obviously aren't supported and thus can't
> be enabled is arguably a fix.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 8256fc657c6b..552e65ba5efa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -336,10 +336,8 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  
>  	best = kvm_find_cpuid_entry(vcpu, 1);
>  	if (best) {
> -		/* Update OSXSAVE bit */
> -		if (boot_cpu_has(X86_FEATURE_XSAVE))
> -			cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
> -					   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
> +		cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
> +				   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
>  
>  		cpuid_entry_change(best, X86_FEATURE_APIC,
>  			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
> @@ -351,7 +349,7 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  	}
>  
>  	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
> -	if (best && boot_cpu_has(X86_FEATURE_PKU))
> +	if (best)
>  		cpuid_entry_change(best, X86_FEATURE_OSPKE,
>  				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features
  2024-05-17 17:39 ` [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features Sean Christopherson
@ 2024-07-05  2:26   ` Maxim Levitsky
  2024-07-09  0:24     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:26 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> When updating guest CPUID entries to emulate runtime behavior, e.g. when
> the guest enables a CR4-based feature that is tied to a CPUID flag, also
> update the vCPU's cpu_caps accordingly.  This will allow replacing all
> usage of guest_cpuid_has() with guest_cpu_cap_has().
> 
> Note, this relies on kvm_set_cpuid() taking a snapshot of cpu_caps before
> invoking kvm_update_cpuid_runtime(), i.e. when KVM is updating CPUID
> entries that *may* become the vCPU's CPUID, so that unwinding to the old
> cpu_caps is possible if userspace tries to set bogus CPUID information.
> 
> Note #2, none of the features in question use guest_cpu_cap_has() at this
> time, i.e. aside from settings bits in cpu_caps, this is a glorified nop.
> 
> Cc: Yang Weijiang <weijiang.yang@intel.com>
> Cc: Robert Hoo <robert.hoo.linux@gmail.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 552e65ba5efa..1424a9d4eb17 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -330,28 +330,38 @@ static u64 cpuid_get_supported_xcr0(struct kvm_vcpu *vcpu)
>  	return (best->eax | ((u64)best->edx << 32)) & kvm_caps.supported_xcr0;
>  }
>  
> +static __always_inline void kvm_update_feature_runtime(struct kvm_vcpu *vcpu,
> +						       struct kvm_cpuid_entry2 *entry,
> +						       unsigned int x86_feature,
> +						       bool has_feature)
> +{
> +	cpuid_entry_change(entry, x86_feature, has_feature);
> +	guest_cpu_cap_change(vcpu, x86_feature, has_feature);
> +}
> +
>  void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
>  
>  	best = kvm_find_cpuid_entry(vcpu, 1);
>  	if (best) {
> -		cpuid_entry_change(best, X86_FEATURE_OSXSAVE,
> -				   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
> +		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSXSAVE,
> +					   kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE));
>  
> -		cpuid_entry_change(best, X86_FEATURE_APIC,
> -			   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
> +		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_APIC,
> +					   vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE);
>  
>  		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT))
> -			cpuid_entry_change(best, X86_FEATURE_MWAIT,
> -					   vcpu->arch.ia32_misc_enable_msr &
> -					   MSR_IA32_MISC_ENABLE_MWAIT);
> +			kvm_update_feature_runtime(vcpu, best, X86_FEATURE_MWAIT,
> +						   vcpu->arch.ia32_misc_enable_msr &
> +						   MSR_IA32_MISC_ENABLE_MWAIT);
>  	}
>  
>  	best = kvm_find_cpuid_entry_index(vcpu, 7, 0);
>  	if (best)
> -		cpuid_entry_change(best, X86_FEATURE_OSPKE,
> -				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> +		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE,
> +					   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> +
>  
>  	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
>  	if (best)


I am not 100% sure that we need to do this.

Runtime CPUID changes are a hack that Intel introduced back then, for various
reasons. These changes don't really change the feature set that the CPU
supports; they merely, as you like to say, 'massage' the output of the CPUID
instruction, usually to keep an unmodified OS happy.

Thus it feels to me that cpu_caps should not include the dynamic features, and
that KVM should not use their values as a source of truth, but should instead
consult the underlying source of truth (e.g. CR4).

But if you insist, I don't have a very strong reason to object to this.
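
To make the two options concrete, here is a minimal sketch using OSXSAVE. The
toy_vcpu structure and all toy_* helpers are hypothetical stand-ins, not KVM
code; only the bit positions (CR4 bit 18, CPUID.1:ECX bit 27) are
architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CR4_OSXSAVE   (1u << 18)	/* CR4.OSXSAVE */
#define TOY_CPUID_OSXSAVE (1u << 27)	/* CPUID.1:ECX.OSXSAVE */

/* Hypothetical, heavily simplified vCPU state. */
struct toy_vcpu {
	uint32_t cr4;
	uint32_t cpuid_1_ecx;	/* what CPUID.1:ECX reports to the guest */
	uint32_t cap_1_ecx;	/* cached copy, standing in for cpu_caps */
};

/* Option A: query the underlying source of truth directly. */
static bool toy_osxsave_from_cr4(const struct toy_vcpu *v)
{
	return (v->cr4 & TOY_CR4_OSXSAVE) != 0;
}

/* Option B: mirror the CR4 bit into both guest CPUID and the cached caps. */
static void toy_update_runtime(struct toy_vcpu *v)
{
	if (toy_osxsave_from_cr4(v)) {
		v->cpuid_1_ecx |= TOY_CPUID_OSXSAVE;
		v->cap_1_ecx |= TOY_CPUID_OSXSAVE;
	} else {
		v->cpuid_1_ecx &= ~TOY_CPUID_OSXSAVE;
		v->cap_1_ecx &= ~TOY_CPUID_OSXSAVE;
	}
}

static bool toy_osxsave_from_caps(const struct toy_vcpu *v)
{
	return (v->cap_1_ecx & TOY_CPUID_OSXSAVE) != 0;
}

/* Every CR4 write has to refresh the mirror, or option B goes stale. */
static void toy_set_cr4(struct toy_vcpu *v, uint32_t cr4)
{
	v->cr4 = cr4;
	toy_update_runtime(v);
}
```

Both options agree as long as every CR4 write goes through toy_set_cr4(). The
cached copy is only as correct as the discipline of refreshing it, which is
the trade-off I am pointing at.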

Best regards,
	Maxim Levitsky





* Re: [PATCH v2 45/49] KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has()
  2024-05-17 17:39 ` [PATCH v2 45/49] KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has() Sean Christopherson
@ 2024-07-05  2:26   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:26 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Move the implementations of guest_has_{spec_ctrl,pred_cmd}_msr() down
> below guest_cpu_cap_has() so that their use of guest_cpuid_has() can be
> replaced with calls to guest_cpu_cap_has().
> 
> No functional change intended.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.h | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 60da304db4e4..7be56fa62342 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -168,21 +168,6 @@ static inline int guest_cpuid_stepping(struct kvm_vcpu *vcpu)
>  	return x86_stepping(best->eax);
>  }
>  
> -static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
> -{
> -	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
> -}
> -
> -static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
> -{
> -	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
> -}
> -
>  static inline bool supports_cpuid_fault(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.msr_platform_info & MSR_PLATFORM_INFO_CPUID_FAULT;
> @@ -301,4 +286,19 @@ static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr
>  	return kvm_vcpu_is_legal_gpa(vcpu, cr3);
>  }
>  
> +static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
> +{
> +	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> +		guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
> +		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
> +		guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
> +}
> +
> +static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
> +{
> +	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> +		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
> +		guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
> +}
> +
>  #endif

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps
  2024-05-17 17:39 ` [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps Sean Christopherson
@ 2024-07-05  2:34   ` Maxim Levitsky
  2024-07-09 19:20     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Switch all queries (except XSAVES) of guest features from guest CPUID to
> guest capabilities, i.e. replace all calls to guest_cpuid_has() with calls
> to guest_cpu_cap_has().
> 
> Keep guest_cpuid_has() around for XSAVES, but subsume its helper
> guest_cpuid_get_register() and add a compile-time assertion to prevent
> using guest_cpuid_has() for any other feature.  Add yet another comment
> for XSAVE to explain why KVM is allowed to query its raw guest CPUID.
> 
> Opportunistically drop the unused guest_cpuid_clear(), as there should be
> no circumstance in which KVM needs to _clear_ a guest CPUID feature now
> that everything is tracked via cpu_caps.  E.g. KVM may need to _change_
> a feature to emulate dynamic CPUID flags, but KVM should never need to
> clear a feature in guest CPUID to prevent it from being used by the guest.
> 
> Delete the last remnants of the governed features framework, as the lone
> holdout was vmx_adjust_secondary_exec_control()'s divergent behavior for
> governed vs. ungoverned features.
> 
> Note, replacing guest_cpuid_has() checks with guest_cpu_cap_has() when
> computing reserved CR4 bits is a nop when viewed as a whole, as KVM's
> capabilities are already incorporated into the calculation, i.e. if a
> feature is present in guest CPUID but unsupported by KVM, its CR4 bit
> was already being marked as reserved, checking guest_cpu_cap_has() simply
> double-stamps that it's a reserved bit.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c             |  4 +-
>  arch/x86/kvm/cpuid.h             | 74 +++++++++++---------------------
>  arch/x86/kvm/governed_features.h | 22 ----------
>  arch/x86/kvm/hyperv.c            |  2 +-
>  arch/x86/kvm/lapic.c             |  2 +-
>  arch/x86/kvm/mtrr.c              |  2 +-
>  arch/x86/kvm/smm.c               | 10 ++---
>  arch/x86/kvm/svm/pmu.c           |  8 ++--
>  arch/x86/kvm/svm/sev.c           |  4 +-
>  arch/x86/kvm/svm/svm.c           | 20 ++++-----
>  arch/x86/kvm/vmx/hyperv.h        |  2 +-
>  arch/x86/kvm/vmx/nested.c        | 12 +++---
>  arch/x86/kvm/vmx/pmu_intel.c     |  4 +-
>  arch/x86/kvm/vmx/sgx.c           | 14 +++---
>  arch/x86/kvm/vmx/vmx.c           | 47 ++++++++++----------
>  arch/x86/kvm/x86.c               | 64 +++++++++++++--------------
>  16 files changed, 121 insertions(+), 170 deletions(-)
>  delete mode 100644 arch/x86/kvm/governed_features.h
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 1424a9d4eb17..0130e0677387 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -463,7 +463,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * and can install smaller shadow pages if the host lacks 1GiB support.
>  	 */
>  	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
> -				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
> +				      guest_cpu_cap_has(vcpu, X86_FEATURE_GBPAGES);
>  	guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);
>  
>  	best = kvm_find_cpuid_entry(vcpu, 1);
> @@ -488,7 +488,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  
>  #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
>  	vcpu->arch.cr4_guest_rsvd_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_) |
> -					 __cr4_reserved_bits(guest_cpuid_has, vcpu);
> +					 __cr4_reserved_bits(guest_cpu_cap_has, vcpu);
>  #undef __kvm_cpu_cap_has
>  
>  	kvm_hv_set_cpuid(vcpu, kvm_cpuid_has_hyperv(vcpu));
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 7be56fa62342..0bf3bddd0e29 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -67,41 +67,38 @@ static __always_inline void cpuid_entry_override(struct kvm_cpuid_entry2 *entry,
>  	*reg = kvm_cpu_caps[leaf];
>  }
>  
> -static __always_inline u32 *guest_cpuid_get_register(struct kvm_vcpu *vcpu,
> -						     unsigned int x86_feature)
> +static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
> +					    unsigned int x86_feature)
>  {
>  	const struct cpuid_reg cpuid = x86_feature_cpuid(x86_feature);
>  	struct kvm_cpuid_entry2 *entry;
> +	u32 *reg;
> +
> +	/*
> +	 * XSAVES is a special snowflake.  Due to lack of a dedicated intercept
> +	 * on SVM, KVM must assume that XSAVES (and thus XRSTORS) is usable by
> +	 * the guest if the host supports XSAVES and *XSAVE* is exposed to the
> +	 * guest.  Although the guest can read/write XSS via XSAVES/XRSTORS, to
> +	 * minimize the virtualization hole, KVM rejects attempts to read/write
> +	 * XSS via RDMSR/WRMSR.  To make that work, KVM needs to check the raw
> +	 * guest CPUID, not KVM's view of guest capabilities.

Hi,

I think that this comment is wrong:

The guest can't read/write XSS via XSAVES/XRSTORS. It can only use XSAVES/XRSTORS
to save/restore features that are enabled in XSS, and thus if none are enabled,
XSAVES/XRSTORS act more or less like XSAVEOPTC/XRSTOR (except that they work only when CPL=0).

So I don't think that there is a virtualization hole, except for the fact that the VMM can't
really disable XSAVES if it chooses to.

Another "half virtualization hole" is that, since we have chosen not to intercept XSAVES at all
(AMD can't do this at all, and it's slow anyway), we instead opted to never support some XSS bits
(so far all of them; only the upcoming CET support will add a few supported bits).

This creates an unexpected situation for the guest: an enabled feature (e.g. PT) but no supported XSS bit
to context switch it. The x86 architecture does allow this, though.
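
A minimal sketch of the component selection I mean, based on how the SDM
defines the requested-feature bitmap (RFBM). The toy_* helpers and macros are
made up for illustration and are not KVM code; the bit positions are the
architectural state-component numbers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_XFEATURE_FP  (1ull << 0)	/* x87 state, enabled via XCR0 */
#define TOY_XFEATURE_SSE (1ull << 1)	/* SSE state, enabled via XCR0 */
#define TOY_XFEATURE_AVX (1ull << 2)	/* AVX state, enabled via XCR0 */
#define TOY_XFEATURE_PT  (1ull << 8)	/* PT state, enabled via IA32_XSS */

/* XSAVE/XSAVEOPT/XSAVEC: RFBM = XCR0 & EDX:EAX. */
static uint64_t toy_xsave_rfbm(uint64_t xcr0, uint64_t edx_eax)
{
	return xcr0 & edx_eax;
}

/* XSAVES/XRSTORS: RFBM = (XCR0 | IA32_XSS) & EDX:EAX. */
static uint64_t toy_xsaves_rfbm(uint64_t xcr0, uint64_t xss, uint64_t edx_eax)
{
	return (xcr0 | xss) & edx_eax;
}
```

With IA32_XSS == 0 the two bitmaps are identical, so XSAVES touches only the
XCR0-managed components. XSS itself is an input to the selection, never
something the instruction reads out or writes back.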


> +	 *
> +	 * For all other features, guest capabilities are accurate.  Expand
> +	 * this allowlist with extreme vigilance.
> +	 */
> +	BUILD_BUG_ON(x86_feature != X86_FEATURE_XSAVES);
>  
>  	entry = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
>  	if (!entry)
>  		return NULL;
>  
> -	return __cpuid_entry_get_reg(entry, cpuid.reg);
> -}
> -
> -static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
> -					    unsigned int x86_feature)
> -{
> -	u32 *reg;
> -
> -	reg = guest_cpuid_get_register(vcpu, x86_feature);
> +	reg = __cpuid_entry_get_reg(entry, cpuid.reg);
>  	if (!reg)
>  		return false;
>  
>  	return *reg & __feature_bit(x86_feature);
>  }
>  
> -static __always_inline void guest_cpuid_clear(struct kvm_vcpu *vcpu,
> -					      unsigned int x86_feature)
> -{
> -	u32 *reg;
> -
> -	reg = guest_cpuid_get_register(vcpu, x86_feature);
> -	if (reg)
> -		*reg &= ~__feature_bit(x86_feature);
> -}
> -
>  static inline bool guest_cpuid_is_amd_or_hygon(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_cpuid_entry2 *best;
> @@ -220,27 +217,6 @@ static __always_inline bool guest_pv_has(struct kvm_vcpu *vcpu,
>  	return vcpu->arch.pv_cpuid.features & (1u << kvm_feature);
>  }
>  
> -enum kvm_governed_features {
> -#define KVM_GOVERNED_FEATURE(x) KVM_GOVERNED_##x,
> -#include "governed_features.h"
> -	KVM_NR_GOVERNED_FEATURES
> -};
> -
> -static __always_inline int kvm_governed_feature_index(unsigned int x86_feature)
> -{
> -	switch (x86_feature) {
> -#define KVM_GOVERNED_FEATURE(x) case x: return KVM_GOVERNED_##x;
> -#include "governed_features.h"
> -	default:
> -		return -1;
> -	}
> -}
> -
> -static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
> -{
> -	return kvm_governed_feature_index(x86_feature) >= 0;
> -}
> -
>  static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
>  					      unsigned int x86_feature)
>  {
> @@ -288,17 +264,17 @@ static inline bool kvm_vcpu_is_legal_cr3(struct kvm_vcpu *vcpu, unsigned long cr
>  
>  static inline bool guest_has_spec_ctrl_msr(struct kvm_vcpu *vcpu)
>  {
> -	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_STIBP) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD));
> +	return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> +		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_STIBP) ||
> +		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBRS) ||
> +		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_SSBD));
>  }
>  
>  static inline bool guest_has_pred_cmd_msr(struct kvm_vcpu *vcpu)
>  {
> -	return (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB) ||
> -		guest_cpuid_has(vcpu, X86_FEATURE_SBPB));
> +	return (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) ||
> +		guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB) ||
> +		guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB));
>  }
>  
>  #endif
> diff --git a/arch/x86/kvm/governed_features.h b/arch/x86/kvm/governed_features.h
> deleted file mode 100644
> index ad463b1ed4e4..000000000000
> --- a/arch/x86/kvm/governed_features.h
> +++ /dev/null
> @@ -1,22 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#if !defined(KVM_GOVERNED_FEATURE) || defined(KVM_GOVERNED_X86_FEATURE)
> -BUILD_BUG()
> -#endif
> -
> -#define KVM_GOVERNED_X86_FEATURE(x) KVM_GOVERNED_FEATURE(X86_FEATURE_##x)
> -
> -KVM_GOVERNED_X86_FEATURE(GBPAGES)
> -KVM_GOVERNED_X86_FEATURE(XSAVES)
> -KVM_GOVERNED_X86_FEATURE(VMX)
> -KVM_GOVERNED_X86_FEATURE(NRIPS)
> -KVM_GOVERNED_X86_FEATURE(TSCRATEMSR)
> -KVM_GOVERNED_X86_FEATURE(V_VMSAVE_VMLOAD)
> -KVM_GOVERNED_X86_FEATURE(LBRV)
> -KVM_GOVERNED_X86_FEATURE(PAUSEFILTER)
> -KVM_GOVERNED_X86_FEATURE(PFTHRESHOLD)
> -KVM_GOVERNED_X86_FEATURE(VGIF)
> -KVM_GOVERNED_X86_FEATURE(VNMI)
> -KVM_GOVERNED_X86_FEATURE(LAM)
> -
> -#undef KVM_GOVERNED_X86_FEATURE
> -#undef KVM_GOVERNED_FEATURE
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 8a47f8541eab..4971b60a1882 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -1352,7 +1352,7 @@ static void __kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
>  		return;
>  
>  	if (guest_cpuid_has(vcpu, X86_FEATURE_XSAVES) ||
> -	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
> +	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVEC))
>  		return;
>  
>  	pr_notice_ratelimited("Booting SMP Windows KVM VM with !XSAVES && XSAVEC. "
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index ebf41023be38..37a2ecee3d75 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -590,7 +590,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
>  	 * version first and level-triggered interrupts never get EOIed in
>  	 * IOAPIC.
>  	 */
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) &&
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
>  	    !ioapic_in_kernel(vcpu->kvm))
>  		v |= APIC_LVR_DIRECTED_EOI;
>  	kvm_lapic_set_reg(apic, APIC_LVR, v);
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index a67c28a56417..9e8cb38ae1db 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -128,7 +128,7 @@ static u8 mtrr_disabled_type(struct kvm_vcpu *vcpu)
>  	 * enable MTRRs and it is obviously undesirable to run the
>  	 * guest entirely with UC memory and we use WB.
>  	 */
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_MTRR))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_MTRR))
>  		return MTRR_TYPE_UNCACHABLE;
>  	else
>  		return MTRR_TYPE_WRBACK;
> diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
> index d06d43d8d2aa..9144b28789df 100644
> --- a/arch/x86/kvm/smm.c
> +++ b/arch/x86/kvm/smm.c
> @@ -283,7 +283,7 @@ void enter_smm(struct kvm_vcpu *vcpu)
>  	memset(smram.bytes, 0, sizeof(smram.bytes));
>  
>  #ifdef CONFIG_X86_64
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
>  		enter_smm_save_state_64(vcpu, &smram.smram64);
>  	else
>  #endif
> @@ -353,7 +353,7 @@ void enter_smm(struct kvm_vcpu *vcpu)
>  	kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
>  
>  #ifdef CONFIG_X86_64
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
>  		if (static_call(kvm_x86_set_efer)(vcpu, 0))
>  			goto error;
>  #endif
> @@ -586,7 +586,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>  	 * supports long mode.
>  	 */
>  #ifdef CONFIG_X86_64
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) {
>  		struct kvm_segment cs_desc;
>  		unsigned long cr4;
>  
> @@ -609,7 +609,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>  		kvm_set_cr0(vcpu, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
>  
>  #ifdef CONFIG_X86_64
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_LM)) {
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM)) {
>  		unsigned long cr4, efer;
>  
>  		/* Clear CR4.PAE before clearing EFER.LME. */
> @@ -632,7 +632,7 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
>  		return X86EMUL_UNHANDLEABLE;
>  
>  #ifdef CONFIG_X86_64
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
>  		return rsm_load_state_64(ctxt, &smram.smram64);
>  	else
>  #endif
> diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
> index dfcc38bd97d3..4a4be2da1345 100644
> --- a/arch/x86/kvm/svm/pmu.c
> +++ b/arch/x86/kvm/svm/pmu.c
> @@ -46,7 +46,7 @@ static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
>  
>  	switch (msr) {
>  	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
> -		if (!guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE))
>  			return NULL;
>  		/*
>  		 * Each PMU counter has a pair of CTL and CTR MSRs. CTLn
> @@ -109,7 +109,7 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>  	case MSR_K7_EVNTSEL0 ... MSR_K7_PERFCTR3:
>  		return pmu->version > 0;
>  	case MSR_F15H_PERF_CTL0 ... MSR_F15H_PERF_CTR5:
> -		return guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE);
> +		return guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE);
>  	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
>  	case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
>  	case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
> @@ -179,7 +179,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>  	union cpuid_0x80000022_ebx ebx;
>  
>  	pmu->version = 1;
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_PERFMON_V2)) {
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_PERFMON_V2)) {
>  		pmu->version = 2;
>  		/*
>  		 * Note, PERFMON_V2 is also in 0x80000022.0x0, i.e. the guest
> @@ -189,7 +189,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>  			     x86_feature_cpuid(X86_FEATURE_PERFMON_V2).index);
>  		ebx.full = kvm_find_cpuid_entry_index(vcpu, 0x80000022, 0)->ebx;
>  		pmu->nr_arch_gp_counters = ebx.split.num_core_pmc;
> -	} else if (guest_cpuid_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
> +	} else if (guest_cpu_cap_has(vcpu, X86_FEATURE_PERFCTR_CORE)) {
>  		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
>  	} else {
>  		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 7640dedc2ddc..1004280599b4 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -4399,8 +4399,8 @@ static void sev_es_vcpu_after_set_cpuid(struct vcpu_svm *svm)
>  	struct kvm_vcpu *vcpu = &svm->vcpu;
>  
>  	if (boot_cpu_has(X86_FEATURE_V_TSC_AUX)) {
> -		bool v_tsc_aux = guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
> -				 guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
> +		bool v_tsc_aux = guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
> +				 guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);
>  
>  		set_msr_interception(vcpu, svm->msrpm, MSR_TSC_AUX, v_tsc_aux, v_tsc_aux);
>  	}
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 946a75771946..06770b60c0ba 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1178,14 +1178,14 @@ static void svm_recalc_instruction_intercepts(struct kvm_vcpu *vcpu,
>  	 */
>  	if (kvm_cpu_cap_has(X86_FEATURE_INVPCID)) {
>  		if (!npt_enabled ||
> -		    !guest_cpuid_has(&svm->vcpu, X86_FEATURE_INVPCID))
> +		    !guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_INVPCID))
>  			svm_set_intercept(svm, INTERCEPT_INVPCID);
>  		else
>  			svm_clr_intercept(svm, INTERCEPT_INVPCID);
>  	}
>  
>  	if (kvm_cpu_cap_has(X86_FEATURE_RDTSCP)) {
> -		if (guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP))
>  			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
>  		else
>  			svm_set_intercept(svm, INTERCEPT_RDTSCP);
> @@ -2911,7 +2911,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		break;
>  	case MSR_AMD64_VIRT_SPEC_CTRL:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_VIRT_SSBD))
>  			return 1;
>  
>  		msr_info->data = svm->virt_spec_ctrl;
> @@ -3058,7 +3058,7 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
>  		break;
>  	case MSR_AMD64_VIRT_SPEC_CTRL:
>  		if (!msr->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_VIRT_SSBD))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_VIRT_SSBD))
>  			return 1;
>  
>  		if (data & ~SPEC_CTRL_SSBD)
> @@ -3230,7 +3230,7 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
>  	unsigned long type;
>  	gva_t gva;
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) {
>  		kvm_queue_exception(vcpu, UD_VECTOR);
>  		return 1;
>  	}
> @@ -4342,7 +4342,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
>  			     boot_cpu_has(X86_FEATURE_XSAVE) &&
>  			     boot_cpu_has(X86_FEATURE_XSAVES) &&
> -			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
> +			     guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));
>  
>  	/*
>  	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
> @@ -4360,7 +4360,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  
>  	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
>  		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
> -				     !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
> +				     !!guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
>  
>  	if (sev_guest(vcpu->kvm))
>  		sev_vcpu_after_set_cpuid(svm);
> @@ -4617,7 +4617,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
>  	 * responsible for ensuring nested SVM and SMIs are mutually exclusive.
>  	 */
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
>  		return 1;
>  
>  	smram->smram64.svm_guest_flag = 1;
> @@ -4664,14 +4664,14 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
>  
>  	const struct kvm_smram_state_64 *smram64 = &smram->smram64;
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
>  		return 0;
>  
>  	/* Non-zero if SMI arrived while vCPU was in guest mode. */
>  	if (!smram64->svm_guest_flag)
>  		return 0;
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
>  		return 1;
>  
>  	if (!(smram64->efer & EFER_SVME))
> diff --git a/arch/x86/kvm/vmx/hyperv.h b/arch/x86/kvm/vmx/hyperv.h
> index a87407412615..11a339009781 100644
> --- a/arch/x86/kvm/vmx/hyperv.h
> +++ b/arch/x86/kvm/vmx/hyperv.h
> @@ -42,7 +42,7 @@ static inline struct hv_enlightened_vmcs *nested_vmx_evmcs(struct vcpu_vmx *vmx)
>  	return vmx->nested.hv_evmcs;
>  }
>  
> -static inline bool guest_cpuid_has_evmcs(struct kvm_vcpu *vcpu)
> +static inline bool guest_cpu_cap_has_evmcs(struct kvm_vcpu *vcpu)
>  {
>  	/*
>  	 * eVMCS is exposed to the guest if Hyper-V is enabled in CPUID and
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index fb7eec29681d..fcba0061083d 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -259,7 +259,7 @@ static bool nested_evmcs_handle_vmclear(struct kvm_vcpu *vcpu, gpa_t vmptr)
>  	 * state. It is possible that the area will stay mapped as
>  	 * vmx->nested.hv_evmcs but this shouldn't be a problem.
>  	 */
> -	if (!guest_cpuid_has_evmcs(vcpu) ||
> +	if (!guest_cpu_cap_has_evmcs(vcpu) ||
>  	    !evmptr_is_valid(nested_get_evmptr(vcpu)))
>  		return false;
>  
> @@ -2061,7 +2061,7 @@ static enum nested_evmptrld_status nested_vmx_handle_enlightened_vmptrld(
>  	bool evmcs_gpa_changed = false;
>  	u64 evmcs_gpa;
>  
> -	if (likely(!guest_cpuid_has_evmcs(vcpu)))
> +	if (likely(!guest_cpu_cap_has_evmcs(vcpu)))
>  		return EVMPTRLD_DISABLED;
>  
>  	evmcs_gpa = nested_get_evmptr(vcpu);
> @@ -2947,7 +2947,7 @@ static int nested_vmx_check_controls(struct kvm_vcpu *vcpu,
>  		return -EINVAL;
>  
>  #ifdef CONFIG_KVM_HYPERV
> -	if (guest_cpuid_has_evmcs(vcpu))
> +	if (guest_cpu_cap_has_evmcs(vcpu))
>  		return nested_evmcs_check_controls(vmcs12);
>  #endif
>  
> @@ -3231,7 +3231,7 @@ static bool nested_get_evmcs_page(struct kvm_vcpu *vcpu)
>  	 * L2 was running), map it here to make sure vmcs12 changes are
>  	 * properly reflected.
>  	 */
> -	if (guest_cpuid_has_evmcs(vcpu) &&
> +	if (guest_cpu_cap_has_evmcs(vcpu) &&
>  	    vmx->nested.hv_evmcs_vmptr == EVMPTR_MAP_PENDING) {
>  		enum nested_evmptrld_status evmptrld_status =
>  			nested_vmx_handle_enlightened_vmptrld(vcpu, false);
> @@ -4882,7 +4882,7 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  	 * doesn't isolate different VMCSs, i.e. in this case, doesn't provide
>  	 * separate modes for L2 vs L1.
>  	 */
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL))
>  		indirect_branch_prediction_barrier();
>  
>  	/* Update any VMCS fields that might have changed while L2 ran */
> @@ -6152,7 +6152,7 @@ static bool nested_vmx_exit_handled_encls(struct kvm_vcpu *vcpu,
>  {
>  	u32 encls_leaf;
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_SGX) ||
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) ||
>  	    !nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENCLS_EXITING))
>  		return false;
>  
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index be40474de6e4..a739defa6796 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -110,7 +110,7 @@ static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
>  
>  static inline u64 vcpu_get_perf_capabilities(struct kvm_vcpu *vcpu)
>  {
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
>  		return 0;
>  
>  	return vcpu->arch.perf_capabilities;
> @@ -160,7 +160,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>  		ret = vcpu_get_perf_capabilities(vcpu) & PERF_CAP_PEBS_FORMAT;
>  		break;
>  	case MSR_IA32_DS_AREA:
> -		ret = guest_cpuid_has(vcpu, X86_FEATURE_DS);
> +		ret = guest_cpu_cap_has(vcpu, X86_FEATURE_DS);
>  		break;
>  	case MSR_PEBS_DATA_CFG:
>  		perf_capabilities = vcpu_get_perf_capabilities(vcpu);
> diff --git a/arch/x86/kvm/vmx/sgx.c b/arch/x86/kvm/vmx/sgx.c
> index 6fef01e0536e..f57f072a16f6 100644
> --- a/arch/x86/kvm/vmx/sgx.c
> +++ b/arch/x86/kvm/vmx/sgx.c
> @@ -123,7 +123,7 @@ static int sgx_inject_fault(struct kvm_vcpu *vcpu, gva_t gva, int trapnr)
>  	 * likely than a bad userspace address.
>  	 */
>  	if ((trapnr == PF_VECTOR || !boot_cpu_has(X86_FEATURE_SGX2)) &&
> -	    guest_cpuid_has(vcpu, X86_FEATURE_SGX2)) {
> +	    guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2)) {
>  		memset(&ex, 0, sizeof(ex));
>  		ex.vector = PF_VECTOR;
>  		ex.error_code = PFERR_PRESENT_MASK | PFERR_WRITE_MASK |
> @@ -366,7 +366,7 @@ static inline bool encls_leaf_enabled_in_guest(struct kvm_vcpu *vcpu, u32 leaf)
>  		return true;
>  
>  	if (leaf >= EAUG && leaf <= EMODT)
> -		return guest_cpuid_has(vcpu, X86_FEATURE_SGX2);
> +		return guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2);
>  
>  	return false;
>  }
> @@ -382,8 +382,8 @@ int handle_encls(struct kvm_vcpu *vcpu)
>  {
>  	u32 leaf = (u32)kvm_rax_read(vcpu);
>  
> -	if (!enable_sgx || !guest_cpuid_has(vcpu, X86_FEATURE_SGX) ||
> -	    !guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
> +	if (!enable_sgx || !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) ||
> +	    !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX1)) {
>  		kvm_queue_exception(vcpu, UD_VECTOR);
>  	} else if (!encls_leaf_enabled_in_guest(vcpu, leaf) ||
>  		   !sgx_enabled_in_guest_bios(vcpu) || !is_paging(vcpu)) {
> @@ -480,15 +480,15 @@ void vmx_write_encls_bitmap(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>  	if (!cpu_has_vmx_encls_vmexit())
>  		return;
>  
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX) &&
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX) &&
>  	    sgx_enabled_in_guest_bios(vcpu)) {
> -		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX1)) {
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX1)) {
>  			bitmap &= ~GENMASK_ULL(ETRACK, ECREATE);
>  			if (sgx_intercept_encls_ecreate(vcpu))
>  				bitmap |= (1 << ECREATE);
>  		}
>  
> -		if (guest_cpuid_has(vcpu, X86_FEATURE_SGX2))
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX2))
>  			bitmap &= ~GENMASK_ULL(EMODT, EAUG);
>  
>  		/*
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 653c4b68ec7f..741961a1edcc 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1874,8 +1874,8 @@ static void vmx_setup_uret_msrs(struct vcpu_vmx *vmx)
>  	vmx_setup_uret_msr(vmx, MSR_EFER, update_transition_efer(vmx));
>  
>  	vmx_setup_uret_msr(vmx, MSR_TSC_AUX,
> -			   guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> -			   guest_cpuid_has(&vmx->vcpu, X86_FEATURE_RDPID));
> +			   guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDTSCP) ||
> +			   guest_cpu_cap_has(&vmx->vcpu, X86_FEATURE_RDPID));
>  
>  	/*
>  	 * hle=0, rtm=0, tsx_ctrl=1 can be found with some combinations of new
> @@ -2028,7 +2028,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_BNDCFGS:
>  		if (!kvm_mpx_supported() ||
>  		    (!msr_info->host_initiated &&
> -		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
> +		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
>  			return 1;
>  		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
>  		break;
> @@ -2044,7 +2044,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		break;
>  	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
>  			return 1;
>  		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
>  			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
> @@ -2063,7 +2063,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * sanity checking and refuse to boot. Filter all unsupported
>  		 * features out.
>  		 */
> -		if (!msr_info->host_initiated && guest_cpuid_has_evmcs(vcpu))
> +		if (!msr_info->host_initiated && guest_cpu_cap_has_evmcs(vcpu))
>  			nested_evmcs_filter_control_msr(vcpu, msr_info->index,
>  							&msr_info->data);
>  #endif
> @@ -2133,7 +2133,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
>  						    u64 data)
>  {
>  #ifdef CONFIG_X86_64
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
>  		return (u32)data;
>  #endif
>  	return (unsigned long)data;
> @@ -2144,7 +2144,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
>  	u64 debugctl = 0;
>  
>  	if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
> -	    (host_initiated || guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
> +	    (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT)))
>  		debugctl |= DEBUGCTLMSR_BUS_LOCK_DETECT;
>  
>  	if ((kvm_caps.supported_perf_cap & PMU_CAP_LBR_FMT) &&
> @@ -2248,7 +2248,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_IA32_BNDCFGS:
>  		if (!kvm_mpx_supported() ||
>  		    (!msr_info->host_initiated &&
> -		     !guest_cpuid_has(vcpu, X86_FEATURE_MPX)))
> +		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
>  			return 1;
>  		if (is_noncanonical_address(data & PAGE_MASK, vcpu) ||
>  		    (data & MSR_IA32_BNDCFGS_RSVD))
> @@ -2350,7 +2350,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		 * behavior, but it's close enough.
>  		 */
>  		if (!msr_info->host_initiated &&
> -		    (!guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC) ||
> +		    (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC) ||
>  		    ((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) &&
>  		    !(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED))))
>  			return 1;
> @@ -2436,9 +2436,9 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			if ((data & PERF_CAP_PEBS_MASK) !=
>  			    (kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK))
>  				return 1;
> -			if (!guest_cpuid_has(vcpu, X86_FEATURE_DS))
> +			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DS))
>  				return 1;
> -			if (!guest_cpuid_has(vcpu, X86_FEATURE_DTES64))
> +			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DTES64))
>  				return 1;
>  			if (!cpuid_model_is_consistent(vcpu))
>  				return 1;
> @@ -4570,10 +4570,7 @@ vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
>  	bool __enabled;										\
>  												\
>  	if (cpu_has_vmx_##name()) {								\
> -		if (kvm_is_governed_feature(X86_FEATURE_##feat_name))				\
> -			__enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name);		\
> -		else										\
> -			__enabled = guest_cpuid_has(__vcpu, X86_FEATURE_##feat_name);		\
> +		__enabled = guest_cpu_cap_has(__vcpu, X86_FEATURE_##feat_name);			\
>  		vmx_adjust_secondary_exec_control(vmx, exec_control, SECONDARY_EXEC_##ctrl_name,\
>  						  __enabled, exiting);				\
>  	}											\
> @@ -4649,8 +4646,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>  	 */
>  	if (cpu_has_vmx_rdtscp()) {
>  		bool rdpid_or_rdtscp_enabled =
> -			guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) ||
> -			guest_cpuid_has(vcpu, X86_FEATURE_RDPID);
> +			guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) ||
> +			guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID);
>  
>  		vmx_adjust_secondary_exec_control(vmx, &exec_control,
>  						  SECONDARY_EXEC_ENABLE_RDTSCP,
> @@ -5956,7 +5953,7 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
>  	} operand;
>  	int gpr_index;
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_INVPCID)) {
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_INVPCID)) {
>  		kvm_queue_exception(vcpu, UD_VECTOR);
>  		return 1;
>  	}
> @@ -7837,7 +7834,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * set if and only if XSAVE is supported.
>  	 */
>  	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
> -	    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> +	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
>  		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
>  
>  	vmx_setup_uret_msrs(vmx);
> @@ -7859,21 +7856,21 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  		nested_vmx_cr_fixed1_bits_update(vcpu);
>  
>  	if (boot_cpu_has(X86_FEATURE_INTEL_PT) &&
> -			guest_cpuid_has(vcpu, X86_FEATURE_INTEL_PT))
> +			guest_cpu_cap_has(vcpu, X86_FEATURE_INTEL_PT))
>  		update_intel_pt_cfg(vcpu);
>  
>  	if (boot_cpu_has(X86_FEATURE_RTM)) {
>  		struct vmx_uret_msr *msr;
>  		msr = vmx_find_uret_msr(vmx, MSR_IA32_TSX_CTRL);
>  		if (msr) {
> -			bool enabled = guest_cpuid_has(vcpu, X86_FEATURE_RTM);
> +			bool enabled = guest_cpu_cap_has(vcpu, X86_FEATURE_RTM);
>  			vmx_set_guest_uret_msr(vmx, msr, enabled ? 0 : TSX_CTRL_RTM_DISABLE);
>  		}
>  	}
>  
>  	if (kvm_cpu_cap_has(X86_FEATURE_XFD))
>  		vmx_set_intercept_for_msr(vcpu, MSR_IA32_XFD_ERR, MSR_TYPE_R,
> -					  !guest_cpuid_has(vcpu, X86_FEATURE_XFD));
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD));
>  
>  	if (boot_cpu_has(X86_FEATURE_IBPB))
>  		vmx_set_intercept_for_msr(vcpu, MSR_IA32_PRED_CMD, MSR_TYPE_W,
> @@ -7881,17 +7878,17 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  
>  	if (boot_cpu_has(X86_FEATURE_FLUSH_L1D))
>  		vmx_set_intercept_for_msr(vcpu, MSR_IA32_FLUSH_CMD, MSR_TYPE_W,
> -					  !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
> +					  !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D));
>  
>  	set_cr4_guest_host_mask(vmx);
>  
>  	vmx_write_encls_bitmap(vcpu, NULL);
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX))
>  		vmx->msr_ia32_feature_control_valid_bits |= FEAT_CTL_SGX_ENABLED;
>  	else
>  		vmx->msr_ia32_feature_control_valid_bits &= ~FEAT_CTL_SGX_ENABLED;
>  
> -	if (guest_cpuid_has(vcpu, X86_FEATURE_SGX_LC))
> +	if (guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
>  		vmx->msr_ia32_feature_control_valid_bits |=
>  			FEAT_CTL_SGX_LC_ENABLED;
>  	else
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4ca9651b3f43..5aa7581802f7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -488,7 +488,7 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	enum lapic_mode old_mode = kvm_get_apic_mode(vcpu);
>  	enum lapic_mode new_mode = kvm_apic_mode(msr_info->data);
>  	u64 reserved_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu) | 0x2ff |
> -		(guest_cpuid_has(vcpu, X86_FEATURE_X2APIC) ? 0 : X2APIC_ENABLE);
> +		(guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) ? 0 : X2APIC_ENABLE);
>  
>  	if ((msr_info->data & reserved_bits) != 0 || new_mode == LAPIC_MODE_INVALID)
>  		return 1;
> @@ -1351,10 +1351,10 @@ static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
>  {
>  	u64 fixed = DR6_FIXED_1;
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_RTM))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))
>  		fixed |= DR6_RTM;
>  
> -	if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
>  		fixed |= DR6_BUS_LOCK;
>  	return fixed;
>  }
> @@ -1708,20 +1708,20 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
>  
>  static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
>  {
> -	if (efer & EFER_AUTOIBRS && !guest_cpuid_has(vcpu, X86_FEATURE_AUTOIBRS))
> +	if (efer & EFER_AUTOIBRS && !guest_cpu_cap_has(vcpu, X86_FEATURE_AUTOIBRS))
>  		return false;
>  
> -	if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT))
> +	if (efer & EFER_FFXSR && !guest_cpu_cap_has(vcpu, X86_FEATURE_FXSR_OPT))
>  		return false;
>  
> -	if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM))
> +	if (efer & EFER_SVME && !guest_cpu_cap_has(vcpu, X86_FEATURE_SVM))
>  		return false;
>  
>  	if (efer & (EFER_LME | EFER_LMA) &&
> -	    !guest_cpuid_has(vcpu, X86_FEATURE_LM))
> +	    !guest_cpu_cap_has(vcpu, X86_FEATURE_LM))
>  		return false;
>  
> -	if (efer & EFER_NX && !guest_cpuid_has(vcpu, X86_FEATURE_NX))
> +	if (efer & EFER_NX && !guest_cpu_cap_has(vcpu, X86_FEATURE_NX))
>  		return false;
>  
>  	return true;
> @@ -1863,8 +1863,8 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>  			return 1;
>  
>  		if (!host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
>  			return 1;
>  
>  		/*
> @@ -1920,8 +1920,8 @@ int __kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data,
>  			return 1;
>  
>  		if (!host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDTSCP) &&
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_RDPID))
>  			return 1;
>  		break;
>  	}
> @@ -2113,7 +2113,7 @@ EXPORT_SYMBOL_GPL(kvm_handle_invalid_op);
>  static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
>  {
>  	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS) &&
> -	    !guest_cpuid_has(vcpu, X86_FEATURE_MWAIT))
> +	    !guest_cpu_cap_has(vcpu, X86_FEATURE_MWAIT))
>  		return kvm_handle_invalid_op(vcpu);
>  
>  	pr_warn_once("%s instruction emulated as NOP!\n", insn);
> @@ -3820,11 +3820,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  			if ((!guest_has_pred_cmd_msr(vcpu)))
>  				return 1;
>  
> -			if (!guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
> -			    !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBPB))
> +			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SPEC_CTRL) &&
> +			    !guest_cpu_cap_has(vcpu, X86_FEATURE_AMD_IBPB))
>  				reserved_bits |= PRED_CMD_IBPB;
>  
> -			if (!guest_cpuid_has(vcpu, X86_FEATURE_SBPB))
> +			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_SBPB))
>  				reserved_bits |= PRED_CMD_SBPB;
>  		}
>  
> @@ -3845,7 +3845,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	}
>  	case MSR_IA32_FLUSH_CMD:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_FLUSH_L1D))
>  			return 1;
>  
>  		if (!boot_cpu_has(X86_FEATURE_FLUSH_L1D) || (data & ~L1D_FLUSH))
> @@ -3896,7 +3896,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		kvm_set_lapic_tscdeadline_msr(vcpu, data);
>  		break;
>  	case MSR_IA32_TSC_ADJUST:
> -		if (guest_cpuid_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
> +		if (guest_cpu_cap_has(vcpu, X86_FEATURE_TSC_ADJUST)) {
>  			if (!msr_info->host_initiated) {
>  				s64 adj = data - vcpu->arch.ia32_tsc_adjust_msr;
>  				adjust_tsc_offset_guest(vcpu, adj);
> @@ -3923,7 +3923,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  
>  		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
>  		    ((old_val ^ data)  & MSR_IA32_MISC_ENABLE_MWAIT)) {
> -			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> +			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XMM3))
>  				return 1;
>  			vcpu->arch.ia32_misc_enable_msr = data;
>  			kvm_update_cpuid_runtime(vcpu);
> @@ -4100,12 +4100,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		kvm_pr_unimpl_wrmsr(vcpu, msr, data);
>  		break;
>  	case MSR_AMD64_OSVW_ID_LENGTH:
> -		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
>  			return 1;
>  		vcpu->arch.osvw.length = data;
>  		break;
>  	case MSR_AMD64_OSVW_STATUS:
> -		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
>  			return 1;
>  		vcpu->arch.osvw.status = data;
>  		break;
> @@ -4126,7 +4126,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  #ifdef CONFIG_X86_64
>  	case MSR_IA32_XFD:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
>  			return 1;
>  
>  		if (data & ~kvm_guest_supported_xfd(vcpu))
> @@ -4136,7 +4136,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		break;
>  	case MSR_IA32_XFD_ERR:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
>  			return 1;
>  
>  		if (data & ~kvm_guest_supported_xfd(vcpu))
> @@ -4260,13 +4260,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		break;
>  	case MSR_IA32_ARCH_CAPABILITIES:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES))
>  			return 1;
>  		msr_info->data = vcpu->arch.arch_capabilities;
>  		break;
>  	case MSR_IA32_PERF_CAPABILITIES:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_PDCM))
>  			return 1;
>  		msr_info->data = vcpu->arch.perf_capabilities;
>  		break;
> @@ -4467,12 +4467,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		msr_info->data = 0xbe702111;
>  		break;
>  	case MSR_AMD64_OSVW_ID_LENGTH:
> -		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
>  			return 1;
>  		msr_info->data = vcpu->arch.osvw.length;
>  		break;
>  	case MSR_AMD64_OSVW_STATUS:
> -		if (!guest_cpuid_has(vcpu, X86_FEATURE_OSVW))
> +		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_OSVW))
>  			return 1;
>  		msr_info->data = vcpu->arch.osvw.status;
>  		break;
> @@ -4491,14 +4491,14 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  #ifdef CONFIG_X86_64
>  	case MSR_IA32_XFD:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
>  			return 1;
>  
>  		msr_info->data = vcpu->arch.guest_fpu.fpstate->xfd;
>  		break;
>  	case MSR_IA32_XFD_ERR:
>  		if (!msr_info->host_initiated &&
> -		    !guest_cpuid_has(vcpu, X86_FEATURE_XFD))
> +		    !guest_cpu_cap_has(vcpu, X86_FEATURE_XFD))
>  			return 1;
>  
>  		msr_info->data = vcpu->arch.guest_fpu.xfd_err;
> @@ -8508,17 +8508,17 @@ static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
>  
>  static bool emulator_guest_has_movbe(struct x86_emulate_ctxt *ctxt)
>  {
> -	return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
> +	return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_MOVBE);
>  }
>  
>  static bool emulator_guest_has_fxsr(struct x86_emulate_ctxt *ctxt)
>  {
> -	return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
> +	return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_FXSR);
>  }
>  
>  static bool emulator_guest_has_rdpid(struct x86_emulate_ctxt *ctxt)
>  {
> -	return guest_cpuid_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID);
> +	return guest_cpu_cap_has(emul_to_vcpu(ctxt), X86_FEATURE_RDPID);
>  }
>  
>  static ulong emulator_read_gpr(struct x86_emulate_ctxt *ctxt, unsigned reg)


The patch is large so I might have missed something but overall it looks good.

With the comment fixed:

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky



* Re: [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps
  2024-05-17 17:39 ` [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps Sean Christopherson
@ 2024-07-05  2:36   ` Maxim Levitsky
  2024-07-09 19:15     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:36 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Drop the manual boot_cpu_has() checks on XSAVE when adjusting the guest's
> XSAVES capabilities now that guest cpu_caps incorporates KVM's support.
> The guest's cpu_caps are initialized from kvm_cpu_caps, which are in turn
> initialized from boot_cpu_data, i.e. checking guest_cpu_cap_has() also
> checks host/KVM capabilities (which is the entire point of cpu_caps).
> 
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/svm.c | 1 -
>  arch/x86/kvm/vmx/vmx.c | 3 +--
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 06770b60c0ba..4aaffbf22531 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4340,7 +4340,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * the guest read/write access to the host's XSS.
>  	 */
>  	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
> -			     boot_cpu_has(X86_FEATURE_XSAVE) &&
>  			     boot_cpu_has(X86_FEATURE_XSAVES) &&
>  			     guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));

>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 741961a1edcc..6fbdf520c58b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7833,8 +7833,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * to the guest.  XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
>  	 * set if and only if XSAVE is supported.
>  	 */


> -	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
> -	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
> +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
>  		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);

Hi,

I have a question about this code, even before the patch was applied:

While it is obviously correct to disable XSAVES when XSAVE is not supported, I wonder:
there are a lot more cases like that and KVM explicitly doesn't bother checking them,
e.g. all of the AVX family also depends on XSAVE due to XCR0.

What makes the XSAVES/XSAVE dependency special here? Maybe we can remove this code to be consistent?

The AMD portion of this patch, on the other hand, does make sense,
due to the lack of a separate XSAVES intercept.
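
For illustration only, here is a minimal standalone sketch (not kernel code) of
why the dropped host check is redundant. It assumes the simplest possible
model of what the commit message describes: a guest cap is set only if the
bit is present both in guest CPUID and in KVM's caps, and KVM's caps are in
turn limited by the host's. All toy_* names and the bit positions are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the real X86_FEATURE_* encoding differs. */
enum { FEAT_XSAVE = 0, FEAT_XSAVES = 1 };

struct toy_vcpu {
	uint32_t cpu_caps;	/* stand-in for vcpu->arch.cpu_caps */
};

/*
 * Assumed model: KVM's caps are the host's caps limited to what KVM knows
 * about, and guest caps are guest CPUID limited to KVM's caps.
 */
static void toy_set_cpuid(struct toy_vcpu *vcpu, uint32_t guest_cpuid,
			  uint32_t host_caps, uint32_t kvm_mask)
{
	uint32_t kvm_caps = host_caps & kvm_mask;

	vcpu->cpu_caps = guest_cpuid & kvm_caps;
}

static bool toy_guest_cap_has(const struct toy_vcpu *vcpu, unsigned int bit)
{
	assert(bit < 32);
	return (vcpu->cpu_caps & (UINT32_C(1) << bit)) != 0;
}

/* Shape of the check before the patch: explicit host check plus guest check. */
static bool toy_xsave_check_old(const struct toy_vcpu *vcpu, uint32_t host_caps)
{
	return (host_caps & (UINT32_C(1) << FEAT_XSAVE)) &&
	       toy_guest_cap_has(vcpu, FEAT_XSAVE);
}

/* Shape of the check after the patch: the guest cap already implies host support. */
static bool toy_xsave_check_new(const struct toy_vcpu *vcpu)
{
	return toy_guest_cap_has(vcpu, FEAT_XSAVE);
}
```

Under that assumed model the two checks agree for every combination of host
and guest bits, because a set guest cap can only come from a set host bit.
The sketch says nothing about whether the XSAVES/XSAVE dependency itself
should be enforced, which is the question raised above.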

Best regards,
	Maxim Levitsky

>  

>  	vmx_setup_uret_msrs(vmx);






* Re: [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data
  2024-05-17 17:39 ` [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data Sean Christopherson
@ 2024-07-05  2:43   ` Maxim Levitsky
  2024-07-09 21:13     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-05  2:43 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> Add yet another CPUID macro, this time for features that the host kernel
> synthesizes into boot_cpu_data, i.e. that the kernel force sets even in
> situations where the feature isn't reported by CPUID.  Thanks to the
> macro shenanigans of kvm_cpu_cap_init(), such features can now be handled
> in the core CPUID framework, i.e. don't need to be handled out-of-band and
> thus without as many guardrails.
> 
> Adding a dedicated macro also helps document what's going on, e.g. the
> calls to kvm_cpu_cap_check_and_set() are very confusing unless the reader
> knows exactly how kvm_cpu_cap_init() generates kvm_cpu_caps (and even
> then, it's far from obvious).
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0130e0677387..0e64a6332052 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -106,6 +106,17 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  	F(name);						\
>  })
>  
> +/*
> + * Synthesized Feature - For features that are synthesized into boot_cpu_data,
> + * i.e. may not be present in the raw CPUID, but can still be advertised to
> + * userspace.  Primarily used for mitigation related feature flags.
> + */
> +#define SYN_F(name)						\
> +({								\
> +	kvm_cpu_cap_synthesized |= F(name);			\
> +	F(name);						\
> +})
> +
>  /*
>   * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
>   * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> @@ -727,13 +738,15 @@ do {									\
>  	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
>  	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
>  	u32 kvm_cpu_cap_emulated = 0;					\
> +	u32 kvm_cpu_cap_synthesized = 0;				\
>  									\
>  	if (leaf < NCAPINTS)						\
>  		kvm_cpu_caps[leaf] &= (mask);				\
>  	else								\
>  		kvm_cpu_caps[leaf] = (mask);				\
>  									\
> -	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);			\
> +	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
> +			       kvm_cpu_cap_synthesized);		\
>  	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
>  } while (0)
>  
> @@ -913,13 +926,10 @@ void kvm_set_cpu_caps(void)
>  	kvm_cpu_cap_init(CPUID_8000_0021_EAX,
>  		F(NO_NESTED_DATA_BP) | F(LFENCE_RDTSC) | 0 /* SmmPgCfgLock */ |
>  		F(NULL_SEL_CLR_BASE) | F(AUTOIBRS) | 0 /* PrefetchCtlMsr */ |
> -		F(WRMSR_XX_BASE_NS)
> +		F(WRMSR_XX_BASE_NS) | SYN_F(SBPB) | SYN_F(IBPB_BRTYPE) |
> +		SYN_F(SRSO_NO)
>  	);
>  
> -	kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB);
> -	kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
> -	kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);
> -
>  	kvm_cpu_cap_init(CPUID_8000_0022_EAX,
>  		F(PERFMON_V2)
>  	);


Hi,

Now that you added the final F_* macro, let's list all of them:


#define F(name)							\

/* Scattered Flag - For features that are scattered by cpufeatures.h. */
#define SF(name)						\



/* Features that KVM supports only on 64-bit kernels. */
#define X86_64_F(name)						\

/*
 * Raw Feature - For features that KVM supports based purely on raw host CPUID,
 * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
 * Simply force set the feature in KVM's capabilities, raw CPUID support will
 * be factored in by __kvm_cpu_cap_mask().
 */
#define RAW_F(name)						\

/*
 * Emulated Feature - For features that KVM emulates in software irrespective
 * of host CPU/kernel support.
 */
#define EMUL_F(name)						\

/*
 * Synthesized Feature - For features that are synthesized into boot_cpu_data,
 * i.e. may not be present in the raw CPUID, but can still be advertised to
 * userspace.  Primarily used for mitigation related feature flags.
 */
#define SYN_F(name)						\

/*
 * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
 * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
 */
#define AF(name)								\

/*
 * VMM Features - For features that KVM "supports" in some capacity, i.e. that
 * KVM may query, but that are never advertised to userspace.  E.g. KVM allows
 * userspace to enumerate MONITOR+MWAIT support to the guest, but the MWAIT
 * feature flag is never advertised to userspace because MONITOR+MWAIT aren't
 * virtualized by hardware, can't be faithfully emulated in software (KVM
 * emulates them as NOPs), and allowing the guest to execute them natively
 * requires enabling a per-VM capability.
 */
#define VMM_F(name)								\


Honestly, I am already somewhat lost as to what each of those macros means, even when
reading the comments, which suggests that a future reader might also have a
hard time understanding them.

This makes me support even more strongly the idea of setting each feature bit in a
separate statement, as I explained in my reply to an earlier patch.

What do you think?


Best regards,
	Maxim Levitsky

* Re: [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation
  2024-07-05  0:48   ` Maxim Levitsky
@ 2024-07-08 18:46     ` Sean Christopherson
  2024-07-24 17:24       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 18:46 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index 23dbb9eb277c..0a8b561b5434 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -11,6 +11,7 @@
> >  extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
> >  void kvm_set_cpu_caps(void);
> >  
> > +void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
> >  void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
> >  void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
> >  struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index d750546ec934..7adcf56bd45d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -12234,6 +12234,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> >  	kvm_xen_init_vcpu(vcpu);
> >  	kvm_vcpu_mtrr_init(vcpu);
> >  	vcpu_load(vcpu);
> > +	kvm_vcpu_after_set_cpuid(vcpu);
> 
> This makes me a bit nervous. At this point the vcpu->arch.cpuid_entries is
> NULL, but so is vcpu->arch.cpuid_nent so it sort of works but is one mistake
> away from crash.
>
> Maybe we should add some protection to this, e.g empty zero cpuid or
> something like that.

Hmm, a crash is actually a good thing.  In the post-KVM_SET_CPUID2 case, if KVM
accessed vcpu->arch.cpuid_entries without properly consulting cpuid_nent, the
resulting failure would be an out-of-bounds read.  Similarly, a zeroed CPUID array
would effectively mask any bugs.

Given that KVM heavily relies on "vcpu" to be zero-allocated, and that changing
cpuid_nent during kvm_arch_vcpu_create() would be an extremely egregious bug,
a crash due to a NULL-pointer dereference should never escape developer testing,
let alone full release testing.

KVM does the "empty" array thing for IRQ routing (though in that case the array
and the nr_entries are in a single struct), and IMO it's been a huge net negative
because it's led to increased complexity just so that arch code can omit a NULL
check.
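
To illustrate why the NULL array is harmless as long as lookups are bounded by the entry count, here is a minimal userspace sketch.  The struct and function names are hypothetical stand-ins, not KVM's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kvm_cpuid_entry2 and the vCPU's CPUID state. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax;
};

struct vcpu_cpuid {
	struct cpuid_entry *entries;	/* NULL until userspace sets CPUID */
	int nent;			/* 0 until userspace sets CPUID */
};

/* Bounded by nent, so a zero-allocated vCPU never dereferences entries. */
static struct cpuid_entry *find_entry(struct vcpu_cpuid *c, uint32_t function,
				      uint32_t index)
{
	int i;

	for (i = 0; i < c->nent; i++) {
		if (c->entries[i].function == function &&
		    c->entries[i].index == index)
			return &c->entries[i];
	}
	return NULL;
}
```

Code that ignored nent would fault immediately on the NULL pointer, which is the "loud" failure mode argued for above.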


* Re: [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL
  2024-07-05  0:58   ` Maxim Levitsky
@ 2024-07-08 19:33     ` Sean Christopherson
  2024-07-24 17:28       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 19:33 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Add a sanity check in get_cpuid_entry() to provide a friendlier error than
> > a segfault when a test developer tries to use a vCPU CPUID helper on a
> > barebones vCPU.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > index c664e446136b..f0f3434d767e 100644
> > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > @@ -1141,6 +1141,8 @@ const struct kvm_cpuid_entry2 *get_cpuid_entry(const struct kvm_cpuid2 *cpuid,
> >  {
> >  	int i;
> >  
> > +	TEST_ASSERT(cpuid, "Must do vcpu_init_cpuid() first (or equivalent)");
> > +
> >  	for (i = 0; i < cpuid->nent; i++) {
> >  		if (cpuid->entries[i].function == function &&
> >  		    cpuid->entries[i].index == index)
> 
> Hi,
> 
> Maybe it is better to do this assert in __vcpu_get_cpuid_entry() because the
> assert might confuse the reader, since it just tests for NULL but when it
> fails, it complains that you need to call some function.

IIRC, I originally added the assert in __vcpu_get_cpuid_entry(), but I didn't
like leaving get_cpuid_entry() unprotected.  What if I add an assert in both?
E.g. have __vcpu_get_cpuid_entry() assert with the (hopefully) helpful message,
and have get_cpuid_entry() do a simple TEST_ASSERT_NE()?
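
A rough userspace sketch of the "assert in both" idea, using plain assert() in place of the selftest macros.  The types and the names lookup_entry() and vcpu_lookup_entry() are hypothetical; returning NULL for a missing entry is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the selftest structures. */
struct entry { uint32_t function, index; };
struct cpuid { uint32_t nent; struct entry *entries; };
struct vcpu { struct cpuid *cpuid; };

/* Low-level helper: a generic non-NULL check is all it can sensibly say. */
static const struct entry *lookup_entry(const struct cpuid *cpuid,
					uint32_t function, uint32_t index)
{
	uint32_t i;

	assert(cpuid != NULL);

	for (i = 0; i < cpuid->nent; i++) {
		if (cpuid->entries[i].function == function &&
		    cpuid->entries[i].index == index)
			return &cpuid->entries[i];
	}
	return NULL;
}

/* vCPU-level helper: here a hint about initializing CPUID is meaningful. */
static const struct entry *vcpu_lookup_entry(const struct vcpu *vcpu,
					     uint32_t function, uint32_t index)
{
	assert(vcpu->cpuid != NULL &&
	       "Must do vcpu_init_cpuid() first (or equivalent)");
	return lookup_entry(vcpu->cpuid, function, index);
}
```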


* Re: [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes
  2024-07-05  1:02   ` Maxim Levitsky
@ 2024-07-08 19:39     ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 19:39 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Extend x86's set sregs test to verify that KVM sets/clears OSXSAVE and
> > OSPKE according to CR4.OSXSAVE and CR4.PKE respectively.  For performance
> > reasons, KVM is responsible for emulating the architectural behavior of
> > the OS CPUID bits tracking CR4.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  tools/testing/selftests/kvm/x86_64/set_sregs_test.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> > index 96fd690d479a..f4095a3d1278 100644
> > --- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> > +++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
> > @@ -85,6 +85,16 @@ static void test_cr_bits(struct kvm_vcpu *vcpu, uint64_t cr4)
> >  	rc = _vcpu_sregs_set(vcpu, &sregs);
> >  	TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);
> >  
> > +	TEST_ASSERT(!!(sregs.cr4 & X86_CR4_OSXSAVE) ==
> > +		    (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSXSAVE)),
> > +		    "KVM didn't %s OSXSAVE in CPUID as expected",
> > +		    (sregs.cr4 & X86_CR4_OSXSAVE) ? "set" : "clear");
> > +
> > +	TEST_ASSERT(!!(sregs.cr4 & X86_CR4_PKE) ==
> > +		    (vcpu->cpuid && vcpu_cpuid_has(vcpu, X86_FEATURE_OSPKE)),
> > +		    "KVM didn't %s OSPKE in CPUID as expected",
> > +		    (sregs.cr4 & X86_CR4_PKE) ? "set" : "clear");
> > +
> 
> Hi,
> 
> Just for fun, why not have a test function that toggles a CR4 bit and then
> checks the corresponding CPUID bit toggles as well? This is both better
> coverage wise and will remove the above code duplication.

Huh, I don't know.  I distinctly remember trying and failing to dedup this code,
but I don't think I ever tried actively toggling each bit.  I'll give that a shot.
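
The toggling idea could look something like the sketch below.  It runs against a toy model rather than a real vCPU: struct toy_vcpu, toy_set_cr4() and toggle_and_check() are hypothetical, and an actual selftest would presumably go through _vcpu_sregs_set() and vcpu_cpuid_has() instead.  The CR4 bit positions are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_OSXSAVE	(1ull << 18)	/* CR4.OSXSAVE */
#define CR4_PKE		(1ull << 22)	/* CR4.PKE */

/* Toy vCPU: the bools stand for CPUID.1:ECX[27] and CPUID.7.0:ECX[4]. */
struct toy_vcpu {
	uint64_t cr4;
	bool osxsave;
	bool ospke;
};

/* Models KVM keeping the OS CPUID bits in sync on CR4 writes. */
static void toy_set_cr4(struct toy_vcpu *v, uint64_t cr4)
{
	v->cr4 = cr4;
	v->osxsave = !!(cr4 & CR4_OSXSAVE);
	v->ospke = !!(cr4 & CR4_PKE);
}

/* Flip one CR4 bit twice; true if the CPUID bit tracked it both times. */
static bool toggle_and_check(struct toy_vcpu *v, uint64_t cr4_bit,
			     const bool *cpuid_bit)
{
	int i;

	for (i = 0; i < 2; i++) {
		toy_set_cr4(v, v->cr4 ^ cr4_bit);
		if (*cpuid_bit != !!(v->cr4 & cr4_bit))
			return false;
	}
	return true;
}
```

One helper then covers both the set and the clear direction for each bit, which is where the deduplication would come from.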


* Re: [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
  2024-07-05  1:17   ` Maxim Levitsky
@ 2024-07-08 19:43     ` Sean Christopherson
  2024-07-24 17:31       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 19:43 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling
> > PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless,
> > e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after
> > vCPU creation.  vCPUs may also end up with an inconsistent configuration
> > if exits are disabled between creation of multiple vCPUs.
> 
> Hi,
> 
> I am not sure that PAUSE intercepts are updated either; I wasn't able to find
> any code that does this.
> 
> I agree with this change, but note that there was some talk on the mailing
> list to allow to selectively disable VM exits (e.g PAUSE, MWAIT, ...) only on
> some vCPUs, based on the claim that some vCPUs might run RT tasks, while some
> might be housekeeping.  I haven't followed those discussions closely.

This change is actually pulled from that series[*].  IIRC, v1 of that series
didn't close the VM-scoped hole, and the overall code was much more complex as
a result.

[*] https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com
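
Conceptually, closing the VM-scoped hole boils down to a check like the one in this toy model.  Everything here is hypothetical (struct toy_vm, toy_disable_exits(), and the choice of -EINVAL as the error); locking is omitted entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy VM: just enough state to show the ordering rule. */
struct toy_vm {
	int created_vcpus;
	uint32_t disabled_exits;
};

/* Refuse to change VM-wide exit behavior once any vCPU exists. */
static int toy_disable_exits(struct toy_vm *vm, uint32_t exits)
{
	if (vm->created_vcpus)
		return -EINVAL;	/* errno chosen for illustration only */

	vm->disabled_exits |= exits;
	return 0;
}
```

With the check in place, every vCPU is created against the same, final set of disabled exits.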


* Re: [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation
  2024-07-05  1:13   ` Maxim Levitsky
@ 2024-07-08 19:53     ` Sean Christopherson
  2024-07-24 17:30       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 19:53 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Drop the manual initialization of maxphyaddr and reserved_gpa_bits during
> > vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes
> > kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching.
> > 
> > None of the helpers between the existing code in kvm_arch_vcpu_create()
> > and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or
> > reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create()
> > isn't exactly easy).  And even if that weren't the case, KVM _must_
> > refresh any affected state during kvm_vcpu_after_set_cpuid(), e.g. to
> > correctly handle KVM_SET_CPUID2.  In other words, this can't introduce a
> > new bug, only expose an existing bug (of which there don't appear to be
> > any).
> 
> 
> IMHO the change is not as bulletproof as claimed:
> 
> If some code does access the uninitialized state (e.g vcpu->arch.maxphyaddr
> which will be zero, I assume), in between these calls, then even though later
> the correct CPUID will be set and should override the incorrect state set
> earlier, the problem *is* that the mentioned code will have to deal with non
> architecturally possible value (e.g maxphyaddr == 0) which might cause a bug
> in it.
>
> Of course such code currently doesn't exist, so it works but it can fail in
> the future.

Similar to not consuming a null cpuid_entries, any such future bug should never
escape developer testing since this is a very fixed sequence.  And practically
speaking, completely closing these holes isn't feasible because it's impossible
to initialize everything simultaneously, i.e. some amount of code will always
need to execute with zero-initialized vCPU state.

> How about we move the call to kvm_vcpu_after_set_cpuid upward?

A drop-in replacement was my preference too, but it doesn't work.  :-/
kvm_vcpu_after_set_cpuid() needs to be called after vcpu_load(), e.g. VMX's
hook will do VMWRITE.



* Re: [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support
  2024-07-05  1:21   ` Maxim Levitsky
@ 2024-07-08 20:53     ` Sean Christopherson
  2024-07-24 17:39       ` Maxim Levitsky
  2024-07-08 22:36     ` Sean Christopherson
  1 sibling, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 20:53 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Add a macro for use in kvm_set_cpu_caps() to automagically initialize
> > features that KVM wants to support based solely on the CPU's capabilities,
> > e.g. KVM advertises LA57 support if it's available in hardware, even if
> > the host kernel isn't utilizing 57-bit virtual addresses.
> > 
> > Take advantage of the fact that kvm_cpu_cap_mask() adjusts kvm_cpu_caps
> > based on raw CPUID, i.e. will clear features bits that aren't supported in
> > hardware, and simply force-set the capability before applying the mask.
> > 
> > Abusing kvm_cpu_cap_set() is a borderline evil shenanigan, but doing so
> > avoids extra CPUID lookups, and a future commit will harden the entire
> > family of *F() macros to assert (at compile time) that every feature being
> > allowed is part of the capability word being processed, i.e. using a macro
> > will bring more advantages in the future.
> 
> Could you explain what do you mean by "extra CPUID lookups"?

cpuid_ecx(7) incurs a CPUID to read the raw info, on top of the CPUID that is
executed by kvm_cpu_cap_init() (kvm_cpu_cap_mask() as of this patch).  Obviously
not a big deal, but it's an extra VM-Exit when running as a VM.

> > +/*
> > + * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> > + * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> > + * Simply force set the feature in KVM's capabilities, raw CPUID support will
> > + * be factored in by kvm_cpu_cap_mask().
> > + */
> > +#define RAW_F(name)						\
> > +({								\
> > +	kvm_cpu_cap_set(X86_FEATURE_##name);			\
> > +	F(name);						\
> > +})
> > +
> >  /*
> >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> >   * doesn't care about the CPIUD index because the index of the function in
> > @@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
> >  		F(AVX512VL));
> >  
> >  	kvm_cpu_cap_mask(CPUID_7_ECX,
> > -		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> > +		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> >  		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
> >  		F(SGX_LC) | F(BUS_LOCK_DETECT)
> >  	);
> > -	/* Set LA57 based on hardware capability. */
> > -	if (cpuid_ecx(7) & F(LA57))
> > -		kvm_cpu_cap_set(X86_FEATURE_LA57);
> >  
> >  	/*
> >  	 * PKU not yet implemented for shadow paging and requires OSPKE
> 
> Putting a function call into a macro which evaluates into a bitmask is
> somewhat misleading, but let it be...
> 
> IMHO in long term, it might be better to rip the whole huge 'or'ed mess, and replace
> it with a list of statements, along with comments for all unusual cases.

As in something like this?

	kvm_cpu_cap_init(AVX512VBMI);
	kvm_cpu_cap_init_raw(LA57);
	kvm_cpu_cap_init(PKU);
	...
	kvm_cpu_cap_init(BUS_LOCK_DETECT);

	kvm_cpu_cap_init_aliased(CPUID_8000_0001_EDX, FPU);

	...

	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX1);
	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX2);
	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX_EDECCSSA);

The tricky parts are incorporating raw CPUID into the masking and handling features
that KVM _doesn't_ support.  For raw CPUID, we could simply do CPUID every time, or
pre-fill an array to avoid hundreds of CPUIDs that are largely redundant.

But I don't see a way to mask off unsupported features without losing the
compile-time protections that the current code provides.  And even if we took a
big hammer approach, e.g. finalized masking for all words at the very end, we'd
still need to carry state across each statement, i.e. we'd still need the bitwise-OR
and mask behavior, it would just be buried in helpers/macros.

I suspect the generated code will be larger, but I doubt that will actually be
problematic.  The written code will also be more verbose (roughly 4x since we
tend to squeeze 4 features per line), and it will be harder to ensure initialization
of features in a given word are all co-located.

I definitely don't hate the idea, but I don't think it will be a clear "win" either.
Unless someone feels strongly about pursuing this approach, I'll add to the "things
to explore later" list.


* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-07-05  1:25   ` Maxim Levitsky
@ 2024-07-08 21:08     ` Sean Christopherson
  2024-07-24 17:46       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 21:08 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Add a macro to precisely handle CPUID features that AMD duplicated from
> > CPUID.0x1.EDX into CPUID.0x8000_0001.EDX.  This will allow adding an
> > assert that all features passed to kvm_cpu_cap_init() match the word being
> > processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1.
> > 
> > Because the kernel simply reuses the X86_FEATURE_* definitions from
> > CPUID.0x1.EDX, KVM's use of the aliased features would result in false
> > positives from such an assert.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 24 +++++++++++++++++-------
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 5e3b97d06374..f2bd2f5c4ea3 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -88,6 +88,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> >  	F(name);						\
> >  })
> >  
> > +/*
> > + * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
> > + * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> > + */
> > +#define AF(name)								\
> > +({										\
> > +	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
> > +	feature_bit(name);							\
> > +})
> > +
> >  /*
> >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> >   * doesn't care about the CPIUD index because the index of the function in
> > @@ -758,13 +768,13 @@ void kvm_set_cpu_caps(void)
> >  	);
> >  
> >  	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
> > -		F(FPU) | F(VME) | F(DE) | F(PSE) |
> > -		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > -		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > -		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > -		F(PAT) | F(PSE36) | 0 /* Reserved */ |
> > -		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > -		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> > +		AF(FPU) | AF(VME) | AF(DE) | AF(PSE) |
> > +		AF(TSC) | AF(MSR) | AF(PAE) | AF(MCE) |
> > +		AF(CX8) | AF(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > +		AF(MTRR) | AF(PGE) | AF(MCA) | AF(CMOV) |
> > +		AF(PAT) | AF(PSE36) | 0 /* Reserved */ |
> > +		F(NX) | 0 /* Reserved */ | F(MMXEXT) | AF(MMX) |
> > +		AF(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> >  		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
> >  	);
> >  
> 
> Hi,
> 
> What if we defined the aliased features instead?
> Something like this:
> 
> #define __X86_FEATURE_8000_0001_ALIAS(feature) \
> 	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)
> 
> #define KVM_X86_FEATURE_FPU_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_FPU)
> #define KVM_X86_FEATURE_VME_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_VME)
> 
> And then just use for example the 'F(FPU_ALIAS)' in the CPUID_8000_0001_EDX

At first glance, I really liked this idea, but after working through the
ramifications, I think I prefer "converting" the flag when passing it to
kvm_cpu_cap_init().  In-place conversion makes it all but impossible for KVM to
check the alias, e.g. via guest_cpu_cap_has(), especially since the AF() macro
doesn't set the bits in kvm_known_cpu_caps (if/when a non-hacky validation of
usage becomes reality).

Side topic, if it's not already documented somewhere else, kvm/x86/cpuid.rst
should call out that KVM only honors the features in CPUID.0x1, i.e. that setting
aliased bits in CPUID.0x8000_0001 is supported if and only if the bit(s) is also
set in CPUID.0x1.
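
For reference, the index arithmetic behind the alias proposal quoted above can be checked in isolation.  The sketch assumes the usual "word * 32 + bit" encoding of X86_FEATURE_* values and that CPUID_1_EDX and CPUID_8000_0001_EDX are words 0 and 1; FEATURE(), FEATURE_8000_0001_ALIAS(), feature_to_word() and feature_to_bitpos() are hypothetical helpers.

```c
#include <assert.h>

/* Word indices, assumed to match the first two entries of enum cpuid_leafs. */
enum { CPUID_1_EDX = 0, CPUID_8000_0001_EDX = 1 };

#define FEATURE(word, bit)	((word) * 32 + (bit))
#define X86_FEATURE_FPU		FEATURE(CPUID_1_EDX, 0)
#define X86_FEATURE_VME		FEATURE(CPUID_1_EDX, 1)

/* Shift a 0x1.EDX feature into the 0x8000_0001.EDX word, keeping its bit. */
#define FEATURE_8000_0001_ALIAS(feature) \
	((feature) + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)

static inline int feature_to_word(int feature)
{
	return feature / 32;
}

static inline int feature_to_bitpos(int feature)
{
	return feature % 32;
}
```

The alias lands in a different word with the same bit position, so a per-word BUILD_BUG_ON would accept it; the cost is a second feature number for what is architecturally the same feature.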


* Re: [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper
  2024-07-05  1:28   ` Maxim Levitsky
@ 2024-07-08 21:18     ` Sean Christopherson
  2024-07-17 14:00       ` Xiaoyao Li
  2024-07-24 17:51       ` Maxim Levitsky
  0 siblings, 2 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 21:18 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single
> > helper.  The only advantage of separating the two was to make it somewhat
> > obvious that KVM directly initializes the KVM-defined words, whereas using
> > a common helper will allow for hardening both kernel- and KVM-defined
> > CPUID words without needing copy+paste.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 44 +++++++++++++++-----------------------------
> >  1 file changed, 15 insertions(+), 29 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index f2bd2f5c4ea3..8efffd48cdf1 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -622,37 +622,23 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
> >  	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
> >  }
> >  
> > -/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
> > -static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
> > +static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
> >  {
> >  	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
> >  
> > -	reverse_cpuid_check(leaf);
> > +	/*
> > +	 * For kernel-defined leafs, mask the boot CPU's pre-populated value.
> > +	 * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
> > +	 * and only authority.
> > +	 */
> > +	if (leaf < NCAPINTS)
> > +		kvm_cpu_caps[leaf] &= mask;
> > +	else
> > +		kvm_cpu_caps[leaf] = mask;
> 
> Hi,
> 
> I have an idea, how about we just initialize the kvm only leafs to 0xFFFFFFFF
> and then treat them exactly in the same way as kernel regular leafs?
> 
> Then the user won't have to figure out (assuming that the user doesn't read
> the comment, who does?) why we use mask as init value.
> 
> But if you prefer to leave it this way, I won't object either.

Huh, hadn't thought of that.  It's a small code change, but I'm leaning towards
keeping the current code as we'd still need a comment to explain why KVM sets
all bits by default.  And in the unlikely case that we royally screw up and fail
to call kvm_cpu_cap_init() on a word, starting with 0xff would result in all
features in the uninitialized word being treated as supported.

For posterity...

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 18ded0e682f2..6fcfb0fa4bd6 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -762,11 +762,7 @@ do {                                                                       \
        u32 kvm_cpu_cap_emulated = 0;                                   \
        u32 kvm_cpu_cap_synthesized = 0;                                \
                                                                        \
-       if (leaf < NCAPINTS)                                            \
-               kvm_cpu_caps[leaf] &= (mask);                           \
-       else                                                            \
-               kvm_cpu_caps[leaf] = (mask);                            \
-                                                                       \
+       kvm_cpu_caps[leaf] &= (mask);                                   \
        kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |                   \
                               kvm_cpu_cap_synthesized);                \
        kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;                     \
@@ -780,7 +776,7 @@ do {                                                                        \
 
 void kvm_set_cpu_caps(void)
 {
-       memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
+       memset(kvm_cpu_caps, 0xff, sizeof(kvm_cpu_caps));
 
        BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
                     sizeof(boot_cpu_data.x86_capability));
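
The hazard with the 0xff variant can be shown with a toy model.  The array size, caps_reset(), word_init() and cap_has() are hypothetical; the point is only what happens to a word that never gets initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_WORDS 3

static uint32_t caps[NR_WORDS];

/* Fill every byte of the capability array with 'fill' (0 or 0xff). */
static void caps_reset(int fill)
{
	memset(caps, fill, sizeof(caps));
}

/* Uniform init as in the "for posterity" diff: always AND with the mask. */
static void word_init(int word, uint32_t mask)
{
	caps[word] &= mask;
}

static bool cap_has(int word, int bit)
{
	return caps[word] & (1u << bit);
}
```

With a 0xff fill, a forgotten word reports every feature as supported; with a zero fill it reports none, which fails safe (but also means "&=" alone can never populate a KVM-only word, hence the explicit assignment in the current code).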


* Re: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-07-05  1:30   ` Maxim Levitsky
@ 2024-07-08 21:29     ` Sean Christopherson
  2024-07-24 17:54       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 21:29 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > Undefine SPEC_CTRL_SSBD, which is #defined by msr-index.h to represent the
> > enable flag in MSR_IA32_SPEC_CTRL, to avoid issues with the macro being
> > unpacked into its raw value when passed to KVM's F() macro.  This will
> > allow using multiple layers of macros in F() and friends, e.g. to harden
> > against incorrect usage of F().
> > 
> > No functional change intended (cpuid.c doesn't consume SPEC_CTRL_SSBD).
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 8efffd48cdf1..a16d6e070c11 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -639,6 +639,12 @@ static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
> >  	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
> >  }
> >  
> > +/*
> > + * Undefine the MSR bit macro to avoid token concatenation issues when
> > + * processing X86_FEATURE_SPEC_CTRL_SSBD.
> > + */
> > +#undef SPEC_CTRL_SSBD
> > +
> >  void kvm_set_cpu_caps(void)
> >  {
> >  	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
> 
> Hi,
> 
> Maybe we should instead rename the SPEC_CTRL_SSBD to
> 'MSR_IA32_SPEC_CTRL_SSBD' and together with it, other fields of this msr.  It
> seems that at least some msrs in this file do this.

Yeah, the #undef hack is quite ugly.  But I didn't (and still don't) want to
introduce all the renaming churn in the middle of this already too-big series,
especially since it would require touching quite a bit of code outside of KVM.

I'm also not sure that's the right thing to do; I kinda feel like KVM is the one
that's being silly here.

Aha!  Rather than rename the MSR bits, what if we rename the X86_FEATURE flag,
e.g. to X86_FEATURE_INTEL_SPEC_CTRL_SSBD, X86_FEATURE_MSR_SPEC_CTRL_SSBD, or maybe
even just X86_FEATURE_INTEL_SSBD.  Much less churn, and it would add even more
clarity as to why there's also X86_FEATURE_SSBD and X86_FEATURE_AMD_SSBD.

I'll post a standalone patch to make that change, and maybe see if I can take it
through the KVM tree.
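
For anyone wondering why the collision only bites with layered macros, here is a standalone reduction.  F_DIRECT(), F_LAYERED() and the value 99 are placeholders for this sketch; only the names SPEC_CTRL_SSBD and X86_FEATURE_SPEC_CTRL_SSBD come from the discussion.

```c
#include <assert.h>

#define SPEC_CTRL_SSBD			(1 << 2)	/* stand-in for the MSR bit */
#define X86_FEATURE_SPEC_CTRL_SSBD	99		/* placeholder value */

/* One layer: 'name' is an operand of ##, so it is pasted unexpanded. */
#define F_DIRECT(name)	X86_FEATURE_##name

/* Two layers: 'name' is fully macro-expanded before F_DIRECT() sees it. */
#define F_LAYERED(name)	F_DIRECT(name)

/* Works even though SPEC_CTRL_SSBD is itself a macro. */
static const int direct = F_DIRECT(SPEC_CTRL_SSBD);

/*
 * With SPEC_CTRL_SSBD still defined, F_LAYERED(SPEC_CTRL_SSBD) would try to
 * paste "X86_FEATURE_" with "(" and fail to compile.  Dropping the MSR
 * define before the first layered use avoids that.
 */
#undef SPEC_CTRL_SSBD
static const int layered = F_LAYERED(SPEC_CTRL_SSBD);
```

Renaming the feature flag so that its suffix no longer matches any macro name would make the #undef unnecessary.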


* Re: [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2
  2024-07-05  1:32   ` Maxim Levitsky
@ 2024-07-08 21:37     ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 21:37 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > @@ -529,7 +533,14 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> >  #endif
> >  	kvm_vcpu_after_set_cpuid(vcpu);
> >  
> > +success:
> > +	kvfree(e2);
> >  	return 0;
> > +
> > +err:
> > +	swap(vcpu->arch.cpuid_entries, e2);
> > +	swap(vcpu->arch.cpuid_nent, nent);
> > +	return r;
> >  }
> >  
> >  /* when an old userspace process fills a new kernel module */
> 
> Hi,
> 
> This IMHO is a good idea. You might consider moving this patch to the
> beginning of the patch series though, it will make more sense with the rest
> of the patches there.

I'll double check, but IIRC, there were dependencies that prevented moving this
patch earlier.
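
The swap-then-roll-back flow in the quoted hunk can be modeled in userspace along these lines.  This is a sketch under stated assumptions: the int array stands in for the CPUID entries, validate() is an invented check, and the ownership rule on failure (the caller keeps the rejected array) is assumed rather than taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Toy vCPU state: an array of entries plus its length. */
struct toy_vcpu {
	int *entries;
	int nent;
};

/* Exchange the vCPU's array/count with the caller's. */
static void swap_state(struct toy_vcpu *v, int **e, int *n)
{
	int *old_e = v->entries;
	int old_n = v->nent;

	v->entries = *e;
	v->nent = *n;
	*e = old_e;
	*n = old_n;
}

/* Hypothetical validation: reject any negative entry. */
static int validate(const struct toy_vcpu *v)
{
	int i;

	for (i = 0; i < v->nent; i++) {
		if (v->entries[i] < 0)
			return -EINVAL;
	}
	return 0;
}

/*
 * On success the old array is freed and the vCPU owns 'e'.  On failure the
 * old state is restored and the caller still owns 'e'.
 */
static int toy_set_cpuid(struct toy_vcpu *v, int *e, int nent)
{
	int r;

	swap_state(v, &e, &nent);	/* vCPU holds new data, 'e' the old */
	r = validate(v);
	if (r) {
		swap_state(v, &e, &nent);	/* roll back */
		return r;
	}

	free(e);	/* 'e' now points at the old array */
	return 0;
}
```

Installing the new array first lets the validation code use the normal vCPU-based helpers instead of a separate "check this raw array" path.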


* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-07-05  1:59   ` Maxim Levitsky
@ 2024-07-08 22:30     ` Sean Christopherson
  2024-07-24 17:58       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 22:30 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
> > OR-in features that KVM emulates in software, i.e. that don't depend on
> > the feature being available in hardware.  The contained scope
> > of kvm_cpu_cap_init() allows using a local variable to track the set of
> > emulated leaves, which in addition to avoiding confusing and/or
> > unnecessary variables, helps prevent misuse of EMUL_F().
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
> >  1 file changed, 21 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 1064e4d68718..33e3e77de1b7 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> >  	F(name);						\
> >  })
> >  
> > +/*
> > + * Emulated Feature - For features that KVM emulates in software irrespective
> > + * of host CPU/kernel support.
> > + */
> > +#define EMUL_F(name)						\
> > +({								\
> > +	kvm_cpu_cap_emulated |= F(name);			\
> > +	F(name);						\
> > +})
> 
> To me it feels more and more that this patch series doesn't go into the right
> direction.
> 
> How about we just abandon the whole concept of masks and instead just have a
> list of statements
> 
> Pretty much the opposite of the patch series I confess:

FWIW, I think it's actually largely the same code under the hood.  The code for
each concept/flavor ends up being very similar, it's mostly just handling the
bitwise-OR in the callers vs. in the helpers.

> #define CAP_PASSTHOUGH		0x01
> #define CAP_EMULATED		0x02
> #define CAP_AMD_ALIASED		0x04 // for AMD aliased features
> #define CAP_SCATTERED		0x08
> #define CAP_X86_64		0x10 // supported only on 64 bit hypervisors
> ...
> 
> 
> /* CPUID_1_ECX*/
> 
> 				/* TMA is not passed though because: xyz*/
> kvm_cpu_cap_init(TMA,           0);
> 
> kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
> 				/* CNXT_ID is not passed though because: xyz*/
> kvm_cpu_cap_init(CNXT_ID,       0);
> kvm_cpu_cap_init(RESERVED,      0);
> kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
> ...
> 				/* KVM always emulates TSC_ADJUST */
> kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> 
> ...
> 
> /* CPUID_D_1_EAX*/
> 				/* XFD is disabled on 32 bit systems because: xyz*/
> kvm_cpu_cap_init(XFD, 		CAP_PASSTHOUGH | CAP_X86_64)
> 
> 
> 'kvm_cpu_cap_init' can be a macro if needed to have the compile checks.
> 
> There are several advantages to this:
> 
> - more readability, plus if needed each statement can be amended with a comment.
> - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
>   which is confusing.
>   In fact no need to even have them at all.
> - No need to verify that bitmask belongs to a feature word.

Yes, but the downside is that there is no enforcement of features in a word being
bundled together.

> - Merge friendly - each capability has its own line.

That's almost entirely convention though.  Other than inertia, nothing is stopping
us from doing:

	kvm_cpu_cap_init(CPUID_12_EAX,
		SF(SGX1) |
		SF(SGX2) |
		SF(SGX_EDECCSSA)
	);

I don't see a clean way of avoiding the addition of " |" on the last existing
line, but in practice I highly doubt that will ever be a source of meaningful pain.

Same goes for the point about adding comments.  We could do that with either
approach, we just don't do so today.

> Disadvantages:
> 
> - Longer list - IMHO not a problem, since it is very easy to read / search
>   and can have as much comments as needed.
>   For example this is how the kernel lists the CPUID features and this list IMHO
>   is very manageable.

There's one big difference: KVM would need to have a line for every feature that
KVM _doesn't_ support.  For densely populated words, that's not a huge issue,
but it's problematic for sparsely populated words, e.g. CPUID_12_EAX would have
29 reserved/unsupported entries, which IMO ends up being a big net negative for
code readability and ongoing maintenance.

We could avoid that cost (and the danger of a missed bit) by collecting the set
of features that have been initialized for each word, and then masking off the
uninitialized/unsupported at the end.  But then we're back to the bitwise-OR and
mask logic.

And while I agree that having the F*() macros set state _and_ evaluate to a bit
is imperfect, it does have its advantages.  E.g. to avoid evaluating to a value,
we could have F() modify a local variable that is scoped to kvm_cpu_cap_init(),
a la kvm_cpu_cap_emulated.  But then we'd need explicit code and/or comments
to call out that VMM_F() and the like intentionally don't set kvm_cpu_cap_supported,
whereas evaluating to a value is a relatively self-documenting "0;".
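
To make the trade-off concrete, here is a minimal standalone sketch (not KVM
code) of macros that both record state in a variable scoped to the init macro
and evaluate to a bit.  The names F(), EMUL_F() and HIDDEN_F(), the BIT_*
constants, and cap_init() are hypothetical stand-ins, and the sketch relies on
GNU C statement expressions, so it assumes gcc or clang.

```c
#include <stdint.h>

/* Hypothetical feature bits within a single 32-bit CPUID word. */
#define BIT_FOO 0
#define BIT_BAR 1
#define BIT_BAZ 2

#define F(name) (UINT32_C(1) << BIT_##name)

/* Record the bit as emulated, and also evaluate to the bit. */
#define EMUL_F(name) ({ cap_emulated |= F(name); F(name); })

/* Deliberately not advertised: evaluates to a self-documenting 0. */
#define HIDDEN_F(name) ({ (void)F(name); UINT32_C(0); })

/*
 * cap_emulated exists only inside this macro's scope, so EMUL_F() cannot be
 * used anywhere else without a compile error.
 */
#define cap_init(out, raw, mask)                                        \
do {                                                                    \
        uint32_t cap_emulated = 0;                                      \
        uint32_t val = (mask);  /* evaluating mask fills cap_emulated */\
                                                                        \
        val &= (raw);           /* drop bits the hardware lacks */      \
        val |= cap_emulated;    /* emulated bits survive regardless */  \
        (out) = val;                                                    \
} while (0)

/* raw models the bits reported by hardware for this word. */
uint32_t demo_caps(uint32_t raw)
{
        uint32_t caps;

        cap_init(caps, raw, F(FOO) | EMUL_F(BAR) | HIDDEN_F(BAZ));
        return caps;
}
```

In this toy model BAR is reported even when raw is 0, and BAZ is never
reported even when hardware has it, without any extra code outside the list.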

> - Slower - kvm_set_cpu_caps is called exactly once per KVM module load, thus
>   performance is the last thing I would care about in this function.
> 
> Another note about this patch: It is somewhat confusing that EMUL_F just
> forces a feature in kvm caps, regardless of CPU support, because KVM also has
> KVM_GET_EMULATED_CPUID and it has a different meaning.

Yeah, but IMO that's a problem with KVM_GET_EMULATED_CPUID being poorly defined.

> Users can easily confuse the EMUL_F for something that sets a feature bit in
> the KVM_GET_EMULATED_CPUID.

I'll see if I can find a good spot for a comment to try and convenient

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support
  2024-07-05  1:21   ` Maxim Levitsky
  2024-07-08 20:53     ` Sean Christopherson
@ 2024-07-08 22:36     ` Sean Christopherson
  2024-07-24 17:40       ` Maxim Levitsky
  1 sibling, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-08 22:36 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > +/*
> > + * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> > + * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> > + * Simply force set the feature in KVM's capabilities, raw CPUID support will
> > + * be factored in by kvm_cpu_cap_mask().
> > + */
> > +#define RAW_F(name)						\
> > +({								\
> > +	kvm_cpu_cap_set(X86_FEATURE_##name);			\
> > +	F(name);						\
> > +})
> > +
> >  /*
> >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> >   * doesn't care about the CPIUD index because the index of the function in
> > @@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
> >  		F(AVX512VL));
> >  
> >  	kvm_cpu_cap_mask(CPUID_7_ECX,
> > -		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> > +		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> >  		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
> >  		F(SGX_LC) | F(BUS_LOCK_DETECT)
> >  	);
> > -	/* Set LA57 based on hardware capability. */
> > -	if (cpuid_ecx(7) & F(LA57))
> > -		kvm_cpu_cap_set(X86_FEATURE_LA57);
> >  
> >  	/*
> >  	 * PKU not yet implemented for shadow paging and requires OSPKE
> 
> Putting a function call into a macro which evaluates into a bitmask is somewhat misleading,
> but let it be...

And weird.  Rather than abuse kvm_cpu_cap_set(), what about adding another variable
scoped to kvm_cpu_cap_init()?

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0e64a6332052..b8bc8713a0ec 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -87,12 +87,10 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
 /*
  * Raw Feature - For features that KVM supports based purely on raw host CPUID,
  * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
- * Simply force set the feature in KVM's capabilities, raw CPUID support will
- * be factored in by __kvm_cpu_cap_mask().
  */
 #define RAW_F(name)                                            \
 ({                                                             \
-       kvm_cpu_cap_set(X86_FEATURE_##name);                    \
+       kvm_cpu_cap_passthrough |= F(name);                     \
        F(name);                                                \
 })
 
@@ -737,6 +735,7 @@ do {                                                                        \
 do {                                                                   \
        const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);    \
        const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;   \
+       u32 kvm_cpu_cap_passthrough = 0;                                \
        u32 kvm_cpu_cap_emulated = 0;                                   \
        u32 kvm_cpu_cap_synthesized = 0;                                \
                                                                        \
@@ -745,6 +744,7 @@ do {                                                                        \
        else                                                            \
                kvm_cpu_caps[leaf] = (mask);                            \
                                                                        \
+       kvm_cpu_caps[leaf] |= kvm_cpu_cap_passthrough;                  \
        kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |                   \
                               kvm_cpu_cap_synthesized);                \
        kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;                     \
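
The order of operations in the hunk above is the interesting part, so here is
a standalone sketch of it as a pure function.  It is a simplified model, not
the kernel code: compute_word() and its parameter names are made up, and the
first line assumes the kernel-defined-leaf case where the pre-populated boot
CPU value is masked.

```c
#include <stdint.h>

/*
 * kernel_word:  bits the host kernel has enabled for this word
 * mask:         bits KVM lists for this word
 * passthrough:  bits KVM supports based purely on raw CPUID
 * raw:          bits reported by the CPUID instruction on the host
 * synthesized:  bits the kernel defines that have no raw CPUID backing
 * emulated:     bits KVM emulates regardless of hardware
 */
uint32_t compute_word(uint32_t kernel_word, uint32_t mask,
                      uint32_t passthrough, uint32_t raw,
                      uint32_t synthesized, uint32_t emulated)
{
        uint32_t caps = kernel_word & mask;

        caps |= passthrough;            /* ignore host kernel support... */
        caps &= raw | synthesized;      /* ...but still require hardware */
        caps |= emulated;               /* added last, so never masked off */
        return caps;
}
```

Because the passthrough bits are OR-ed in before the raw CPUID mask is
applied, a bit like LA57 ends up set only if hardware reports it, even when
the host kernel has cleared it.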

^ permalink raw reply related	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support
  2024-07-05  2:22   ` Maxim Levitsky
@ 2024-07-09  0:10     ` Sean Christopherson
  2024-07-24 18:01       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09  0:10 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > @@ -421,6 +423,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	 */
> >  	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
> >  		const struct cpuid_reg cpuid = reverse_cpuid[i];
> > +		struct kvm_cpuid_entry2 emulated;
> >  
> >  		if (!cpuid.function)
> >  			continue;
> > @@ -429,7 +432,16 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  		if (!entry)
> >  			continue;
> >  
> > -		vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
> > +		cpuid_func_emulated(&emulated, cpuid.function);
> > +
> > +		/*
> > +		 * A vCPU has a feature if it's supported by KVM and is enabled
> > +		 * in guest CPUID.  Note, this includes features that are
> > +		 * supported by KVM but aren't advertised to userspace!
> > +		 */
> > +		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i] | kvm_vmm_cpu_caps[i] |
> > +					 cpuid_get_reg_unsafe(&emulated, cpuid.reg);
> > +		vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);
> 
> Hi,
> 
> I have an idea. What if we get rid of kvm_vmm_cpu_caps, and instead advertise the
> MWAIT in KVM_GET_EMULATED_CPUID?
> 
> MWAIT is sort of emulated as NOP after all, plus features in KVM_GET_EMULATED_CPUID are
> sort of 'emulated inefficiently' and you can say that NOP is an inefficient emulation
> of MWAIT sort of.

Heh, sort of indeed.  I really don't want to advertise MWAIT to userspace in any
capacity beyond KVM_CAP_X86_DISABLE_EXITS, because advertising MWAIT to VMs when
MONITOR/MWAIT exiting is enabled is actively harmful, to both host and guest.

KVM also doesn't emulate them on #UD, unlike MOVBE, which would make the API even
more confusing than it already is.

> It just feels to me that kvm_vmm_cpu_caps, is somewhat an overkill, and its name is
> somewhat confusing.

Yeah, I don't love it either, but trying to handle MWAIT as a one-off was even
uglier.  One option would be to piggyback cpuid_func_emulated(), but add a param
to have it fill MWAIT only for KVM's internal purposes.  That'd essentially be
the same as a one-off in kvm_vcpu_after_set_cpuid(), but less ugly.

I'd say it comes down to whether or not we expect to have more features that KVM
"supports", but doesn't advertise to userspace.  If we do, then I think adding
VMM_F() is the way to go.  If we expect MWAIT to be the only feature that gets
this treatment, then I'm ok if we bastardize cpuid_func_emulated().

And I think/hope that MWAIT will be a one-off.  Emulating it as a nop was a
mistake and has since been quirked, and I like to think we (eventually) learn
from our mistakes.

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0e64a6332052..dbc3f6ce9203 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -448,7 +448,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
                if (!entry)
                        continue;
 
-               cpuid_func_emulated(&emulated, cpuid.function);
+               cpuid_func_emulated(&emulated, cpuid.function, false);
 
                /*
                 * A vCPU has a feature if it's supported by KVM and is enabled
@@ -1034,7 +1034,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
        return entry;
 }
 
-static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
+static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
+                              bool only_advertised)
 {
        memset(entry, 0, sizeof(*entry));
 
@@ -1048,6 +1049,9 @@ static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
                return 1;
        case 1:
                entry->ecx = F(MOVBE);
+               /* comment goes here. */
+               if (!only_advertised)
+                       entry->ecx |= F(MWAIT);
                return 1;
        case 7:
                entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
@@ -1065,7 +1069,7 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
        if (array->nent >= array->maxnent)
                return -E2BIG;
 
-       array->nent += cpuid_func_emulated(&array->entries[array->nent], func);
+       array->nent += cpuid_func_emulated(&array->entries[array->nent], func, true);
        return 0;
 }
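
As a rough standalone model of what the extra parameter buys (again, a sketch
and not the kernel code): the function names below are hypothetical, and only
CPUID.1:ECX is modeled, using the architectural bit positions for
MONITOR/MWAIT (bit 3) and MOVBE (bit 22).

```c
#include <stdbool.h>
#include <stdint.h>

#define ECX_MWAIT (UINT32_C(1) << 3)    /* CPUID.1:ECX MONITOR/MWAIT */
#define ECX_MOVBE (UINT32_C(1) << 22)   /* CPUID.1:ECX MOVBE */

/* Emulated bits for leaf 1; MWAIT is included only for internal use. */
uint32_t emulated_leaf1_ecx(bool only_advertised)
{
        uint32_t ecx = ECX_MOVBE;

        if (!only_advertised)
                ecx |= ECX_MWAIT;
        return ecx;
}

/*
 * Guest capabilities for the word: whatever KVM supports or emulates
 * (including the internal-only MWAIT bit), restricted to what userspace
 * put in guest CPUID.
 */
uint32_t guest_leaf1_ecx_caps(uint32_t kvm_caps, uint32_t guest_cpuid)
{
        uint32_t caps = kvm_caps | emulated_leaf1_ecx(false);

        return caps & guest_cpuid;
}
```

The ioctl path would use only_advertised=true and so never report MWAIT,
while the per-vCPU path still honors MWAIT if userspace sets it.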

^ permalink raw reply related	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information
  2024-07-05  2:18   ` Maxim Levitsky
@ 2024-07-09  0:13     ` Sean Christopherson
  2024-07-24 18:00       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09  0:13 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> PS: I spoke with Paolo about the meaning of KVM_GET_EMULATED_CPUID, because
> it is not clear from the documentation what it does, or what it supposed to
> do because qemu doesn't use this IOCTL.
> 
> So this ioctl is meant to return a static list of CPU features which *can* be
> emulated by KVM, if the cpu doesn't support them, but there is a cost to it,
> so they should not be enabled by default.
> 
> This means that if you run 'qemu -cpu host', these features (like rdpid) will
> only be enabled if supported by the host cpu, however if you explicitly ask
> qemu for such a feature, like 'qemu -cpu host,+rdpid', qemu should not warn
> if the feature is not supported on host cpu but can be emulated (because kvm
> can emulate the feature, which is stated by KVM_GET_EMULATED_CPUID ioctl).
> 
> Qemu currently doesn't support this but the support can be added.
> 
> So I think that the two ioctls should be redefined as such:
> 
> KVM_GET_SUPPORTED_CPUID - returns all CPU features that are supported by KVM,
> supported by host hardware, or that KVM can efficiently emulate.
> 
> 
> KVM_GET_EMULATED_CPUID - returns all CPU features that KVM *can* emulate if
> the host cpu lacks support, but emulation is not efficient and thus these
> features should be used with care when not supported by the host (e.g only
> when the user explicitly asks for them).

Yep, that aligns with how I view the ioctls (I haven't read the documentation,
mainly because I have a terrible habit of never reading docs).

> I can post a patch to fix this or you can add something like that to your
> patch series if you prefer.

Go ahead and post a patch, assuming it's just a documentation update.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features
  2024-07-05  2:26   ` Maxim Levitsky
@ 2024-07-09  0:24     ` Sean Christopherson
  2024-09-10 20:41       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09  0:24 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > -		cpuid_entry_change(best, X86_FEATURE_OSPKE,
> > -				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > +		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE,
> > +					   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > +
> >  
> >  	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
> >  	if (best)
> 
> 
> I am not 100% sure that we need to do this.
> 
> Runtime cpuid changes are a hack that Intel did back then, due to various
> reasons. These changes don't really change the feature set that CPU supports,
> but merely, as you like to say, 'massage' the output of the CPUID instruction to
> make the unmodified OS happy usually.
> 
> Thus it feels to me that CPU caps should not include the dynamic features,
> and neither KVM should use the value of these as a source for truth, but
> rather the underlying source of the truth (e.g CR4).
> 
> But if you insist, I don't really have a very strong reason to object this.

FWIW, I think I agree that CR4 should be the source of truth, but it's largely a
moot point because KVM doesn't actually check OSXSAVE or OSPKE, as KVM never
emulates the relevant instructions.  So for those, it's indeed not strictly
necessary.

Unfortunately, KVM has established ABI for checking X86_FEATURE_MWAIT when
"emulating" MONITOR and MWAIT, i.e. KVM can't use vcpu->arch.ia32_misc_enable_msr
as the source of truth.  So for MWAIT, KVM does need to update CPU caps (or carry
even more awful MWAIT code), at which point extending the behavior to the CR4
features (and to X86_FEATURE_APIC) is practically free.
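
For readers unfamiliar with the dynamic bits, a minimal standalone sketch of
keeping a cached capability in sync with its source of truth might look like
the following.  struct toy_vcpu, update_feature_runtime() and toy_set_cr4()
are invented for illustration; only the bit positions (CR4.PKE is bit 22,
OSPKE is CPUID.(EAX=7,ECX=0):ECX bit 4) are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_PKE           (UINT64_C(1) << 22)
#define CPUID_7_ECX_OSPKE (UINT32_C(1) << 4)

/* Toy vCPU: one control register and one CPUID word, stored twice. */
struct toy_vcpu {
        uint64_t cr4;
        uint32_t cpuid_7_ecx;   /* what the guest sees from CPUID */
        uint32_t caps_7_ecx;    /* cached capabilities for the same word */
};

/* Update the guest-visible CPUID bit and the cached capability together. */
static void update_feature_runtime(struct toy_vcpu *v, uint32_t bit, bool set)
{
        if (set) {
                v->cpuid_7_ecx |= bit;
                v->caps_7_ecx |= bit;
        } else {
                v->cpuid_7_ecx &= ~bit;
                v->caps_7_ecx &= ~bit;
        }
}

/* CR4 remains the source of truth; OSPKE simply mirrors CR4.PKE. */
void toy_set_cr4(struct toy_vcpu *v, uint64_t cr4)
{
        v->cr4 = cr4;
        update_feature_runtime(v, CPUID_7_ECX_OSPKE, (cr4 & CR4_PKE) != 0);
}
```

Updating both copies in one helper is what makes extending the behavior to
additional dynamic bits cheap in this model.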

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features
  2024-07-05  1:31   ` Maxim Levitsky
@ 2024-07-09 18:11     ` Sean Christopherson
  2024-07-24 17:55       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 18:11 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > +/*
> > + * For kernel-defined leafs, mask the boot CPU's pre-populated value.  For KVM-
> > + * defined leafs, explicitly set the leaf, as KVM is the one and only authority.
> > + */
> > +#define kvm_cpu_cap_init(leaf, mask)					\
> > +do {									\
> > +	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> > +	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> 
> Why not to #define the kvm_cpu_cap_init_in_progress as well instead of a variable?

Macros can't #define new macros.  A macro could be used, but it would require the
caller to #define and #undef the macro, e.g.

	#define kvm_cpu_cap_init_in_progress CPUID_1_ECX
	kvm_cpu_cap_init(CPUID_1_ECX, ...)
	#undef kvm_cpu_cap_init_in_progress

but, stating the obvious, that's ugly and is less robust than automatically
"defining" the in-progress leaf in kvm_cpu_cap_init().
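
A standalone sketch of the scoped-variable trick, for illustration: the word
and feature names are hypothetical, and the sketch checks the word with a
runtime assert() purely to stay self-contained; it does not claim to reproduce
how the series performs the check.

```c
#include <assert.h>
#include <stdint.h>

enum { WORD_A, WORD_B, NR_WORDS };

/* Encode a feature as word * 32 + bit, similar in spirit to X86_FEATURE_*. */
#define FEATURE(word, bit) ((word) * 32 + (bit))
#define FEAT_X FEATURE(WORD_A, 3)
#define FEAT_Y FEATURE(WORD_B, 5)

uint32_t caps[NR_WORDS];

/*
 * F() references cap_init_in_progress, which exists only inside cap_init().
 * Using F() anywhere else fails to compile (undeclared identifier), and
 * using a feature from the wrong word trips the assert.
 */
#define F(feat)                                                 \
        (assert((feat) / 32 == cap_init_in_progress),           \
         UINT32_C(1) << ((feat) % 32))

#define cap_init(word, mask)                                    \
do {                                                            \
        const uint32_t cap_init_in_progress = (word);           \
                                                                \
        (void)cap_init_in_progress;                             \
        caps[word] = (mask);                                    \
} while (0)

void init_caps(void)
{
        cap_init(WORD_A, F(FEAT_X));
        cap_init(WORD_B, F(FEAT_Y));
}
```

The caller never has to #define or #undef anything; the in-progress word is
declared automatically and disappears at the end of each cap_init().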

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps
  2024-07-05  2:10   ` Maxim Levitsky
@ 2024-07-09 18:30     ` Sean Christopherson
  2024-07-24 18:00       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 18:30 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > @@ -861,23 +877,20 @@ struct kvm_vcpu_arch {
> >  	bool is_amd_compatible;
> >  
> >  	/*
> > -	 * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
> > -	 * when "struct kvm_vcpu_arch" is no longer defined in an
> > -	 * arch/x86/include/asm header.  The max is mostly arbitrary, i.e.
> > -	 * can be increased as necessary.
> > +	 * cpu_caps holds the effective guest capabilities, i.e. the features
> > +	 * the vCPU is allowed to use.  Typically, but not always, features can
> > +	 * be used by the guest if and only if both KVM and userspace want to
> > +	 * expose the feature to the guest.
> 
> Nitpick: Since even the comment mentions this, wouldn't it be better to call this
> cpu_effective_caps? or at least cpu_eff_caps, to emphasize that these are indeed
> effective capabilities, e.g these that both kvm and userspace support?

I strongly prefer cpu_caps, in part to match kvm_cpu_caps, but also because adding
"effective" to the name incorrectly suggests that there are other guest capabilities
that aren't effective.  These are the _only_ per-vCPU capabilities as far as KVM
is concerned, i.e. they are the single source of truth.  kvm_cpu_caps holds KVM's
capabilities, boot_cpu_data holds kernel capabilities, and bare metal holds its
capabilities somewhere in silicon.

E.g. being pedantic, kvm_cpu_caps are also KVM's effective capabilities, as they
are a reflection of KVM-the-module's capabilities, module params, kernel capabilities,
and CPU capabilities.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base
  2024-07-05  1:51   ` Maxim Levitsky
@ 2024-07-09 19:00     ` Sean Christopherson
  2024-07-24 17:59       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 19:00 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > Now that KVM only searches for KVM's PV CPUID base when userspace sets
> > guest CPUID, drop the cache and simply do the search every time.
> > 
> > Practically speaking, this is a nop except for situations where userspace
> > sets CPUID _after_ running the vCPU, which is anything but a hot path,
> > e.g. QEMU does so only when hotplugging a vCPU.  And on the flip side,
> > caching guest CPUID information, especially information that is used to
> > query/modify _other_ CPUID state, is inherently dangerous as it's all too
> > easy to use stale information, i.e. KVM should only cache CPUID state when
> > the performance and/or programming benefits justify it.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---

...

> > @@ -491,13 +479,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> >  	 * whether the supplied CPUID data is equal to what's already set.
> >  	 */
> >  	if (kvm_vcpu_has_run(vcpu)) {
> > -		/*
> > -		 * Note, runtime CPUID updates may consume other CPUID-driven
> > -		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> > -		 * state before full CPUID processing is functionally correct
> > -		 * only because any change in CPUID is disallowed, i.e. using
> > -		 * stale data is ok because KVM will reject the change.
> > -		 */
> Hi,
> 
> Any reason why this comment was removed?

Because after this patch, runtime CPUID updates no longer consume other vCPU
state that is derived from guest CPUID.

> As I said earlier in the review.  It might make sense to replace this comment
> with a comment reflecting on why we need to call kvm_update_cpuid_runtime,
> that is solely to allow old == new compare to succeed.

Ya, I'll figure out a location and patch to document why KVM applies runtime
updates and quirks to the CPUID before checking.

> 
> >  		kvm_update_cpuid_runtime(vcpu);
> >  		kvm_apply_cpuid_pv_features_quirk(vcpu);

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps
  2024-07-05  2:36   ` Maxim Levitsky
@ 2024-07-09 19:15     ` Sean Christopherson
  2024-07-24 18:02       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 19:15 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > Drop the manual boot_cpu_has() checks on XSAVE when adjusting the guest's
> > XSAVES capabilities now that guest cpu_caps incorporates KVM's support.
> > The guest's cpu_caps are initialized from kvm_cpu_caps, which are in turn
> > initialized from boot_cpu_data, i.e. checking guest_cpu_cap_has() also
> > checks host/KVM capabilities (which is the entire point of cpu_caps).
> > 
> > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 1 -
> >  arch/x86/kvm/vmx/vmx.c | 3 +--
> >  2 files changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 06770b60c0ba..4aaffbf22531 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4340,7 +4340,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	 * the guest read/write access to the host's XSS.
> >  	 */
> >  	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
> > -			     boot_cpu_has(X86_FEATURE_XSAVE) &&
> >  			     boot_cpu_has(X86_FEATURE_XSAVES) &&
> >  			     guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));
> 
> >  
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 741961a1edcc..6fbdf520c58b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7833,8 +7833,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	 * to the guest.  XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
> >  	 * set if and only if XSAVE is supported.
> >  	 */
> 
> 
> > -	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
> > -	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
> > +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
> >  		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
> 
> Hi,
> 
> I have a question about this code, even before the patch was applied:
> 
> While it is obviously correct to disable XSAVES when XSAVE not supported, I
> wonder: There are a lot more cases like that and KVM explicitly doesn't
> bother checking them, e.g all of the AVX family also depends on XSAVE due to
> XCR0.
> 
> What makes XSAVES/XSAVE dependency special here? Maybe we can remove this
> code to be consistent?

Because that would result in VMX and SVM behavior diverging with respect to
whether guest_cpu_cap_has(X86_FEATURE_XSAVES).  E.g. for AMD it would be 100%
accurate, but for Intel it would be accurate if and only if XSAVE is supported.

In practice that isn't truly problematic, because checks on XSAVES from common
code are gated on guest CR4.OSXSAVE=1, i.e. implicitly check XSAVE support.  But
the potential danger of subtly divergent behavior between VMX and SVM isn't worth
making AVX vs. XSAVES consistent within VMX, especially since VMX vs. SVM would
still be inconsistent.

> AMD portion of this patch, on the other hand does makes sense, due to a lack
> of a separate XSAVES intercept.

FWIW, AMD also needs precise tracking in order to passthrough XSS for SEV-ES.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps
  2024-07-05  2:34   ` Maxim Levitsky
@ 2024-07-09 19:20     ` Sean Christopherson
  2024-07-24 18:01       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 19:20 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > +static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
> > +					    unsigned int x86_feature)
> >  {
> >  	const struct cpuid_reg cpuid = x86_feature_cpuid(x86_feature);
> >  	struct kvm_cpuid_entry2 *entry;
> > +	u32 *reg;
> > +
> > +	/*
> > +	 * XSAVES is a special snowflake.  Due to lack of a dedicated intercept
> > +	 * on SVM, KVM must assume that XSAVES (and thus XRSTORS) is usable by
> > +	 * the guest if the host supports XSAVES and *XSAVE* is exposed to the
> > +	 * guest.  Although the guest can read/write XSS via XSAVES/XRSTORS, to
> > +	 * minimize the virtualization hole, KVM rejects attempts to read/write
> > +	 * XSS via RDMSR/WRMSR.  To make that work, KVM needs to check the raw
> > +	 * guest CPUID, not KVM's view of guest capabilities.
> 
> Hi,
> 
> I think that this comment is wrong:
> 
> The guest can't read/write XSS via XSAVES/XRSTORS. It can only use XSAVES/XRSTORS
> to save/restore features that are enabled in XSS, and thus if there are none enabled,
> the XSAVES/XRSTORS acts as more or less XSAVEOPTC/XRSTOR (except working only when CPL=0)

Doh, right you are.

> So I don't think that there is a virtualization hole except the fact that VMM can't
> really disable XSAVES if it chooses to.

There is still a hole.  If XSAVES is not supported, KVM runs the guest with the
host XSS.  See the conditional switching in kvm_load_{guest,host}_xsave_state().
Not treating XSAVES as being available to the guest would allow the guest to read
and write host supervisor state.

I'll rewrite the comment to call that out.

> Another "half virtualization hole" is that since we have chosen to not
> intercept XSAVES at all, (AMD can't do this at all, and it's slow anyway) we
> instead opted to never support some XSS bits (so far all of them, only
> upcoming CET will add a few supported bits).
> 
> This creates an unexpected situation for the guest - enabled feature (e.g PT)
> but no XSS bit supported to context switch it. x86 arch does allow this
> though.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
  2024-07-05  2:04   ` Maxim Levitsky
@ 2024-07-09 19:28     ` Sean Christopherson
  2024-07-24 18:00       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 19:28 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> >  4.47 KVM_PPC_GET_PVINFO
> >  -----------------------
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 699ce4261e9c..d1f427284ccc 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -680,8 +680,8 @@ void kvm_set_cpu_caps(void)
> >  		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
> >  		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
> >  		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
> > -		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> > -		F(F16C) | F(RDRAND)
> > +		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
> > +		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
> >  	);
> >  
> >  	kvm_cpu_cap_init(CPUID_1_EDX,
> 
> Hi,
> 
> I have mixed feelings about this.
> 
> First of all KVM_GET_SUPPORTED_CPUID documentation explicitly states that it
> returns bits that are supported in the *default* configuration.  TSC_DEADLINE_TIMER
> and arguably X2APIC are only supported after enabling various caps, i.e. not in
> the default configuration.

Another side topic, in the near future, I think we should push to make an in-kernel
local APIC a hard requirement.  AFAIK, userspace local APIC gets no meaningful
test coverage, and IIRC we have known bugs where a userspace APIC doesn't work
as it should, e.g. commit 6550c4df7e50 ("KVM: nVMX: Fix interrupt window request
with "Acknowledge interrupt on exit"").

> However, since X2APIC is also in KVM_GET_SUPPORTED_CPUID (also wrongly IMHO),
> for consistency it does make sense to add TSC_DEADLINE_TIMER as well.
> 
> I do think that we need at least to update the documentation of KVM_GET_SUPPORTED_CPUID
> and KVM_GET_EMULATED_CPUID, as I state in a review of a later patch.

+1


* Re: [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup
  2024-07-05  0:51   ` Maxim Levitsky
@ 2024-07-09 19:46     ` Sean Christopherson
  2024-07-24 17:24       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 19:46 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> >  	/*
> >  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> >  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> > @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> >  	 * whether the supplied CPUID data is equal to what's already set.
> >  	 */
> >  	if (kvm_vcpu_has_run(vcpu)) {
> > +		/*
> > +		 * Note, runtime CPUID updates may consume other CPUID-driven
> > +		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> > +		 * state before full CPUID processing is functionally correct
> > +		 * only because any change in CPUID is disallowed, i.e. using
> > +		 * stale data is ok because KVM will reject the change.
> > +		 */
> 
> If I understand correctly the sole reason for the below
> __kvm_update_cpuid_runtime is to ensure that kvm_cpuid_check_equal doesn't
> fail because current cpuid also was post-processed with runtime updates.

Yep.

> Can we have a comment stating this? Or even better how about moving the
> call to __kvm_update_cpuid_runtime into the kvm_cpuid_check_equal,
> to emphasize this?

Ya, I'll do both.

> > +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> > +
> >  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
> >  		if (r)
> >  			return r;
> 
> 
> 
> Overall I am not 100% sure what is better:
> 
> Before the patch it was roughly like this:
> 
> 1. Post process the user given cpuid with bits of KVM runtime state (like xcr0)
> At that point the vcpu->arch.cpuid_entries is stale but consistent, it is just old CPUID.
> 
> 2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid)
> 
> 3. kvm_check_cpuid on the user provided cpuid
> 
> 4. Update the vcpu->arch.cpuid_entries with new and post processed cpuid
> 
> 5. kvm_get_hypervisor_cpuid - I think this also can be cosmetically moved to kvm_vcpu_after_set_cpuid
> 
> 6. kvm_vcpu_after_set_cpuid itself.
> 
> 
> After this change it works like that:
> 
> 1. kvm_hv_vcpu_init (again this belongs more to kvm_vcpu_after_set_cpuid)
> 2. kvm_check_cpuid on the user cpuid without post processing - in theory this can cause bugs
> 3. Update the vcpu->arch.cpuid_entries with new cpuid but without post-processing
> 4. kvm_get_hypervisor_cpuid
> 5. kvm_update_cpuid_runtime
> 6. The old kvm_vcpu_after_set_cpuid
> 
> I'm honestly not sure what is better but IMHO moving the kvm_hv_vcpu_init and
> kvm_get_hypervisor_cpuid into kvm_vcpu_after_set_cpuid would clean up this
> mess a bit regardless of this patch.

It takes many more patches, but doing the swap() allows for the removal of several
APIs that poke into a "raw" kvm_cpuid_entry2 array, and by the end of the series
(with your above feedback addressed) the code gets to (sans comments):

	swap(vcpu->arch.cpuid_entries, e2);
	swap(vcpu->arch.cpuid_nent, nent);

	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));

	if (kvm_vcpu_has_run(vcpu)) {
		r = kvm_cpuid_check_equal(vcpu, e2, nent);
		if (r)
			goto err;
		goto success;
	}

Those are really just bonuses though, the main goal is to prevent recurrences of
bugs where KVM consumes stale vCPU state[*], which is what prompted this change.

[*] https://lore.kernel.org/all/20240228101837.93642-1-vkuznets@redhat.com
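
As an illustration of the swap-first flow in the snippet above, a minimal
userspace sketch follows.  Everything here is an assumption made for the
example (the toy_ types, the TOY_MAX_NENT limit standing in for real CPUID
validation, and the caller owning both arrays); it is not the KVM code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_NENT 4	/* made-up limit, stands in for real validation */

struct toy_entry {
	uint32_t function;
	uint32_t eax;
};

struct toy_vcpu {
	struct toy_entry *entries;
	size_t nent;
	bool has_run;
};

/* Exchange the vCPU's array with the caller's, like the two swap() calls. */
static void toy_swap(struct toy_vcpu *vcpu, struct toy_entry **e, size_t *nent)
{
	struct toy_entry *old_e = vcpu->entries;
	size_t old_nent = vcpu->nent;

	vcpu->entries = *e;
	vcpu->nent = *nent;
	*e = old_e;
	*nent = old_nent;
}

static bool toy_equal(const struct toy_vcpu *vcpu, const struct toy_entry *e,
		      size_t nent)
{
	if (nent != vcpu->nent)
		return false;

	return !nent || !memcmp(e, vcpu->entries, nent * sizeof(*e));
}

static int toy_set_cpuid(struct toy_vcpu *vcpu, struct toy_entry *e, size_t nent)
{
	int r = 0;

	/* Install the new array first, e/nent now refer to the OLD array. */
	toy_swap(vcpu, &e, &nent);

	if (vcpu->has_run) {
		/* After the first run, only an identical CPUID is accepted. */
		if (!toy_equal(vcpu, e, nent))
			r = -EINVAL;
	} else if (vcpu->nent > TOY_MAX_NENT) {
		r = -EINVAL;
	}

	/* On failure, swap back so the vCPU keeps its previous CPUID. */
	if (r)
		toy_swap(vcpu, &e, &nent);

	return r;
}
```

Because the new array is installed before anything else runs, every check
operates on vCPU state that matches the incoming CPUID, and the error path is
a second swap instead of a pile of special cases.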


* Re: [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX
  2024-07-05  0:55   ` Maxim Levitsky
@ 2024-07-09 19:58     ` Sean Christopherson
  2024-07-24 17:28       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 19:58 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's
> > reserved bits into the guest's reserved bits.  This fixes a bug where VMX's
> > set_cr4_guest_host_mask() fails to account for KVM-reserved bits when
> > deciding which bits can be passed through to the guest.  In most cases,
> > letting the guest directly write reserved CR4 bits is ok, i.e. attempting
> > to set the bit(s) will still #GP, but not if a feature is available in
> > hardware but explicitly disabled by the host, e.g. if FSGSBASE support is
> > disabled via "nofsgsbase".
> > 
> > Note, the extra overhead of computing host reserved bits every time
> > userspace sets guest CPUID is negligible.  The feature bits that are
> > queried are packed nicely into a handful of words, and so checking and
> > setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the
> > total cost will be in the noise even if the number of checked CR4 bits
> > doubles over the next few years.  In other words, x86 will run out of CR4
> > bits long before the overhead becomes problematic.
> 
> It might be just me, but IMHO this justification is confusing, leading me to
> believe that maybe the code is on the hot-path instead.
> 
> The right justification should be just that this code is in
> kvm_vcpu_after_set_cpuid, which is usually (*) only called once per vCPU (twice
> after your patch #1)

Ya.  I was trying to capture that even if that weren't true, i.e. even if userspace
was doing something odd, that the extra cost is irrelevant.  I'll expand and reword
the paragraph to make it clear this isn't a hot path for any sane userspace.

> (*) Qemu also calls it, each time vCPU is hotplugged but this doesn't change
> anything performance wise.

...

> > @@ -9831,10 +9826,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> >  	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> >  		kvm_caps.supported_xss = 0;
> >  
> > -#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> > -	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> > -#undef __kvm_cpu_cap_has
> > -
> >  	if (kvm_caps.has_tsc_control) {
> >  		/*
> >  		 * Make sure the user can only configure tsc_khz values that
> 
> 
> I mostly agree with this patch - caching always carries risks and when it doesn't
> add value performance wise, it should always be removed.
> 
> 
> However I don't think that this patch fixes a bug as it claims:
> 
> This is the code prior to this patch:
> 
> kvm_x86_vendor_init ->
> 
> 	r = ops->hardware_setup();
> 		svm_hardware_setup
> 			svm_set_cpu_caps + kvm_set_cpu_caps
> 
> 		-- or --
> 
> 		vmx_hardware_setup ->
> 			vmx_set_cpu_caps + kvm_set_cpu_caps
> 
> 
> 	# read from 'kvm_cpu_caps'
> 	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> 
> 
> AFAIK kvm cpu caps are never touched outside of svm_set_cpu_caps/vmx_hardware_setup
> (they don't depend on some later post-processing, cpuid, etc).
> 
> In fact a good refactoring would be to make kvm_cpu_caps const after this point,
> using cast, assert or something like that.
> 
> This leads me to believe that cr4_reserved_bits is computed correctly.

cr4_reserved_bits is computed correctly.  The bug is that cr4_reserved_bits isn't
consulted by set_cr4_guest_host_mask(), which is what I meant by "KVM-reserved
bits" in the changelog.
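
A small sketch of the difference, assuming a simplified model in which the
guest-owned (passed through) CR4 bits are derived from a set of candidate bits
minus whatever is reserved.  The TOY_ names and helpers are hypothetical; only
the CR4 bit positions (PGE is bit 7, FSGSBASE is bit 16) are architectural.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CR4_PGE		(1ull << 7)
#define TOY_CR4_FSGSBASE	(1ull << 16)

/* Made-up set of bits that could be passed through to the guest. */
#define TOY_POSSIBLE_GUEST_BITS	(TOY_CR4_PGE | TOY_CR4_FSGSBASE)

/* Buggy flavor: only bits reserved per guest CPUID are taken into account. */
static uint64_t toy_guest_owned_bits_buggy(uint64_t guest_rsvd)
{
	return TOY_POSSIBLE_GUEST_BITS & ~guest_rsvd;
}

/* Fixed flavor: KVM-reserved bits are folded into the guest's reserved bits. */
static uint64_t toy_guest_owned_bits(uint64_t guest_rsvd, uint64_t kvm_rsvd)
{
	return TOY_POSSIBLE_GUEST_BITS & ~(guest_rsvd | kvm_rsvd);
}
```

E.g. with "nofsgsbase" on FSGSBASE-capable hardware and guest CPUID that
enumerates FSGSBASE, guest_rsvd doesn't contain the bit but kvm_rsvd does.
The buggy flavor hands CR4.FSGSBASE to the guest, so setting it won't #GP;
the fixed flavor keeps the bit intercepted.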

> I could be wrong, but then IMHO it is a very good idea to provide an explanation
> on how this bug can happen.

The first paragraph of the changelog tries to do that, and I'm struggling to come
up with different wording that makes it more clear what's wrong.  Any ideas/suggestions?


* Re: [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data
  2024-07-05  2:43   ` Maxim Levitsky
@ 2024-07-09 21:13     ` Sean Christopherson
  2024-07-24 18:04       ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-09 21:13 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > Add yet another CPUID macro, this time for features that the host kernel
> > synthesizes into boot_cpu_data, i.e. that the kernel force sets even in
> > situations where the feature isn't reported by CPUID.  Thanks to the
> > macro shenanigans of kvm_cpu_cap_init(), such features can now be handled
> > in the core CPUID framework, i.e. don't need to be handled out-of-band and
> > thus without as many guardrails.
> > 
> > Adding a dedicated macro also helps document what's going on, e.g. the
> > calls to kvm_cpu_cap_check_and_set() are very confusing unless the reader
> > knows exactly how kvm_cpu_cap_init() generates kvm_cpu_caps (and even
> > then, it's far from obvious).
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---

...

> Now that you added the final F_* macro, let's list all of them:
> 
> #define F(name)							\
> 
> /* Scattered Flag - For features that are scattered by cpufeatures.h. */
> #define SF(name)						\
> 
> /* Features that KVM supports only on 64-bit kernels. */
> #define X86_64_F(name)						\
> 
> /*
>  * Raw Feature - For features that KVM supports based purely on raw host CPUID,
>  * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
>  * Simply force set the feature in KVM's capabilities, raw CPUID support will
>  * be factored in by __kvm_cpu_cap_mask().
>  */
> #define RAW_F(name)						\
> 
> /*
>  * Emulated Feature - For features that KVM emulates in software irrespective
>  * of host CPU/kernel support.
>  */
> #define EMUL_F(name)						\
> 
> /*
>  * Synthesized Feature - For features that are synthesized into boot_cpu_data,
>  * i.e. may not be present in the raw CPUID, but can still be advertised to
>  * userspace.  Primarily used for mitigation related feature flags.
>  */
> #define SYN_F(name)						\
> 
> /*
>  * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
>  * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
>  */
> #define AF(name)								\
> 
> /*
>  * VMM Features - For features that KVM "supports" in some capacity, i.e. that
>  * KVM may query, but that are never advertised to userspace.  E.g. KVM allows
>  * userspace to enumerate MONITOR+MWAIT support to the guest, but the MWAIT
>  * feature flag is never advertised to userspace because MONITOR+MWAIT aren't
>  * virtualized by hardware, can't be faithfully emulated in software (KVM
>  * emulates them as NOPs), and allowing the guest to execute them natively
>  * requires enabling a per-VM capability.
>  */
> #define VMM_F(name)								\
> 
> 
> Honestly, I am already somewhat lost in what each of those macros means even
> when reading the comments, which might indicate that a future reader might
> also have a hard time understanding those.
> 
> I now support even more the case of setting each feature bit in a separate
> statement as I explained in an earlier patch.
> 
> What do you think?

I completely agree that there are an absurd number of flavors of features, but
I don't see how using separate statements eliminates any of that complexity.  The
complexity comes from the fact that KVM actually has that many different ways and
combinations for advertising and enumerating CPUID-based features.

Ignoring for the moment that "vmm" and "aliased" could be avoided for any approach,
if we go with statements, we'll still have

  kvm_cpu_cap_init{,passthrough,emulated,synthesized,aliased,vmm,only64}()

or if the flavor is an input/enum,

  enum kvm_cpuid_feature_type {
  	NORMAL,
	PASSTHROUGH,
	EMULATED,
	SYNTHESIZED,
	ALIASED,
	VMM,
	ONLY_64,
  }

I.e. we'll still need the same functionality and comments, it would simply be
dressed up differently.

If the underlying concern is that the macro names are too terse, and/or getting
one feature per line is desirable, then I'm definitely open to exploring alternative
formatting options.  But that's largely orthogonal to using macros instead of
individual function calls.
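
To illustrate, a rough sketch of what the enum-based variant would have to
encode.  The names are hypothetical, "aliased" is omitted for brevity, and the
per-flavor rules are a simplification of the macro comments quoted above, not
the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_feature_type {
	TOY_NORMAL,
	TOY_PASSTHROUGH,
	TOY_EMULATED,
	TOY_SYNTHESIZED,
	TOY_VMM,
	TOY_ONLY_64,
};

/* Made-up summary of the host state that matters for a single feature. */
struct toy_host {
	bool kernel_has;	/* set in boot_cpu_data */
	bool raw_cpuid_has;	/* reported by raw host CPUID */
	bool is_64bit;		/* 64-bit kernel */
};

/* Should the feature be advertised to userspace? */
static bool toy_advertise(enum toy_feature_type type, const struct toy_host *h)
{
	switch (type) {
	case TOY_NORMAL:
		return h->kernel_has && h->raw_cpuid_has;
	case TOY_PASSTHROUGH:
		/* Purely raw CPUID, even if the kernel doesn't use it. */
		return h->raw_cpuid_has;
	case TOY_EMULATED:
		/* Emulated in software irrespective of host support. */
		return true;
	case TOY_SYNTHESIZED:
		/* Force set by the kernel, may be absent from raw CPUID. */
		return h->kernel_has;
	case TOY_VMM:
		/* Queried by KVM, never advertised. */
		return false;
	case TOY_ONLY_64:
		return h->is_64bit && h->kernel_has && h->raw_cpuid_has;
	}
	return false;
}
```

The switch ends up with one arm (and one comment) per flavor, i.e. the same
information as the macros, just in a different shape.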


* Re: [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
  2024-05-17 17:38 ` [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after " Sean Christopherson
  2024-07-05  1:17   ` Maxim Levitsky
@ 2024-07-12  7:42   ` Xiaoyao Li
  1 sibling, 0 replies; 185+ messages in thread
From: Xiaoyao Li @ 2024-07-12  7:42 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling
> PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless,
> e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after
> vCPU creation.  vCPUs may also end up with an inconsistent configuration
> if exits are disabled between creation of multiple vCPUs.
> 
> Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
> Link: https://lore.kernel.org/all/9227068821b275ac547eb2ede09ec65d2281fe07.1680179693.git.houwenlong.hwl@antgroup.com
> Link: https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   Documentation/virt/kvm/api.rst | 1 +
>   arch/x86/kvm/x86.c             | 6 ++++++
>   2 files changed, 7 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 6ab8b5b7c64e..884846282d06 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7645,6 +7645,7 @@ branch to guests' 0x200 interrupt vector.
>   :Architectures: x86
>   :Parameters: args[0] defines which exits are disabled
>   :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
> +          or if any vCPUs have already been created
>   
>   Valid bits in args[0] are::
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bb34891d2f0a..4cb0c150a2f8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6568,6 +6568,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
>   			break;
>   
> +		mutex_lock(&kvm->lock);
> +		if (kvm->created_vcpus)
> +			goto disable_exits_unlock;
> +
>   		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
>   			kvm->arch.pause_in_guest = true;
>   
> @@ -6589,6 +6593,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		}
>   
>   		r = 0;
> +disable_exits_unlock:
> +		mutex_unlock(&kvm->lock);
>   		break;
>   	case KVM_CAP_MSR_PLATFORM_INFO:
>   		kvm->arch.guest_can_read_msr_platform_info = cap->args[0];



* Re: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  2024-05-17 17:38 ` [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed Sean Christopherson
  2024-05-22  5:09   ` Binbin Wu
  2024-07-05  1:17   ` Maxim Levitsky
@ 2024-07-12  7:51   ` Xiaoyao Li
  2024-07-12 13:31     ` Sean Christopherson
  2 siblings, 1 reply; 185+ messages in thread
From: Xiaoyao Li @ 2024-07-12  7:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> Reject KVM_CAP_X86_DISABLE_EXITS if userspace attempts to disable MWAIT or
> HLT exits and KVM previously reported (via KVM_CHECK_EXTENSION) that
> disabling the exit(s) is not allowed.  E.g. because MWAIT isn't supported
> or the CPU doesn't have an always-running APIC timer, or because KVM is
> configured to mitigate cross-thread vulnerabilities.
> 
> Cc: Kechen Lu <kechenl@nvidia.com>
> Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts")
> Fixes: 6f0f2d5ef895 ("KVM: x86: Mitigate the cross-thread return address predictions bug")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

I just realized the same issue when reading the MWAIT code, and then found
this fix of yours.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/x86.c | 54 ++++++++++++++++++++++++----------------------
>   1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4cb0c150a2f8..c729227c6501 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4590,6 +4590,20 @@ static inline bool kvm_can_mwait_in_guest(void)
>   		boot_cpu_has(X86_FEATURE_ARAT);
>   }
>   
> +static u64 kvm_get_allowed_disable_exits(void)
> +{
> +	u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
> +
> +	if (!mitigate_smt_rsb) {
> +		r |= KVM_X86_DISABLE_EXITS_HLT |
> +			KVM_X86_DISABLE_EXITS_CSTATE;
> +
> +		if (kvm_can_mwait_in_guest())
> +			r |= KVM_X86_DISABLE_EXITS_MWAIT;
> +	}
> +	return r;
> +}
> +
>   #ifdef CONFIG_KVM_HYPERV
>   static int kvm_ioctl_get_supported_hv_cpuid(struct kvm_vcpu *vcpu,
>   					    struct kvm_cpuid2 __user *cpuid_arg)
> @@ -4726,15 +4740,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   		r = KVM_CLOCK_VALID_FLAGS;
>   		break;
>   	case KVM_CAP_X86_DISABLE_EXITS:
> -		r = KVM_X86_DISABLE_EXITS_PAUSE;
> -
> -		if (!mitigate_smt_rsb) {
> -			r |= KVM_X86_DISABLE_EXITS_HLT |
> -			     KVM_X86_DISABLE_EXITS_CSTATE;
> -
> -			if (kvm_can_mwait_in_guest())
> -				r |= KVM_X86_DISABLE_EXITS_MWAIT;
> -		}
> +		r |= kvm_get_allowed_disable_exits();
>   		break;
>   	case KVM_CAP_X86_SMM:
>   		if (!IS_ENABLED(CONFIG_KVM_SMM))
> @@ -6565,33 +6571,29 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   		break;
>   	case KVM_CAP_X86_DISABLE_EXITS:
>   		r = -EINVAL;
> -		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
> +		if (cap->args[0] & ~kvm_get_allowed_disable_exits())

sigh.

KVM_X86_DISABLE_VALID_EXITS has no user now. But we cannot remove it 
since it's in uapi header, right?

>   			break;
>   
>   		mutex_lock(&kvm->lock);
>   		if (kvm->created_vcpus)
>   			goto disable_exits_unlock;
>   
> -		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> -			kvm->arch.pause_in_guest = true;
> -
>   #define SMT_RSB_MSG "This processor is affected by the Cross-Thread Return Predictions vulnerability. " \
>   		    "KVM_CAP_X86_DISABLE_EXITS should only be used with SMT disabled or trusted guests."
>   
> -		if (!mitigate_smt_rsb) {
> -			if (boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible() &&
> -			    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> -				pr_warn_once(SMT_RSB_MSG);
> -
> -			if ((cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT) &&
> -			    kvm_can_mwait_in_guest())
> -				kvm->arch.mwait_in_guest = true;
> -			if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> -				kvm->arch.hlt_in_guest = true;
> -			if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> -				kvm->arch.cstate_in_guest = true;
> -		}
> +		if (!mitigate_smt_rsb && boot_cpu_has_bug(X86_BUG_SMT_RSB) &&
> +		    cpu_smt_possible() &&
> +		    (cap->args[0] & ~KVM_X86_DISABLE_EXITS_PAUSE))
> +			pr_warn_once(SMT_RSB_MSG);
>   
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_PAUSE)
> +			kvm->arch.pause_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_MWAIT)
> +			kvm->arch.mwait_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_HLT)
> +			kvm->arch.hlt_in_guest = true;
> +		if (cap->args[0] & KVM_X86_DISABLE_EXITS_CSTATE)
> +			kvm->arch.cstate_in_guest = true;
>   		r = 0;
>   disable_exits_unlock:
>   		mutex_unlock(&kvm->lock);



* Re: [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed
  2024-07-12  7:51   ` Xiaoyao Li
@ 2024-07-12 13:31     ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-07-12 13:31 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Maxim Levitsky, Binbin Wu, Yang Weijiang,
	Robert Hoo

On Fri, Jul 12, 2024, Xiaoyao Li wrote:
> On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> > @@ -6565,33 +6571,29 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >   		break;
> >   	case KVM_CAP_X86_DISABLE_EXITS:
> >   		r = -EINVAL;
> > -		if (cap->args[0] & ~KVM_X86_DISABLE_VALID_EXITS)
> > +		if (cap->args[0] & ~kvm_get_allowed_disable_exits())
> 
> sigh.
> 
> KVM_X86_DISABLE_VALID_EXITS has no user now. But we cannot remove it since
> it's in uapi header, right?

We can, actually.  Forcing userspace to make changes when userspace updates their
copy of the headers is ok (building directly against kernel headers is discouraged).


* Re: [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only
  2024-05-17 17:38 ` [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only Sean Christopherson
  2024-07-05  1:24   ` Maxim Levitsky
@ 2024-07-17 13:31   ` Xiaoyao Li
  1 sibling, 0 replies; 185+ messages in thread
From: Xiaoyao Li @ 2024-07-17 13:31 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov
  Cc: kvm, linux-kernel, Hou Wenlong, Kechen Lu, Oliver Upton,
	Maxim Levitsky, Binbin Wu, Yang Weijiang, Robert Hoo

On 5/18/2024 1:38 AM, Sean Christopherson wrote:
> Add a macro to mask-in feature flags that are supported only on 64-bit
> kernels/KVM.  In addition to reducing overall #ifdeffery, using a macro
> will allow hardening the kvm_cpu_cap initialization sequences to assert
> that the features being advertised are indeed included in the word being
> initialized.  And arguably using *F() macros throughout is more readable.
> 
> No functional change intended.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Very nice patch!

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> ---
>   arch/x86/kvm/cpuid.c | 22 ++++++++++------------
>   1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 5a4d6138c4f1..5e3b97d06374 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -70,6 +70,12 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>   	(boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0);	\
>   })
>   
> +/* Features that KVM supports only on 64-bit kernels. */
> +#define X86_64_F(name)						\
> +({								\
> +	(IS_ENABLED(CONFIG_X86_64) ? F(name) : 0);		\
> +})
> +
>   /*
>    * Raw Feature - For features that KVM supports based purely on raw host CPUID,
>    * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> @@ -639,15 +645,6 @@ static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
>   
>   void kvm_set_cpu_caps(void)
>   {
> -#ifdef CONFIG_X86_64
> -	unsigned int f_gbpages = F(GBPAGES);
> -	unsigned int f_lm = F(LM);
> -	unsigned int f_xfd = F(XFD);
> -#else
> -	unsigned int f_gbpages = 0;
> -	unsigned int f_lm = 0;
> -	unsigned int f_xfd = 0;
> -#endif
>   	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
>   
>   	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
> @@ -744,7 +741,8 @@ void kvm_set_cpu_caps(void)
>   	);
>   
>   	kvm_cpu_cap_init(CPUID_D_1_EAX,
> -		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) | f_xfd
> +		F(XSAVEOPT) | F(XSAVEC) | F(XGETBV1) | F(XSAVES) |
> +		X86_64_F(XFD)
>   	);
>   
>   	kvm_cpu_cap_init_kvm_defined(CPUID_12_EAX,
> @@ -766,8 +764,8 @@ void kvm_set_cpu_caps(void)
>   		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
>   		F(PAT) | F(PSE36) | 0 /* Reserved */ |
>   		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> -		F(FXSR) | F(FXSR_OPT) | f_gbpages | F(RDTSCP) |
> -		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW)
> +		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> +		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
>   	);
>   
>   	if (!tdp_enabled && IS_ENABLED(CONFIG_X86_64))



* Re: [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper
  2024-07-08 21:18     ` Sean Christopherson
@ 2024-07-17 14:00       ` Xiaoyao Li
  2024-07-24 17:51       ` Maxim Levitsky
  1 sibling, 0 replies; 185+ messages in thread
From: Xiaoyao Li @ 2024-07-17 14:00 UTC (permalink / raw)
  To: Sean Christopherson, Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On 7/9/2024 5:18 AM, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
>> On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
>>> Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single
>>> helper.  The only advantage of separating the two was to make it somewhat
>>> obvious that KVM directly initializes the KVM-defined words, whereas using
>>> a common helper will allow for hardening both kernel- and KVM-defined
>>> CPUID words without needing copy+paste.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

>>> ---
>>>   arch/x86/kvm/cpuid.c | 44 +++++++++++++++-----------------------------
>>>   1 file changed, 15 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index f2bd2f5c4ea3..8efffd48cdf1 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -622,37 +622,23 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
>>>   	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
>>>   }
>>>   
>>> -/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
>>> -static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
>>> +static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
>>>   {
>>>   	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
>>>   
>>> -	reverse_cpuid_check(leaf);
>>> +	/*
>>> +	 * For kernel-defined leafs, mask the boot CPU's pre-populated value.
>>> +	 * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
>>> +	 * and only authority.
>>> +	 */
>>> +	if (leaf < NCAPINTS)
>>> +		kvm_cpu_caps[leaf] &= mask;
>>> +	else
>>> +		kvm_cpu_caps[leaf] = mask;
>>
>> Hi,
>>
>> I have an idea, how about we just initialize the kvm only leafs to 0xFFFFFFFF
>> and then treat them exactly in the same way as kernel regular leafs?
>>
>> Then the user won't have to figure out (assuming that the user doesn't read
>> the comment, who does?) why we use mask as init value.
>>
>> But if you prefer to leave it this way, I won't object either.
> 
> Huh, hadn't thought of that.  It's a small code change, but I'm leaning towards
> keeping the current code as we'd still need a comment to explain why KVM sets
> all bits by default.  

> And in the unlikely case that we royally screw up and fail
> to call kvm_cpu_cap_init() on a word, starting with 0xff would result in all
> features in the uninitialized word being treated as supported.

+1
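
The failure mode is easy to see in a toy model.  The sketch below is only an
illustration with made-up names (toy_caps and friends); it compares "zero the
array, then assign" against "fill with 0xff, then AND" when one word is never
initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_WORDS 2

static uint32_t toy_caps[TOY_NR_WORDS];

/* Fill every byte of the array with @fill, e.g. 0 or 0xff. */
static void toy_reset(int fill)
{
	memset(toy_caps, fill, sizeof(toy_caps));
}

/* Current approach for KVM-defined words: explicitly set the word. */
static void toy_init_assign(unsigned int word, uint32_t mask)
{
	toy_caps[word] = mask;
}

/* Alternative: start from all-ones and mask, same as kernel-defined words. */
static void toy_init_and(unsigned int word, uint32_t mask)
{
	toy_caps[word] &= mask;
}

static bool toy_has(unsigned int word, unsigned int bit)
{
	return toy_caps[word] & (1u << bit);
}
```

If word 1 is forgotten, the zero-filled variant reports no features for it,
whereas the 0xff-filled variant reports every feature in that word as
supported.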

> For posterity...
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 18ded0e682f2..6fcfb0fa4bd6 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -762,11 +762,7 @@ do {                                                                       \
>          u32 kvm_cpu_cap_emulated = 0;                                   \
>          u32 kvm_cpu_cap_synthesized = 0;                                \
>                                                                          \
> -       if (leaf < NCAPINTS)                                            \
> -               kvm_cpu_caps[leaf] &= (mask);                           \
> -       else                                                            \
> -               kvm_cpu_caps[leaf] = (mask);                            \
> -                                                                       \
> +       kvm_cpu_caps[leaf] &= (mask);                                   \
>          kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |                   \
>                                 kvm_cpu_cap_synthesized);                \
>          kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;                     \
> @@ -780,7 +776,7 @@ do {                                                                        \
>   
>   void kvm_set_cpu_caps(void)
>   {
> -       memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
> +       memset(kvm_cpu_caps, 0xff, sizeof(kvm_cpu_caps));
>   
>          BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
>                       sizeof(boot_cpu_data.x86_capability));
> 



* Re: [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation
  2024-07-08 18:46     ` Sean Christopherson
@ 2024-07-24 17:24       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 11:46 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > > index 23dbb9eb277c..0a8b561b5434 100644
> > > --- a/arch/x86/kvm/cpuid.h
> > > +++ b/arch/x86/kvm/cpuid.h
> > > @@ -11,6 +11,7 @@
> > >  extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
> > >  void kvm_set_cpu_caps(void);
> > >  
> > > +void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu);
> > >  void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
> > >  void kvm_update_pv_runtime(struct kvm_vcpu *vcpu);
> > >  struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index d750546ec934..7adcf56bd45d 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -12234,6 +12234,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > >  	kvm_xen_init_vcpu(vcpu);
> > >  	kvm_vcpu_mtrr_init(vcpu);
> > >  	vcpu_load(vcpu);
> > > +	kvm_vcpu_after_set_cpuid(vcpu);
> > 
> > This makes me a bit nervous. At this point the vcpu->arch.cpuid_entries is
> > NULL, but so is vcpu->arch.cpuid_nent so it sort of works but is one mistake
> > away from crash.
> > 
> > Maybe we should add some protection to this, e.g empty zero cpuid or
> > something like that.
> 
> Hmm, a crash is actually a good thing.  In the post-KVM_SET_CPUID2 case, if KVM
> accessed vcpu->arch.cpuid_entries without properly consulting cpuid_nent, the
> resulting failure would be an out-of-bounds read.  Similarly, a zeroed CPUID array
> would effectively mask any bugs.
> 
> Given that KVM heavily relies on "vcpu" to be zero-allocated, and that changing
> cpuid_nent during kvm_arch_vcpu_create() would be an extremely egregious bug,
> a crash due to a NULL-pointer dereference should never escape developer testing,
> let alone full release testing.
> 
> KVM does the "empty" array thing for IRQ routing (though in that case the array
> and the nr_entries are in a single struct), and IMO it's been a huge net negative
> because it's led to increased complexity just so that arch code can omit a NULL
> check.
> 

Makes sense, let it be.
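
For the record, a minimal userspace sketch of why the NULL array is harmless as
long as every walk is bounded by the entry count.  The struct and helper names
(cpuid_entry, cpuid_state, find_entry) are hypothetical stand-ins, not KVM's
actual definitions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPUID entry; not KVM's kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Hypothetical stand-in for the relevant vCPU fields. */
struct cpuid_state {
	struct cpuid_entry *entries;	/* NULL until userspace sets CPUID */
	int nent;			/* 0 until userspace sets CPUID */
};

/*
 * Walk at most nent entries.  With a zero-allocated state, nent == 0, so the
 * loop body never runs and the NULL pointer is never dereferenced.
 */
static const struct cpuid_entry *find_entry(const struct cpuid_state *s,
					    uint32_t function, uint32_t index)
{
	int i;

	for (i = 0; i < s->nent; i++) {
		if (s->entries[i].function == function &&
		    s->entries[i].index == index)
			return &s->entries[i];
	}
	return NULL;
}
```

Any code that skips the nent check would dereference NULL right away, which is
the loud failure discussed above.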

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup
  2024-07-09 19:46     ` Sean Christopherson
@ 2024-07-24 17:24       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:24 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 12:46 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > >  	/*
> > >  	 * KVM does not correctly handle changing guest CPUID after KVM_RUN, as
> > >  	 * MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
> > > @@ -440,6 +440,15 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> > >  	 * whether the supplied CPUID data is equal to what's already set.
> > >  	 */
> > >  	if (kvm_vcpu_has_run(vcpu)) {
> > > +		/*
> > > +		 * Note, runtime CPUID updates may consume other CPUID-driven
> > > +		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> > > +		 * state before full CPUID processing is functionally correct
> > > +		 * only because any change in CPUID is disallowed, i.e. using
> > > +		 * stale data is ok because KVM will reject the change.
> > > +		 */
> > 
> > If I understand correctly the sole reason for the below
> > __kvm_update_cpuid_runtime is to ensure that kvm_cpuid_check_equal doesn't
> > fail because current cpuid also was post-processed with runtime updates.
> 
> Yep.
> 
> > Can we have a comment stating this? Or even better how about moving the
> > call to __kvm_update_cpuid_runtime into the kvm_cpuid_check_equal,
> > to emphasize this?
> 
> Ya, I'll do both.
> 
> > > +		__kvm_update_cpuid_runtime(vcpu, e2, nent);
> > > +
> > >  		r = kvm_cpuid_check_equal(vcpu, e2, nent);
> > >  		if (r)
> > >  			return r;
> > 
> > 
> > Overall I am not 100% sure what is better:
> > 
> > Before the patch it was roughly like this:
> > 
> > 1. Post process the user given cpuid with bits of KVM runtime state (like xcr0)
> > At that point the vcpu->arch.cpuid_entries is stale but consistent, it is just old CPUID.
> > 
> > 2. kvm_hv_vcpu_init call (IMHO this call can be moved to kvm_vcpu_after_set_cpuid)
> > 
> > 3. kvm_check_cpuid on the user provided cpuid
> > 
> > 4. Update the vcpu->arch.cpuid_entries with new and post processed cpuid
> > 
> > 5. kvm_get_hypervisor_cpuid - I think this also can be cosmetically moved to kvm_vcpu_after_set_cpuid
> > 
> > 6. kvm_vcpu_after_set_cpuid itself.
> > 
> > 
> > After this change it works like that:
> > 
> > 1. kvm_hv_vcpu_init (again this belongs more to kvm_vcpu_after_set_cpuid)
> > 2. kvm_check_cpuid on the user cpuid without post processing - in theory this can cause bugs
> > 3. Update the vcpu->arch.cpuid_entries with new cpuid but without post-processing
> > 4. kvm_get_hypervisor_cpuid
> > 5. kvm_update_cpuid_runtime
> > 6. The old kvm_vcpu_after_set_cpuid
> > 
> > I'm honestly not sure what is better but IMHO moving the kvm_hv_vcpu_init and
> > kvm_get_hypervisor_cpuid into kvm_vcpu_after_set_cpuid would clean up this
> > mess a bit regardless of this patch.
> 
> It takes many more patches, but doing the swap() allows for the removal of several
> APIs that poke into a "raw" kvm_cpuid_entry2 array, and by the end of the series
> (with your above feedback addressed) the code gets to (sans comments):
> 
> 	swap(vcpu->arch.cpuid_entries, e2);
> 	swap(vcpu->arch.cpuid_nent, nent);
> 
> 	memcpy(vcpu_caps, vcpu->arch.cpu_caps, sizeof(vcpu_caps));
> 	BUILD_BUG_ON(sizeof(vcpu_caps) != sizeof(vcpu->arch.cpu_caps));
> 
> 	if (kvm_vcpu_has_run(vcpu)) {
> 		r = kvm_cpuid_check_equal(vcpu, e2, nent);
> 		if (r)
> 			goto err;
> 		goto success;
> 	}
> 
> Those are really just bonuses though, the main goal is to prevent recurrences of
> bugs where KVM consumes stale vCPU state[*], which is what prompted this change.
> 
> [*] https://lore.kernel.org/all/20240228101837.93642-1-vkuznets@redhat.com
> 

All makes sense, thanks!
Best regards,
	Maxim Levitsky



* Re: [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX
  2024-07-09 19:58     ` Sean Christopherson
@ 2024-07-24 17:28       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 12:58 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > Drop x86.c's local pre-computed cr4_reserved bits and instead fold KVM's
> > > reserved bits into the guest's reserved bits.  This fixes a bug where VMX's
> > > set_cr4_guest_host_mask() fails to account for KVM-reserved bits when
> > > deciding which bits can be passed through to the guest.  In most cases,
> > > letting the guest directly write reserved CR4 bits is ok, i.e. attempting
> > > to set the bit(s) will still #GP, but not if a feature is available in
> > > hardware but explicitly disabled by the host, e.g. if FSGSBASE support is
> > > disabled via "nofsgsbase".
> > > 
> > > Note, the extra overhead of computing host reserved bits every time
> > > userspace sets guest CPUID is negligible.  The feature bits that are
> > > queried are packed nicely into a handful of words, and so checking and
> > > setting each reserved bit costs in the neighborhood of ~5 cycles, i.e. the
> > > total cost will be in the noise even if the number of checked CR4 bits
> > > doubles over the next few years.  In other words, x86 will run out of CR4
> > > bits long before the overhead becomes problematic.
> > 
> > It might be just me, but IMHO this justification is confusing, leading me to
> > > believe that maybe the code is on the hot-path instead.
> > 
> > The right justification should be just that this code is in
> > kvm_vcpu_after_set_cpuid is usually (*) only called once per vCPU (twice
> > after your patch #1)
> 
> Ya.  I was trying to capture that even if that weren't true, i.e. even if userspace
> was doing something odd, that the extra cost is irrelevant.  I'll expand and reword
> the paragraph to make it clear this isn't a hot path for any sane userspace.
Thank you!

> 
> > (*) Qemu also calls it, each time vCPU is hotplugged but this doesn't change
> > anything performance wise.
> 
> ...
> 
> > > @@ -9831,10 +9826,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> > >  	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> > >  		kvm_caps.supported_xss = 0;
> > >  
> > > -#define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> > > -	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> > > -#undef __kvm_cpu_cap_has
> > > -
> > >  	if (kvm_caps.has_tsc_control) {
> > >  		/*
> > >  		 * Make sure the user can only configure tsc_khz values that
> > 
> > > I mostly agree with this patch - caching always carries risks and when it doesn't
> > > add value performance wise, it should always be removed.
> > 
> > 
> > However I don't think that this patch fixes a bug as it claims:
> > 
> > This is the code prior to this patch:
> > 
> > kvm_x86_vendor_init ->
> > 
> > 	r = ops->hardware_setup();
> > 		svm_hardware_setup
> > 			svm_set_cpu_caps + kvm_set_cpu_caps
> > 
> > 		-- or --
> > 
> > 		vmx_hardware_setup ->
> > > 			vmx_set_cpu_caps + kvm_set_cpu_caps
> > 
> > 
> > 	# read from 'kvm_cpu_caps'
> > 	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> > 
> > 
> > AFAIK kvm cpu caps are never touched outside of svm_set_cpu_caps/vmx_hardware_setup
> > (they don't depend on some later post-processing, cpuid, etc).
> > 
> > > In fact a good refactoring would be to make kvm_cpu_caps const after this point,
> > using cast, assert or something like that.
> > 
> > This leads me to believe that cr4_reserved_bits is computed correctly.
> 
> cr4_reserved_bits is computed correctly.  The bug is that cr4_reserved_bits isn't
> consulted by set_cr4_guest_host_mask(), which is what I meant by "KVM-reserved
> bits" in the changelog.

Ah, I see it now.

I also see that set_cr4_guest_host_mask() limits the guest-owned bits to a small
whitelist, and none of these bits looks scary, so it all makes sense that this
is mostly a theoretical bug, but for sure worth fixing.
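
To spell out the failure mode, here is a small standalone sketch.  The helper
name and the two-bit whitelist in the usage note are made up for illustration
(only the CR4 bit positions are architectural); this is not the actual VMX code:

```c
#include <stdint.h>

#define CR4_PGE		(1ull << 7)	/* architectural bit position */
#define CR4_FSGSBASE	(1ull << 16)	/* architectural bit position */

/*
 * Hypothetical helper: bits the guest may write without a VM-Exit.  A bit
 * that is reserved from the guest's point of view must never be guest-owned,
 * otherwise hardware, not KVM, decides whether a write to it faults.
 */
static uint64_t guest_owned_cr4_bits(uint64_t possible_guest_bits,
				     uint64_t guest_reserved_bits)
{
	return possible_guest_bits & ~guest_reserved_bits;
}
```

E.g. with a whitelist of CR4_PGE | CR4_FSGSBASE and FSGSBASE reserved (say,
because the host booted with "nofsgsbase"), only CR4_PGE stays guest-owned, so
a guest write to CR4.FSGSBASE is intercepted and can be rejected with #GP.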


> 
> > I could be wrong, but then IMHO it is a very good idea to provide an explanation
> > on how this bug can happen.
> 
> The first paragraph of the changelog tries to do that, and I'm struggling to come
> up with different wording that makes it more clear what's wrong.  Any ideas/suggestions?
> 

I also re-read it, and now it all makes sense. I guess I just somehow got fixated on
thinking that cr4_reserved_bits was computed incorrectly rather than just
not used.
The comment indeed now makes sense to me, so let it be as it is.


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




* Re: [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL
  2024-07-08 19:33     ` Sean Christopherson
@ 2024-07-24 17:28       ` Maxim Levitsky
  2024-11-21 18:57         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 19:33 +0000, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > Add a sanity check in get_cpuid_entry() to provide a friendlier error than
> > > a segfault when a test developer tries to use a vCPU CPUID helper on a
> > > barebones vCPU.
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > > index c664e446136b..f0f3434d767e 100644
> > > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > > @@ -1141,6 +1141,8 @@ const struct kvm_cpuid_entry2 *get_cpuid_entry(const struct kvm_cpuid2 *cpuid,
> > >  {
> > >  	int i;
> > >  
> > > +	TEST_ASSERT(cpuid, "Must do vcpu_init_cpuid() first (or equivalent)");
> > > +
> > >  	for (i = 0; i < cpuid->nent; i++) {
> > >  		if (cpuid->entries[i].function == function &&
> > >  		    cpuid->entries[i].index == index)
> > 
> > Hi,
> > 
> > Maybe it is better to do this assert in __vcpu_get_cpuid_entry() because the
> > assert might confuse the reader, since it just tests for NULL but when it
> > fails, it complains that you need to call some function.
> 
> IIRC, I originally added the assert in __vcpu_get_cpuid_entry(), but I didn't
> like leaving get_cpuid_entry() unprotected.  What if I add an assert in both?
> E.g. have __vcpu_get_cpuid_entry() assert with the (hopefully) helpful message,
> and have get_cpuid_entry() do a simple TEST_ASSERT_NE()?
> 

This looks like a great idea.

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation
  2024-07-08 19:53     ` Sean Christopherson
@ 2024-07-24 17:30       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 12:53 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > Drop the manual initialization of maxphyaddr and reserved_gpa_bits during
> > > vCPU creation now that kvm_arch_vcpu_create() unconditionally invokes
> > > kvm_vcpu_after_set_cpuid(), which handles all such CPUID caching.
> > > 
> > > None of the helpers between the existing code in kvm_arch_vcpu_create()
> > > and the call to kvm_vcpu_after_set_cpuid() consume maxphyaddr or
> > > reserved_gpa_bits (though auditing vmx_vcpu_create() and svm_vcpu_create()
> > > isn't exactly easy).  And even if that weren't the case, KVM _must_
> > > refresh any affected state during kvm_vcpu_after_set_cpuid(), e.g. to
> > > correctly handle KVM_SET_CPUID2.  In other words, this can't introduce a
> > > new bug, only expose an existing bug (of which there don't appear to be
> > > any).
> > 
> > IMHO the change is not as bulletproof as claimed:
> > 
> > If some code does access the uninitialized state (e.g vcpu->arch.maxphyaddr
> > which will be zero, I assume), in between these calls, then even though later
> > the correct CPUID will be set and should override the incorrect state set
> > earlier, the problem *is* that the mentioned code will have to deal with non
> > architecturally possible value (e.g maxphyaddr == 0) which might cause a bug
> > in it.
> > 
> > Of course such code currently doesn't exist, so it works but it can fail in
> > the future.
> 
> Similar to not consuming a null cpuid_entries, any such future bug should never
> escape developer testing since this is a very fixed sequence.  And practically
> speaking, completely closing these holes isn't feasible because it's impossible
> to initialize everything simultaneously, i.e. some amount of code will always
> need to execute with zero-initialized vCPU state.
> 
> > How about we move the call to kvm_vcpu_after_set_cpuid upward?
> 
> A drop-in replacement was my preference too, but it doesn't work.  :-/
> kvm_vcpu_after_set_cpuid() needs to be called after vcpu_load(), e.g. VMX's
> hook will do VMWRITE.
> 

Let it be then, but let's at least drop the part of the commit message after
'And even if that weren't the case', so as not to confuse future readers,
because as I explained, this is not 100% bulletproof.

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
  2024-07-08 19:43     ` Sean Christopherson
@ 2024-07-24 17:31       ` Maxim Levitsky
  2024-07-25 18:07         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 19:43 +0000, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling
> > > PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless,
> > > e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after
> > > vCPU creation.  vCPUs may also end up with an inconsistent configuration
> > > if exits are disabled between creation of multiple vCPUs.
> > 
> > Hi,
> > 
> > > I am not sure that PAUSE intercepts are updated either, I wasn't able to find any code
> > > that does this.
> > 
> > I agree with this change, but note that there was some talk on the mailing
> > list to allow to selectively disable VM exits (e.g PAUSE, MWAIT, ...) only on
> > some vCPUs, based on the claim that some vCPUs might run RT tasks, while some
> > might be housekeeping.  I haven't followed those discussions closely.
> 
> This change is actually pulled from that series[*].  IIRC, v1 of that series
> didn't close the VM-scoped hole, and the overall code was much more complex as
> a result.
> 
> [*] https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com
> 

Hi,
Thanks for the pointer, I searched for this patch series in many places but I couldn't find it.
Any idea what happened with this patch series btw?

Best regards,
	Maxim Levitsky



* Re: [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support
  2024-07-08 20:53     ` Sean Christopherson
@ 2024-07-24 17:39       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:39 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 13:53 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > Add a macro for use in kvm_set_cpu_caps() to automagically initialize
> > > features that KVM wants to support based solely on the CPU's capabilities,
> > > e.g. KVM advertises LA57 support if it's available in hardware, even if
> > > the host kernel isn't utilizing 57-bit virtual addresses.
> > > 
> > > Take advantage of the fact that kvm_cpu_cap_mask() adjusts kvm_cpu_caps
> > > based on raw CPUID, i.e. will clear features bits that aren't supported in
> > > hardware, and simply force-set the capability before applying the mask.
> > > 
> > > Abusing kvm_cpu_cap_set() is a borderline evil shenanigan, but doing so
> > > > avoids extra CPUID lookups, and a future commit will harden the entire
> > > family of *F() macros to assert (at compile time) that every feature being
> > > allowed is part of the capability word being processed, i.e. using a macro
> > > will bring more advantages in the future.
> > 
> > Could you explain what do you mean by "extra CPUID lookups"?
> 
> cpuid_ecx(7) incurs a CPUID to read the raw info, on top of the CPUID that is
> executed by kvm_cpu_cap_init() (kvm_cpu_cap_mask() as of this patch).  Obviously
> not a big deal, but it's an extra VM-Exit when running as a VM.
> 
> > > +/*
> > > + * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> > > + * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> > > + * Simply force set the feature in KVM's capabilities, raw CPUID support will
> > > + * be factored in by kvm_cpu_cap_mask().
> > > + */
> > > +#define RAW_F(name)						\
> > > +({								\
> > > +	kvm_cpu_cap_set(X86_FEATURE_##name);			\
> > > +	F(name);						\
> > > +})
> > > +
> > >  /*
> > >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> > >   * doesn't care about the CPIUD index because the index of the function in
> > > @@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
> > >  		F(AVX512VL));
> > >  
> > >  	kvm_cpu_cap_mask(CPUID_7_ECX,
> > > -		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> > > +		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> > >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> > >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> > >  		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
> > >  		F(SGX_LC) | F(BUS_LOCK_DETECT)
> > >  	);
> > > -	/* Set LA57 based on hardware capability. */
> > > -	if (cpuid_ecx(7) & F(LA57))
> > > -		kvm_cpu_cap_set(X86_FEATURE_LA57);
> > >  
> > >  	/*
> > >  	 * PKU not yet implemented for shadow paging and requires OSPKE
> > 
> > Putting a function call into a macro which evaluates into a bitmask is
> > somewhat misleading, but let it be...
> > 
> > IMHO in long term, it might be better to rip the whole huge 'or'ed mess, and replace
> > it with a list of statements, along with comments for all unusual cases.
> 
> As in something like this?
> 
> 	kvm_cpu_cap_init(AVX512VBMI);
> 	kvm_cpu_cap_init_raw(LA57);
> 	kvm_cpu_cap_init(PKU);
> 	...
> 	kvm_cpu_cap_init(BUS_LOCK_DETECT);
> 
> 	kvm_cpu_cap_init_aliased(CPUID_8000_0001_EDX, FPU);
> 
> 	...
> 
> 	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX1);
> 	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX2);
> 	kvm_cpu_cap_init_scattered(CPUID_12_EAX, SGX_EDECCSSA);
> 
> The tricky parts are incorporating raw CPUID into the masking and handling features
> that KVM _doesn't_ support.  For raw CPUID, we could simply do CPUID every time, or
> pre-fill an array to avoid hundreds of CPUIDs that are largely redundant.

In terms of performance, again this code is run once per kvm module load, so even
if it does something truly gross performance wise, it's not a problem, even if run
nested.

> 
> But I don't see a way to mask off unsupported features without losing the
> compile-time protections that the current code provides.  And even if we took a
> big hammer approach, e.g. finalized masking for all words at the very end, we'd
> still need to carry state across each statement, i.e. we'd still need the bitwise-OR
> and mask behavior, it would just be buried in helpers/macros.

Can you elaborate on this?

For example let's say this:


	kvm_cpu_cap_init(CPUID_7_0_EBX,
		F(FSGSBASE) | EMUL_F(TSC_ADJUST) | F(SGX) | F(BMI1) | F(HLE) |
		F(AVX2) | F(FDP_EXCPTN_ONLY) | F(SMEP) | F(BMI2) | F(ERMS) |
		F(INVPCID) | F(RTM) | F(ZERO_FCS_FDS) | 0 /*MPX*/ |
		F(AVX512F) | F(AVX512DQ) | F(RDSEED) | F(ADX) | F(SMAP) |
		F(AVX512IFMA) | F(CLFLUSHOPT) | F(CLWB) | 0 /*INTEL_PT*/ |
		F(AVX512PF) | F(AVX512ER) | F(AVX512CD) | F(SHA_NI) |
		F(AVX512BW) | F(AVX512VL));



This will be replaced with:

kvm_cpu_cap_clear_all(CPUID_7_0_EBX);

kvm_cpu_cap_init(FSGSBASE);
kvm_cpu_cap_init(TSC_ADJUST, CAP_EMULATED);
..
kvm_cpu_cap_init(AVX512VL);

Then each 'kvm_cpu_cap_init' will opt in to setting a bit if it is supported in host CPUID, or
always opt in for emulated features, etc.

Host CPUID can indeed be cached if the extra host CPUID queries cause too long a delay (e.g. 1 second)
when nested.
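
Roughly what I have in mind, as a standalone sketch.  All names here (caps,
host_cpuid, cap_clear_all, cap_init, the cap_kind enum) are hypothetical, and
unlike the pseudo-calls above the word and bit are passed explicitly rather
than derived from the feature name:

```c
#include <stdint.h>

#define NR_WORDS 4	/* hypothetical number of capability words */

enum cap_kind { CAP_HOST, CAP_EMULATED };	/* hypothetical modifiers */

static uint32_t caps[NR_WORDS];		/* stand-in for kvm_cpu_caps */
static uint32_t host_cpuid[NR_WORDS];	/* stand-in for cached raw host CPUID */

/* Drop every feature in a word before opting individual features back in. */
static void cap_clear_all(unsigned int word)
{
	caps[word] = 0;
}

/*
 * Opt in a single feature: emulated features are always set, everything else
 * is set only if the (cached) host CPUID reports it.
 */
static void cap_init(unsigned int word, unsigned int bit, enum cap_kind kind)
{
	uint32_t mask = UINT32_C(1) << bit;

	if (kind == CAP_EMULATED || (host_cpuid[word] & mask))
		caps[word] |= mask;
}
```

Since each call only ORs in its own bit, a host-supported feature that nobody
opts in stays clear, and the result does not depend on the order of the calls.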



> 
> I suspect the generated code will be larger, but I doubt that will actually be
> problematic.  

Yes, 100% agree.


> The written code will also be more verbose (roughly 4x since we
> tend to squeeze 4 features per line)

It will be about as long as the list of macros in the cpufeatures.h, where
all features are nicely ordered by cpuid leaves.

In this case I consider verbose long code to be an improvement.

IMHO the OR'ed mask of macros is just too terse and hard to parse.
It was borderline OK before this patch series because it only
contained features, but now it also contains various modifiers, and
IMHO it's just hard to notice that EMUL_F in that corner...


> , and it will be harder to ensure initialization
> of features in a given word are all co-located.

Actually co-location won't be needed.

We can first copy the caps from boot_cpu_data,
then zero all the leaves that we initialize ourselves.

After that we can initialize opt-in features in any order - it will still be sorted
by CPUID leaves but even if the order is broken (e.g due to cherry-pick or something),
it won't cause any issues.


> 
> I definitely don't hate the idea, but I don't think it will be a clear "win" either.
> Unless someone feels strongly about pursuing this approach, I'll add to the "things
> to explore later" list.
> 

Please do consider this, I am almost sure that whoever needs to read this code later (could be you...)
will thank you.


Best regards,
	Maxim Levitsky





* Re: [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support
  2024-07-08 22:36     ` Sean Christopherson
@ 2024-07-24 17:40       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:40 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 15:36 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > +/*
> > > + * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> > > + * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> > > + * Simply force set the feature in KVM's capabilities, raw CPUID support will
> > > + * be factored in by kvm_cpu_cap_mask().
> > > + */
> > > +#define RAW_F(name)						\
> > > +({								\
> > > +	kvm_cpu_cap_set(X86_FEATURE_##name);			\
> > > +	F(name);						\
> > > +})
> > > +
> > >  /*
> > >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> > >   * doesn't care about the CPIUD index because the index of the function in
> > > @@ -682,15 +694,12 @@ void kvm_set_cpu_caps(void)
> > >  		F(AVX512VL));
> > >  
> > >  	kvm_cpu_cap_mask(CPUID_7_ECX,
> > > -		F(AVX512VBMI) | F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> > > +		F(AVX512VBMI) | RAW_F(LA57) | F(PKU) | 0 /*OSPKE*/ | F(RDPID) |
> > >  		F(AVX512_VPOPCNTDQ) | F(UMIP) | F(AVX512_VBMI2) | F(GFNI) |
> > >  		F(VAES) | F(VPCLMULQDQ) | F(AVX512_VNNI) | F(AVX512_BITALG) |
> > >  		F(CLDEMOTE) | F(MOVDIRI) | F(MOVDIR64B) | 0 /*WAITPKG*/ |
> > >  		F(SGX_LC) | F(BUS_LOCK_DETECT)
> > >  	);
> > > -	/* Set LA57 based on hardware capability. */
> > > -	if (cpuid_ecx(7) & F(LA57))
> > > -		kvm_cpu_cap_set(X86_FEATURE_LA57);
> > >  
> > >  	/*
> > >  	 * PKU not yet implemented for shadow paging and requires OSPKE
> > 
> > Putting a function call into a macro which evaluates into a bitmask is somewhat misleading,
> > but let it be...
> 
> And weird.  Rather than abuse kvm_cpu_cap_set(), what about adding another variable
> scoped to kvm_cpu_cap_init()?
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0e64a6332052..b8bc8713a0ec 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -87,12 +87,10 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
>  /*
>   * Raw Feature - For features that KVM supports based purely on raw host CPUID,
>   * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> - * Simply force set the feature in KVM's capabilities, raw CPUID support will
> - * be factored in by __kvm_cpu_cap_mask().
>   */
>  #define RAW_F(name)                                            \
>  ({                                                             \
> -       kvm_cpu_cap_set(X86_FEATURE_##name);                    \
> +       kvm_cpu_cap_passthrough |= F(name);                     \
>         F(name);                                                \
>  })
>  
> @@ -737,6 +735,7 @@ do {                                                                        \
>  do {                                                                   \
>         const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);    \
>         const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;   \
> +       u32 kvm_cpu_cap_passthrough = 0;                                \
>         u32 kvm_cpu_cap_emulated = 0;                                   \
>         u32 kvm_cpu_cap_synthesized = 0;                                \
>                                                                         \
> @@ -745,6 +744,7 @@ do {                                                                        \
>         else                                                            \
>                 kvm_cpu_caps[leaf] = (mask);                            \
>                                                                         \
> +       kvm_cpu_caps[leaf] |= kvm_cpu_cap_passthrough;                  \
>         kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |                   \
>                                kvm_cpu_cap_synthesized);                \
>         kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;                     \
> 

I agree, this is better.
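
Just to check my understanding of the ordering, here is the per-word
computation from the diff above rewritten as a plain function for the
"leaf < NCAPINTS" case.  The function and parameter names are mine, not
something from the patch:

```c
#include <stdint.h>

/*
 * Hypothetical restatement of the masking order:
 *  1. start from what the host kernel enabled, limited to KVM's mask
 *  2. add passthrough (RAW_F) features regardless of host kernel support
 *  3. drop anything that is neither in raw CPUID nor synthesized
 *  4. add emulated features, which need no hardware support
 */
static uint32_t cap_word_finalize(uint32_t kernel_caps, uint32_t mask,
				  uint32_t passthrough, uint32_t synthesized,
				  uint32_t emulated, uint32_t raw_cpuid)
{
	uint32_t caps = kernel_caps & mask;

	caps |= passthrough;
	caps &= raw_cpuid | synthesized;
	caps |= emulated;
	return caps;
}
```

So for LA57 (CPUID.7.0:ECX bit 16) with a host kernel that doesn't use 5-level
paging, the bit is missing after step 1, comes back in step 2, and survives
step 3 only if the hardware actually reports it.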

Best regards,
	Maxim Levitsky





* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-07-08 21:08     ` Sean Christopherson
@ 2024-07-24 17:46       ` Maxim Levitsky
  2024-07-25 18:39         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 14:08 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > Add a macro to precisely handle CPUID features that AMD duplicated from
> > > CPUID.0x1.EDX into CPUID.0x8000_0001.EDX.  This will allow adding an
> > > assert that all features passed to kvm_cpu_cap_init() match the word being
> > > processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1.
> > > 
> > > Because the kernel simply reuses the X86_FEATURE_* definitions from
> > > CPUID.0x1.EDX, KVM's use of the aliased features would result in false
> > > positives from such an assert.
> > > 
> > > No functional change intended.
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/cpuid.c | 24 +++++++++++++++++-------
> > >  1 file changed, 17 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 5e3b97d06374..f2bd2f5c4ea3 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -88,6 +88,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > >  	F(name);						\
> > >  })
> > >  
> > > +/*
> > > + * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
> > > + * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> > > + */
> > > +#define AF(name)								\
> > > +({										\
> > > +	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
> > > +	feature_bit(name);							\
> > > +})
> > > +
> > >  /*
> > >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> > >   * doesn't care about the CPIUD index because the index of the function in
> > > @@ -758,13 +768,13 @@ void kvm_set_cpu_caps(void)
> > >  	);
> > >  
> > >  	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
> > > -		F(FPU) | F(VME) | F(DE) | F(PSE) |
> > > -		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > > -		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > > -		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > > -		F(PAT) | F(PSE36) | 0 /* Reserved */ |
> > > -		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > > -		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> > > +		AF(FPU) | AF(VME) | AF(DE) | AF(PSE) |
> > > +		AF(TSC) | AF(MSR) | AF(PAE) | AF(MCE) |
> > > +		AF(CX8) | AF(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > > +		AF(MTRR) | AF(PGE) | AF(MCA) | AF(CMOV) |
> > > +		AF(PAT) | AF(PSE36) | 0 /* Reserved */ |
> > > +		F(NX) | 0 /* Reserved */ | F(MMXEXT) | AF(MMX) |
> > > +		AF(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> > >  		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
> > >  	);
> > >  
> > 
> > Hi,
> > 
> > What if we defined the aliased features instead.
> > Something like this:
> > 
> > #define __X86_FEATURE_8000_0001_ALIAS(feature) \
> > 	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)
> > 
> > #define KVM_X86_FEATURE_FPU_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_FPU)
> > #define KVM_X86_FEATURE_VME_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_VME)
> > 
> > And then just use for example the 'F(FPU_ALIAS)' in the CPUID_8000_0001_EDX
> 
> At first glance, I really liked this idea, but after working through the
> ramifications, I think I prefer "converting" the flag when passing it to
> kvm_cpu_cap_init().  In-place conversion makes it all but impossible for KVM to
> check the alias, e.g. via guest_cpu_cap_has(), especially since the AF() macro
> doesn't set the bits in kvm_known_cpu_caps (if/when a non-hacky validation of
> usage becomes reality).

Could you elaborate on this as well?

My suggestion was that we can just treat aliases as completely independent dummy features,
say KVM_X86_FEATURE_FPU_ALIAS, and pass them as-is to the guest.  That means that
if an alias is present in host CPUID, it appears in KVM caps, and thus QEMU can then
set it in guest CPUID.

I don't think that we need any special treatment for them if you look at it this way.
If you don't agree, can you give me an example?
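
To make the alias arithmetic concrete, here is a minimal sketch.  The SK_*
names and the word indices are assumptions for illustration, not the kernel's
definitions; only the "word * 32 + bit" encoding and the offset formula come
from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical word indices; the real enum cpuid_leafs values may differ. */
enum { SK_CPUID_1_EDX = 0, SK_CPUID_8000_0001_EDX = 1 };

/* A feature is encoded as word * 32 + bit, as with X86_FEATURE_* flags. */
#define SK_FEATURE(word, bit)	((word) * 32 + (bit))
#define SK_FEATURE_FPU		SK_FEATURE(SK_CPUID_1_EDX, 0)
#define SK_FEATURE_VME		SK_FEATURE(SK_CPUID_1_EDX, 1)

/* Move a 0x1.EDX feature into the 0x8000_0001.EDX word, keeping its bit. */
#define SK_FEATURE_8000_0001_ALIAS(feature) \
	((feature) + (SK_CPUID_8000_0001_EDX - SK_CPUID_1_EDX) * 32)

#define SK_FEATURE_FPU_ALIAS	SK_FEATURE_8000_0001_ALIAS(SK_FEATURE_FPU)
#define SK_FEATURE_VME_ALIAS	SK_FEATURE_8000_0001_ALIAS(SK_FEATURE_VME)

/* Word (leaf index) that a feature lives in. */
static inline int sk_feature_word(int feature)
{
	return feature / 32;
}

/* Bit mask of a feature within its word. */
static inline uint32_t sk_feature_bit(int feature)
{
	return UINT32_C(1) << (feature % 32);
}
```

With this encoding the alias lands in the 0x8000_0001.EDX word at the same bit
position as the original, so a word-match assert would accept it there.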


> 
> Side topic, if it's not already documented somewhere else, kvm/x86/cpuid.rst
> should call out that KVM only honors the features in CPUID.0x1, i.e. that setting
> aliased bits in CPUID.0x8000_0001 is supported if and only if the bit(s) is also
> set in CPUID.0x1.

To be honest, if KVM enforces this, such enforcement can be removed IMHO:

KVM already allows all kinds of totally invalid CPUIDs to be set for the guest,
for example a CPUID in which AVX3 is set, and AVX and/or XSAVE is not set.

So having a CPUID given to the guest where an aliased feature is set and the regular
feature is not set should not pose any problem to KVM itself, as long as KVM itself
uses only the non-aliased features as the ground truth.

Since such a configuration is an error anyway, allowing it won't break any existing users IMHO.

What do you think about this? If you don't agree, can you provide an example of a breakage?


Best regards,
	Maxim Levitsky

> 



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper
  2024-07-08 21:18     ` Sean Christopherson
  2024-07-17 14:00       ` Xiaoyao Li
@ 2024-07-24 17:51       ` Maxim Levitsky
  2024-07-25 19:18         ` Sean Christopherson
  1 sibling, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 14:18 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > Merge kvm_cpu_cap_init() and kvm_cpu_cap_init_kvm_defined() into a single
> > > helper.  The only advantage of separating the two was to make it somewhat
> > > obvious that KVM directly initializes the KVM-defined words, whereas using
> > > a common helper will allow for hardening both kernel- and KVM-defined
> > > CPUID words without needing copy+paste.
> > > 
> > > No functional change intended.
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/cpuid.c | 44 +++++++++++++++-----------------------------
> > >  1 file changed, 15 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index f2bd2f5c4ea3..8efffd48cdf1 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -622,37 +622,23 @@ static __always_inline u32 raw_cpuid_get(struct cpuid_reg cpuid)
> > >  	return *__cpuid_entry_get_reg(&entry, cpuid.reg);
> > >  }
> > >  
> > > -/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
> > > -static __always_inline void __kvm_cpu_cap_mask(unsigned int leaf)
> > > +static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
> > >  {
> > >  	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
> > >  
> > > -	reverse_cpuid_check(leaf);
> > > +	/*
> > > +	 * For kernel-defined leafs, mask the boot CPU's pre-populated value.
> > > +	 * For KVM-defined leafs, explicitly set the leaf, as KVM is the one
> > > +	 * and only authority.
> > > +	 */
> > > +	if (leaf < NCAPINTS)
> > > +		kvm_cpu_caps[leaf] &= mask;
> > > +	else
> > > +		kvm_cpu_caps[leaf] = mask;
> > 
> > Hi,
> > 
> > I have an idea, how about we just initialize the kvm only leafs to 0xFFFFFFFF
> > and then treat them exactly in the same way as kernel regular leafs?
> > 
> > Then the user won't have to figure out (assuming that the user doesn't read
> > the comment, who does?) why we use mask as init value.
> > 
> > But if you prefer to leave it this way, I won't object either.
> 
> Huh, hadn't thought of that.  It's a small code change, but I'm leaning towards
> keeping the current code as we'd still need a comment to explain why KVM sets
> all bits by default. 

I agree that the comment is needed, but the comment in my case is more natural:
KVM-only leafs don't come from boot_cpu_data, so naturally all features there are '1'.


>  And in the unlikely case that we royally screw up and fail
> to call kvm_cpu_cap_init() on a word, starting with 0xff would result in all
> features in the uninitialized word being treated as supported.
Yes, but IMHO the chances of this happening are very low.

I understand your concerns though, but then IMHO it's better to keep
kvm_cpu_cap_init_kvm_defined, because this way at least the function name
clearly describes the difference instead of the difference being buried in the function
itself (the comment helps, but it is still less noticeable than a function name).

I don't have a very strong opinion on this though,
because IMHO kvm_cpu_cap_init_kvm_defined is also not very user friendly,
so if you really think that the new code is more readable, let it be.
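
To illustrate the trade-off being discussed, here is a minimal sketch of the
two initialization schemes.  All sk_* names and the word counts are made up
for the example, and the copy of boot_cpu_data into the kernel-defined words
is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes: two kernel-defined words, one KVM-defined word. */
enum { SK_NCAPINTS = 2, SK_NKVMCAPINTS = 1,
       SK_NR_WORDS = SK_NCAPINTS + SK_NKVMCAPINTS };

/* Current approach: mask kernel-defined words, overwrite KVM-defined ones. */
static inline void sk_init_branch(uint32_t *caps, int leaf, uint32_t mask)
{
	if (leaf < SK_NCAPINTS)
		caps[leaf] &= mask;
	else
		caps[leaf] = mask;
}

/* Alternative: always mask; KVM-defined words must start as all ones. */
static inline void sk_init_mask_only(uint32_t *caps, int leaf, uint32_t mask)
{
	caps[leaf] &= mask;
}

/*
 * Return the KVM-defined word after setup.  'all_ones' selects the memset
 * value, and 'skip' models forgetting to call the init helper on the word.
 */
static inline uint32_t sk_kvm_word(int all_ones, int skip)
{
	uint32_t caps[SK_NR_WORDS];

	memset(caps, all_ones ? 0xff : 0, sizeof(caps));
	if (!skip) {
		if (all_ones)
			sk_init_mask_only(caps, SK_NCAPINTS, 0x5);
		else
			sk_init_branch(caps, SK_NCAPINTS, 0x5);
	}
	return caps[SK_NCAPINTS];
}
```

Both schemes give the same result when the word is initialized.  They differ
only when a word is forgotten: zero-fill fails closed (no features), while
0xff-fill fails open (every feature treated as supported).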

Best regards,
	Maxim Levitsky


> 
> For posterity...
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 18ded0e682f2..6fcfb0fa4bd6 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -762,11 +762,7 @@ do {                                                                       \
>         u32 kvm_cpu_cap_emulated = 0;                                   \
>         u32 kvm_cpu_cap_synthesized = 0;                                \
>                                                                         \
> -       if (leaf < NCAPINTS)                                            \
> -               kvm_cpu_caps[leaf] &= (mask);                           \
> -       else                                                            \
> -               kvm_cpu_caps[leaf] = (mask);                            \
> -                                                                       \
> +       kvm_cpu_caps[leaf] &= (mask);                                   \
>         kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |                   \
>                                kvm_cpu_cap_synthesized);                \
>         kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;                     \
> @@ -780,7 +776,7 @@ do {                                                                        \
>  
>  void kvm_set_cpu_caps(void)
>  {
> -       memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
> +       memset(kvm_cpu_caps, 0xff, sizeof(kvm_cpu_caps));
>  
>         BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
>                      sizeof(boot_cpu_data.x86_capability));
> 



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-07-08 21:29     ` Sean Christopherson
@ 2024-07-24 17:54       ` Maxim Levitsky
  2024-07-26 23:34         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 14:29 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > Undefine SPEC_CTRL_SSBD, which is #defined by msr-index.h to represent the
> > > enable flag in MSR_IA32_SPEC_CTRL, to avoid issues with the macro being
> > > unpacked into its raw value when passed to KVM's F() macro.  This will
> > > allow using multiple layers of macros in F() and friends, e.g. to harden
> > > against incorrect usage of F().
> > > 
> > > No functional change intended (cpuid.c doesn't consume SPEC_CTRL_SSBD).
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/cpuid.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 8efffd48cdf1..a16d6e070c11 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -639,6 +639,12 @@ static __always_inline void kvm_cpu_cap_init(u32 leaf, u32 mask)
> > >  	kvm_cpu_caps[leaf] &= raw_cpuid_get(cpuid);
> > >  }
> > >  
> > > +/*
> > > + * Undefine the MSR bit macro to avoid token concatenation issues when
> > > + * processing X86_FEATURE_SPEC_CTRL_SSBD.
> > > + */
> > > +#undef SPEC_CTRL_SSBD
> > > +
> > >  void kvm_set_cpu_caps(void)
> > >  {
> > >  	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));
> > 
> > Hi,
> > 
> > Maybe we should instead rename the SPEC_CTRL_SSBD to
> > 'MSR_IA32_SPEC_CTRL_SSBD' and together with it, other fields of this msr.  It
> > seems that at least some msrs in this file do this.
> 
> Yeah, the #undef hack is quite ugly.  But I didn't (and still don't) want to
> introduce all the renaming churn in the middle of this already too-big series,
> especially since it would require touching quite a bit of code outside of KVM.



> 
> I'm also not sure that's the right thing to do; I kinda feel like KVM is the one
> that's being silly here.

I don't think that KVM is silly here. I think that hardware definitions like
MSRs, register names, register bit fields, etc. *must* come with a unique prefix.
It's not an issue of breaking some deeply nested macro, but rather an issue of readability.

SPEC_CTRL_SSBD for example won't mean much to someone who only knows ARM, while
MSR_SPEC_CTRL_SSBD, or even better IA32_MSR_SPEC_CTRL_SSBD, lets you instantly know
that this is an MSR, and anyone with even a bit of x86 knowledge should at least have
heard about what an MSR is.

In regard to X86_FEATURE_INTEL_SSBD, I don't oppose this idea, because we have
X86_FEATURE_AMD_SSBD, but in general I do oppose the idea of adding an 'INTEL' prefix,
because it sets a bad precedent: most of the features on x86
are first done by Intel, but then are also implemented by AMD, and thus an Intel-only
feature name can stick after it becomes a general x86 feature.

In the case of X86_FEATURE_INTEL_SSBD, we sadly already have different CPUID bits for
each vendor (although I wonder if AMD also sets the X86_FEATURE_INTEL_SSBD).

I vote to rename 'SPEC_CTRL_SSBD', it can be done as a standalone patch, and can
be accepted right now, even before this patch series is accepted.
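
For anyone following along, the collision can be reproduced outside the kernel
with a few lines of C.  This is a sketch: the SK_* macros and the feature
value 0x123 are made up, and the SPEC_CTRL_SSBD value is illustrative.

```c
#include <assert.h>

/* Hypothetical feature number, standing in for the real X86_FEATURE flag. */
#define X86_FEATURE_SPEC_CTRL_SSBD	0x123

/* Stand-in for the MSR bit macro that shares the feature's suffix. */
#define SPEC_CTRL_SSBD			(1u << 2)

/* Single layer: 'name' is an operand of ##, so it is pasted unexpanded. */
#define SK_FEATURE(name)		X86_FEATURE_##name

/* Extra layer: 'name' is macro-expanded before SK_FEATURE() sees it. */
#define SK_FEATURE_LAYERED(name)	SK_FEATURE(name)

static inline unsigned int sk_direct(void)
{
	/* Works even while SPEC_CTRL_SSBD is defined as a macro. */
	return SK_FEATURE(SPEC_CTRL_SSBD);
}

/*
 * SK_FEATURE_LAYERED(SPEC_CTRL_SSBD) would not compile at this point: the
 * argument becomes (1u << 2) first, and pasting X86_FEATURE_ with '(' is
 * invalid.  Undefining the MSR macro keeps the token intact.
 */
#undef SPEC_CTRL_SSBD

static inline unsigned int sk_layered(void)
{
	return SK_FEATURE_LAYERED(SPEC_CTRL_SSBD);
}
```

This is why the problem only shows up once F() grows additional macro layers,
and why either the #undef or a rename of one of the two names avoids it.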

Best regards,
	Maxim Levitsky


> 
> Aha!  Rather than rename the MSR bits, what if we rename the X86_FEATURE flag,
> e.g. to X86_FEATURE_INTEL_SPEC_CTRL_SSBD, X86_FEATURE_MSR_SPEC_CTRL_SSBD, or maybe
> even just X86_FEATURE_INTEL_SSBD.  Much less churn, and it would add even more
> clarity as to why there's also X86_FEATURE_SSBD and X86_FEATURE_AMD_SSBD.
> 
> I'll post a standalone patch to make that change, and maybe see if I can take it
> through the KVM tree.
> 




^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features
  2024-07-09 18:11     ` Sean Christopherson
@ 2024-07-24 17:55       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 11:11 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > +/*
> > > + * For kernel-defined leafs, mask the boot CPU's pre-populated value.  For KVM-
> > > + * defined leafs, explicitly set the leaf, as KVM is the one and only authority.
> > > + */
> > > +#define kvm_cpu_cap_init(leaf, mask)					\
> > > +do {									\
> > > +	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> > > +	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> > 
> > Why not to #define the kvm_cpu_cap_init_in_progress as well instead of a variable?
> 
> Macros can't #define new macros.  A macro could be used, but it would require the
> caller to #define and #undef the macro, e.g.

Oh, I somehow forgot about this, of course this is how the C preprocessor works.


> 	#define kvm_cpu_cap_init_in_progress CPUID_1_ECX
> 	kvm_cpu_cap_init(CPUID_1_ECX, ...)
> 	#undef kvm_cpu_cap_init_in_progress
> 
Yes, this is much uglier.

> but, stating the obvious, that's ugly and is less robust than automatically
> "defining" the in-progress leaf in kvm_cpu_cap_init().
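
A minimal sketch of the "local variable instead of a macro" idea, under stated
assumptions: the SK_* and sk_* names are invented, a runtime assert() stands in
for the kernel's build-time check, and standard C comma expressions are used
instead of GNU statement expressions.

```c
#include <assert.h>
#include <stdint.h>

enum { SK_LEAF_A = 0, SK_LEAF_B = 1, SK_NR_LEAFS = 2 };

/* Hypothetical features, encoded as word * 32 + bit. */
#define SK_FEAT_A3	(SK_LEAF_A * 32 + 3)
#define SK_FEAT_B0	(SK_LEAF_B * 32 + 0)
#define SK_FEAT_B5	(SK_LEAF_B * 32 + 5)

static uint32_t sk_caps[SK_NR_LEAFS];

/*
 * Evaluate to the feature's bit, after checking that the feature belongs to
 * the leaf currently being initialized.  sk_init_in_progress is only visible
 * inside sk_cap_init(), so using SK_F() anywhere else fails to compile.
 */
#define SK_F(feature)							\
	(assert((feature) / 32 == sk_init_in_progress),		\
	 UINT32_C(1) << ((feature) % 32))

/* The do-while scope "defines" the in-progress leaf automatically. */
#define sk_cap_init(leaf, mask)						\
do {									\
	const int sk_init_in_progress = (leaf);				\
	(void)sk_init_in_progress;					\
	sk_caps[leaf] = (mask);						\
} while (0)

static inline uint32_t sk_init_leaf_b(void)
{
	sk_cap_init(SK_LEAF_B, SK_F(SK_FEAT_B0) | SK_F(SK_FEAT_B5));
	return sk_caps[SK_LEAF_B];
}
```

Passing SK_F(SK_FEAT_A3) to the SK_LEAF_B initialization would trip the
assert, which is the class of mistake the hardening is meant to catch.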
> 

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-07-08 22:30     ` Sean Christopherson
@ 2024-07-24 17:58       ` Maxim Levitsky
  2024-07-27  0:06         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 15:30 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > Now that kvm_cpu_cap_init() is a macro with its own scope, add EMUL_F() to
> > > OR-in features that KVM emulates in software, i.e. that don't depend on
> > > the feature being available in hardware.  The contained scope
> > > of kvm_cpu_cap_init() allows using a local variable to track the set of
> > > emulated leaves, which in addition to avoiding confusing and/or
> > > unnecessary variables, helps prevent misuse of EMUL_F().
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++---------------
> > >  1 file changed, 21 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 1064e4d68718..33e3e77de1b7 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -94,6 +94,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > >  	F(name);						\
> > >  })
> > >  
> > > +/*
> > > + * Emulated Feature - For features that KVM emulates in software irrespective
> > > + * of host CPU/kernel support.
> > > + */
> > > +#define EMUL_F(name)						\
> > > +({								\
> > > +	kvm_cpu_cap_emulated |= F(name);			\
> > > +	F(name);						\
> > > +})
> > 
> > To me it feels more and more that this patch series doesn't go into the right
> > direction.
> > 
> > How about we just abandon the whole concept of masks and instead just have a
> > list of statements
> > 
> > Pretty much the opposite of the patch series I confess:
> 
> FWIW, I think it's actually largely the same code under the hood.  The code for
> each concept/flavor ends up being very similar, it's mostly just handling the
> bitwise-OR in the callers vs. in the helpers.
> 
> > #define CAP_PASSTHOUGH		0x01
> > #define CAP_EMULATED		0x02
> > #define CAP_AMD_ALIASED		0x04 // for AMD aliased features
> > #define CAP_SCATTERED		0x08
> > #define CAP_X86_64		0x10 // supported only on 64 bit hypervisors
> > ...
> > 
> > 
> > /* CPUID_1_ECX*/
> > 
> > 				/* TMA is not passed though because: xyz*/
> > kvm_cpu_cap_init(TMA,           0);
> > 
> > kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
> > 				/* CNXT_ID is not passed though because: xyz*/
> > kvm_cpu_cap_init(CNXT_ID,       0);
> > kvm_cpu_cap_init(RESERVED,      0);
> > kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
> > ...
> > 				/* KVM always emulates TSC_ADJUST */
> > kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> > 
> > ...
> > 
> > /* CPUID_D_1_EAX*/
> > 				/* XFD is disabled on 32 bit systems because: xyz*/
> > kvm_cpu_cap_init(XFD, 		CAP_PASSTHOUGH | CAP_X86_64)
> > 
> > 
> > 'kvm_cpu_cap_init' can be a macro if needed to have the compile checks.
> > 
> > There are several advantages to this:
> > 
> > - more readability, plus if needed each statement can be amended with a comment.
> > - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
> >   which is confusing.
> >   In fact no need to even have them at all.
> > - No need to verify that bitmask belongs to a feature word.
> 
> Yes, but the downside is that there is no enforcement of features in a word being
> bundled together.

As I explained earlier, this is not an issue in principle: even if the caps are not
grouped together, the code will still work just fine.


kvm_cpu_cap_init_begin(CPUID_1_ECX);
                                /* TMA is not passed through because: xyz */
kvm_cpu_cap_init(TMA,           0);
kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
                                /* CNXT_ID is not passed through because: xyz */
kvm_cpu_cap_init(CNXT_ID,       0);
kvm_cpu_cap_init(RESERVED,      0);
kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
...
                                /* KVM always emulates TSC_ADJUST */
kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);

kvm_cpu_cap_init_end(CPUID_1_ECX);

...

...

And kvm_cpu_cap_init_begin can set some cap_in_progress variable.
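
A rough, self-contained sketch of how such a begin/init/end sequence could
behave.  Everything here is hypothetical (the struct, the sk_* helpers, the
flag values, and the bit numbers); it only models the passthrough and emulated
flavors and tracks which bits were explicitly initialized.

```c
#include <assert.h>
#include <stdint.h>

#define SK_CAP_PASSTHROUGH	0x01u	/* advertise only if the host has it */
#define SK_CAP_EMULATED		0x02u	/* advertise regardless of the host */

struct sk_leaf_init {
	uint32_t host;	/* raw host CPUID bits for this word */
	uint32_t caps;	/* result being built, starts at zero */
	uint32_t seen;	/* bits that had an explicit init call */
};

static inline void sk_cap_init_begin(struct sk_leaf_init *l, uint32_t host)
{
	l->host = host;
	l->caps = 0;
	l->seen = 0;
}

static inline void sk_cap_init(struct sk_leaf_init *l, unsigned int bit,
			       unsigned int flags)
{
	uint32_t mask;

	assert(bit < 32);
	mask = UINT32_C(1) << bit;
	assert(!(l->seen & mask));	/* each bit is initialized once */
	l->seen |= mask;

	if (flags & SK_CAP_EMULATED)
		l->caps |= mask;
	else if (flags & SK_CAP_PASSTHROUGH)
		l->caps |= l->host & mask;
}

static inline uint32_t sk_cap_init_end(const struct sk_leaf_init *l)
{
	return l->caps;
}

static inline uint32_t sk_example(uint32_t host)
{
	struct sk_leaf_init l;

	sk_cap_init_begin(&l, host);
	sk_cap_init(&l, 0, SK_CAP_PASSTHROUGH);
	sk_cap_init(&l, 1, 0);			/* deliberately not exposed */
	sk_cap_init(&l, 2, SK_CAP_EMULATED);
	return sk_cap_init_end(&l);
}
```

Because the word starts at zero, a bit with no init call simply stays clear.
The "seen" mask is one way to detect duplicates or, if desired, missing bits.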



> 
> > - Merge friendly - each capability has its own line.
> 
> That's almost entirely convention though.  Other than inertia, nothing is stopping
> us from doing:
> 
> 	kvm_cpu_cap_init(CPUID_12_EAX,
> 		SF(SGX1) |
> 		SF(SGX2) |
> 		SF(SGX_EDECCSSA)

That trivial change is already an improvement, although it still leaves the problem
of thinking that this is one big 'or'.  That was reasonable before this patch series,
because it was indeed one big 'or', but now there are lots of things going on behind
the scenes, and that violates the principle of least surprise.

My suggestion fixes this, because when the user sees a series of function calls,
they won't assume anything about those calls, in contrast with a series
of 'ors'. It's just how I look at it.

> 	);
> 
> I don't see a clean way of avoiding the addition of " |" on the last existing
> line, but in practice I highly doubt that will ever be a source of meaningful pain.
> 
> Same goes for the point about adding comments.  We could do that with either
> approach, we just don't do so today.

Yes, from the syntax POV there is indeed no problem, and I do agree that putting
each feature on its own line, together with comments for the features that need it
is a win-win improvement over what we have after this patch series.

> 
> > Disadvantages:
> > 
> > - Longer list - IMHO not a problem, since it is very easy to read / search
> >   and can have as much comments as needed.
> >   For example this is how the kernel lists the CPUID features and this list IMHO
> >   is very manageable.
> 
> There's one big difference: KVM would need to have a line for every feature that
> KVM _doesn't_ support.

Could you elaborate on why?
If we zero the whole leaf and then set specific bits there, one bit per kvm_cpu_cap_init,
then the unsupported features don't need a line at all.



>   For densely populated words, that's not a huge issue,
> but it's problematic for sparsely populated words, e.g. CPUID_12_EAX would have
> 29 reserved/unsupport entries, which IMO ends up being a big net negative for
> code readability and ongoing maintenance.
> 
> We could avoid that cost (and the danger of a missed bit) by collecting the set
> of features that have been initialized for each word, and then masking off the
> uninitialized/unsupported at the end.  But then we're back to the bitwise-OR and
> mask logic.
> 
> And while I agree that having the F*() macros set state _and_ evaluate to a bit
> is imperfect, it does have its advantages.  E.g. to avoid evaluating to a value,
> we could have F() modify a local variable that is scoped to kvm_cpu_cap_init(),
> a la kvm_cpu_cap_emulated.  But then we'd need explicit code and/or comments
> to call out that VMM_F() and the like intentionally don't set kvm_cpu_cap_supported,
> whereas evaluating to a value is a relatively self-documenting "0;".
> 
> > - Slower - kvm_set_cpu_caps is called exactly once per KVM module load, thus
> >   performance is the last thing I would care about in this function.
> > 
> > Another note about this patch: It is somewhat confusing that EMUL_F just
> > forces a feature in kvm caps, regardless of CPU support, because KVM also has
> > KVM_GET_EMULATED_CPUID and it has a different meaning.
> 
> Yeah, but IMO that's a problem with KVM_GET_EMULATED_CPUID being poorly defined.
> 
> > Users can easily confuse the EMUL_F for something that sets a feature bit in
> > the KVM_GET_EMULATED_CPUID.
> 
> I'll see if I can find a good spot for a comment to try and convenient


Best regards,
	Maxim Levitsky
> 



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base
  2024-07-09 19:00     ` Sean Christopherson
@ 2024-07-24 17:59       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 17:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 12:00 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > Now that KVM only searches for KVM's PV CPUID base when userspace sets
> > > guest CPUID, drop the cache and simply do the search every time.
> > > 
> > > Practically speaking, this is a nop except for situations where userspace
> > > sets CPUID _after_ running the vCPU, which is anything but a hot path,
> > > e.g. QEMU does so only when hotplugging a vCPU.  And on the flip side,
> > > caching guest CPUID information, especially information that is used to
> > > query/modify _other_ CPUID state, is inherently dangerous as it's all too
> > > easy to use stale information, i.e. KVM should only cache CPUID state when
> > > the performance and/or programming benefits justify it.
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> 
> ...
> 
> > > @@ -491,13 +479,6 @@ static int kvm_set_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid_entry2 *e2,
> > >  	 * whether the supplied CPUID data is equal to what's already set.
> > >  	 */
> > >  	if (kvm_vcpu_has_run(vcpu)) {
> > > -		/*
> > > -		 * Note, runtime CPUID updates may consume other CPUID-driven
> > > -		 * vCPU state, e.g. KVM or Xen CPUID bases.  Updating runtime
> > > -		 * state before full CPUID processing is functionally correct
> > > -		 * only because any change in CPUID is disallowed, i.e. using
> > > -		 * stale data is ok because KVM will reject the change.
> > > -		 */
> > Hi,
> > 
> > Any reason why this comment was removed?
> 
> Because after this patch, runtime CPUID updates no longer consume other vCPU
> state that is derived from guest CPUID.
> 
> > As I said earlier in the review.  It might make sense to replace this comment
> > with a comment reflecting on why we need to call kvm_update_cpuid_runtime,
> > that is solely to allow old == new compare to succeed.
> 
> Ya, I'll figure out a location and patch to document why KVM applies runtime
> and quirks to the CPUID before checking.
> 
> > >  		kvm_update_cpuid_runtime(vcpu);
> > >  		kvm_apply_cpuid_pv_features_quirk(vcpu);


Makes sense, thanks!

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID
  2024-07-09 19:28     ` Sean Christopherson
@ 2024-07-24 18:00       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 18:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 12:28 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > >  4.47 KVM_PPC_GET_PVINFO
> > >  -----------------------
> > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > index 699ce4261e9c..d1f427284ccc 100644
> > > --- a/arch/x86/kvm/cpuid.c
> > > +++ b/arch/x86/kvm/cpuid.c
> > > @@ -680,8 +680,8 @@ void kvm_set_cpu_caps(void)
> > >  		F(FMA) | F(CX16) | 0 /* xTPR Update */ | F(PDCM) |
> > >  		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
> > >  		F(XMM4_2) | EMUL_F(X2APIC) | F(MOVBE) | F(POPCNT) |
> > > -		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> > > -		F(F16C) | F(RDRAND)
> > > +		EMUL_F(TSC_DEADLINE_TIMER) | F(AES) | F(XSAVE) |
> > > +		0 /* OSXSAVE */ | F(AVX) | F(F16C) | F(RDRAND)
> > >  	);
> > >  
> > >  	kvm_cpu_cap_init(CPUID_1_EDX,
> > 
> > Hi,
> > 
> > I have a mixed feeling about this.
> > 
> > First of all, the KVM_GET_SUPPORTED_CPUID documentation explicitly states that it
> > returns bits that are supported in the *default* configuration.  TSC_DEADLINE_TIMER
> > and arguably X2APIC are only supported after enabling various caps, i.e. not in the
> > default configuration.
> 
> Another side topic, in the near future, I think we should push to make an in-kernel
> local APIC a hard requirement. 

I vote yes with both hands for this, but I am sure that this will break at least some
userspace and/or some misconfigured qemu instances.

>  AFAIK, userspace local APIC gets no meaningful
> test coverage, and IIRC we have known bugs where a userspace APIC doesn't work
> as it should, e.g. commit 6550c4df7e50 ("KVM: nVMX: Fix interrupt window request
> with "Acknowledge interrupt on exit"").
> 
> > However, since X2APIC is also in KVM_GET_SUPPORTED_CPUID (also wrongly IMHO),
> > for consistency it does make sense to add TSC_DEADLINE_TIMER as well.
> > 
> > I do think that we need at least to update the documentation of KVM_GET_SUPPORTED_CPUID
> > and KVM_GET_EMULATED_CPUID, as I state in a review of a later patch.
> 
> +1
> 


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps
  2024-07-09 18:30     ` Sean Christopherson
@ 2024-07-24 18:00       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 18:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 11:30 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > @@ -861,23 +877,20 @@ struct kvm_vcpu_arch {
> > >  	bool is_amd_compatible;
> > >  
> > >  	/*
> > > -	 * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
> > > -	 * when "struct kvm_vcpu_arch" is no longer defined in an
> > > -	 * arch/x86/include/asm header.  The max is mostly arbitrary, i.e.
> > > -	 * can be increased as necessary.
> > > +	 * cpu_caps holds the effective guest capabilities, i.e. the features
> > > +	 * the vCPU is allowed to use.  Typically, but not always, features can
> > > +	 * be used by the guest if and only if both KVM and userspace want to
> > > +	 * expose the feature to the guest.
> > 
> > Nitpick: Since even the comment mentions this, wouldn't it be better to call this
> > cpu_effective_caps? or at least cpu_eff_caps, to emphasize that these are indeed
> > effective capabilities, e.g these that both kvm and userspace support?
> 
> I strongly prefer cpu_caps, in part to match kvm_cpu_caps, but also because adding
> "effective" to the name incorrectly suggests that there are other guest capabilities
> that aren't effective.  These are the _only_ per-vCPU capabilities as far as KVM
> is concerned, i.e. they are the single source of truth.  kvm_cpu_caps holds KVM's
> capabilities, boot_cpu_data holds kernel capabilities, and bare metal holds its
> capabilities somewhere in silicon.
Looking at it from this POV, it makes sense.
> 
> E.g. being pedantic, kvm_cpu_caps are also KVM's effective capabilities, as they
> are a reflection of KVM-the-module's capabilities, module params, kernel capabilities,
> and CPU capabilities.
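
As a toy model of the common case described in the quoted comment (a feature is
usable if and only if both KVM and userspace want to expose it), consider the
sketch below.  The sk_* helpers are invented for illustration, and the sketch
deliberately ignores the "but not always" exceptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Common case only: the vCPU's capability word is the intersection of what
 * KVM supports and what userspace put in guest CPUID for that word.
 */
static inline uint32_t sk_guest_cpu_caps(uint32_t kvm_caps,
					  uint32_t guest_cpuid)
{
	return kvm_caps & guest_cpuid;
}

/* Fast query against the snapshot, no CPUID entry lookup required. */
static inline int sk_guest_cpu_cap_has(uint32_t cpu_caps, unsigned int bit)
{
	assert(bit < 32);
	return (cpu_caps >> bit) & 1u;
}
```

Once the snapshot exists, every query is a single bit test, which is the
"fast" property the series is after.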
> 

Let it be then,
Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information
  2024-07-09  0:13     ` Sean Christopherson
@ 2024-07-24 18:00       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 18:00 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 17:13 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > PS: I spoke with Paolo about the meaning of KVM_GET_EMULATED_CPUID, because
> > it is not clear from the documentation what it does, or what it is supposed to
> > do because qemu doesn't use this IOCTL.
> > 
> > So this ioctl is meant to return a static list of CPU features which *can* be
> > emulated by KVM, if the cpu doesn't support them, but there is a cost to it,
> > so they should not be enabled by default.
> > 
> > This means that if you run 'qemu -cpu host', these features (like rdpid) will
> > only be enabled if supported by the host cpu, however if you explicitly ask
> > qemu for such a feature, like 'qemu -cpu host,+rdpid', qemu should not warn
> > if the feature is not supported on host cpu but can be emulated (because kvm
> > can emulate the feature, which is stated by KVM_GET_EMULATED_CPUID ioctl).
> > 
> > Qemu currently doesn't support this but the support can be added.
> > 
> > So I think that the two ioctls should be redefined as such:
> > 
> > KVM_GET_SUPPORTED_CPUID - returns all CPU features that are supported by KVM,
> > supported by host hardware, or that KVM can efficiently emulate.
> > 
> > 
> > KVM_GET_EMULATED_CPUID - returns all CPU features that KVM *can* emulate if
> > the host cpu lacks support, but emulation is not efficient and thus these
> > features should be used with care when not supported by the host (e.g only
> > when the user explicitly asks for them).
> 
> Yep, that aligns with how I view the ioctls (I haven't read the documentation,
> mainly because I have a terrible habit of never reading docs).
> 
> > I can post a patch to fix this or you can add something like that to your
> > patch series if you prefer.
> 
> Go ahead and post a patch, assuming it's just a documentation update.
> 
OK, will do.
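
For illustration only, here is a rough sketch of how a VMM could apply the
two definitions above.  It assumes one 32-bit mask per CPUID register; the
struct and function names are made up for this example and are not taken
from QEMU or KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-register view; real VMMs walk struct kvm_cpuid2 entries. */
struct feature_masks {
	uint32_t supported;	/* bits from KVM_GET_SUPPORTED_CPUID */
	uint32_t emulated;	/* bits from KVM_GET_EMULATED_CPUID */
};

/* Features enabled by default (e.g. "-cpu host"): supported ones only. */
static uint32_t default_features(const struct feature_masks *m)
{
	return m->supported;
}

/*
 * Features enabled on explicit request (e.g. "-cpu host,+rdpid"): grant a
 * requested bit if it is supported or, failing that, emulated.  Bits that
 * are neither are returned via *rejected so the caller can warn.
 */
static uint32_t requested_features(const struct feature_masks *m,
				   uint32_t requested, uint32_t *rejected)
{
	uint32_t allowed = m->supported | m->emulated;

	*rejected = requested & ~allowed;
	return requested & allowed;
}
```

The point of the split is that the emulated mask never feeds into the
defaults; it only widens what an explicit request may turn on.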

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support
  2024-07-09  0:10     ` Sean Christopherson
@ 2024-07-24 18:01       ` Maxim Levitsky
  2024-07-29 15:34         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 18:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 17:10 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > @@ -421,6 +423,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >  	 */
> > >  	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
> > >  		const struct cpuid_reg cpuid = reverse_cpuid[i];
> > > +		struct kvm_cpuid_entry2 emulated;
> > >  
> > >  		if (!cpuid.function)
> > >  			continue;
> > > @@ -429,7 +432,16 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >  		if (!entry)
> > >  			continue;
> > >  
> > > -		vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(entry, cpuid.reg);
> > > +		cpuid_func_emulated(&emulated, cpuid.function);
> > > +
> > > +		/*
> > > +		 * A vCPU has a feature if it's supported by KVM and is enabled
> > > +		 * in guest CPUID.  Note, this includes features that are
> > > +		 * supported by KVM but aren't advertised to userspace!
> > > +		 */
> > > +		vcpu->arch.cpu_caps[i] = kvm_cpu_caps[i] | kvm_vmm_cpu_caps[i] |
> > > +					 cpuid_get_reg_unsafe(&emulated, cpuid.reg);
> > > +		vcpu->arch.cpu_caps[i] &= cpuid_get_reg_unsafe(entry, cpuid.reg);
> > 
> > Hi,
> > 
> > I have an idea. What if we get rid of kvm_vmm_cpu_caps, and instead advertise the
> > MWAIT in KVM_GET_EMULATED_CPUID?
> > 
> > MWAIT is sort of emulated as NOP after all, plus features in KVM_GET_EMULATED_CPUID are
> > sort of 'emulated inefficiently' and you can say that NOP is an inefficient emulation
> > of MWAIT sort of.
> 
> Heh, sort of indeed.  I really don't want to advertise MWAIT to userspace in any
> capacity beyond KVM_CAP_X86_DISABLE_EXITS, because advertising MWAIT to VMs when
> MONITOR/MWAIT exiting is enabled is actively harmful, to both host and guest.

Assuming that the only purpose of the KVM_GET_EMULATED_CPUID is to allow the guest
to use a feature if it really insists, there should be no harm, but yes, I understand
your concern here.

> 
> KVM also doesn't emulate them on #UD, unlike MOVBE, which would make the API even
> more confusing than it already is.
This is an even bigger justification for not doing this.


> 
> > It just feels to me that kvm_vmm_cpu_caps is somewhat overkill, and its name is
> > somewhat confusing.
> 
> Yeah, I don't love it either, but trying to handle MWAIT as a one-off was even
> uglier.  One option would be to piggyback cpuid_func_emulated(), but add a param
> to have it fill MWAIT only for KVM's internal purposes.  That'd essentially be
> the same as a one-off in kvm_vcpu_after_set_cpuid(), but less ugly.
> 
> I'd say it comes down to whether or not we expect to have more features that KVM
> "supports", but doesn't advertise to userspace.  If we do, then I think adding
> VMM_F() is the way to go.  If we expect MWAIT to be the only feature that gets
> this treatment, then I'm ok if we bastardize cpuid_func_emulated().
> 
> And I think/hope that MWAIT will be a one-off.  Emulating it as a nop was a
> mistake and has since been quirked, and I like to think we (eventually) learn
> from our mistakes.
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 0e64a6332052..dbc3f6ce9203 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -448,7 +448,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>                 if (!entry)
>                         continue;
>  
> -               cpuid_func_emulated(&emulated, cpuid.function);
> +               cpuid_func_emulated(&emulated, cpuid.function, false);
>  
>                 /*
>                  * A vCPU has a feature if it's supported by KVM and is enabled
> @@ -1034,7 +1034,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
>         return entry;
>  }
>  
> -static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
> +static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
> +                              bool only_advertised)

I'd say, let's call this boolean 'include_partially_emulated'
(basically features that KVM emulates, but only partially,
and thus doesn't advertise, aka MWAIT)

and then it doesn't look that bad, assuming that comes with a comment.




>  {
>         memset(entry, 0, sizeof(*entry));
>  
> @@ -1048,6 +1049,9 @@ static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
>                 return 1;
>         case 1:
>                 entry->ecx = F(MOVBE);
> +               /* comment goes here. */
> +               if (!only_advertised)

And here 

	if(include_partially_emulated) ...


It even sort of self-documents the nature of MWAIT emulation.

> +                       entry->ecx |= F(MWAIT);
>                 return 1;
>         case 7:
>                 entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
> @@ -1065,7 +1069,7 @@ static int __do_cpuid_func_emulated(struct kvm_cpuid_array *array, u32 func)
>         if (array->nent >= array->maxnent)
>                 return -E2BIG;
>  
> -       array->nent += cpuid_func_emulated(&array->entries[array->nent], func);
> +       array->nent += cpuid_func_emulated(&array->entries[array->nent], func, true);
>         return 0;
>  }
> 
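
To make the 'include_partially_emulated' idea concrete, here is a small
standalone model, reduced to CPUID.0x1:ECX.  It is only a sketch: the
helper names are invented for this example, and the real
cpuid_func_emulated() fills a struct kvm_cpuid_entry2 rather than
returning a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.0x1:ECX bit positions (MONITOR/MWAIT is bit 3, MOVBE is bit 22). */
#define ECX_MWAIT	(1u << 3)
#define ECX_MOVBE	(1u << 22)

/*
 * Model of cpuid_func_emulated() for CPUID.0x1:ECX only.  MOVBE is emulated
 * and advertised via KVM_GET_EMULATED_CPUID; MWAIT is only partially
 * emulated (as a NOP) and so is included solely for KVM's internal use.
 */
static uint32_t emulated_ecx(bool include_partially_emulated)
{
	uint32_t ecx = ECX_MOVBE;

	if (include_partially_emulated)
		ecx |= ECX_MWAIT;
	return ecx;
}

/* Effective guest caps: (KVM-supported | emulated) & guest CPUID. */
static uint32_t guest_caps_ecx(uint32_t kvm_caps, uint32_t guest_cpuid)
{
	return (kvm_caps | emulated_ecx(true)) & guest_cpuid;
}
```

The ioctl path would pass false so that MWAIT never reaches userspace, while
the cpu_caps path passes true so that a guest CPUID with MWAIT set is still
honored internally.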

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps
  2024-07-09 19:20     ` Sean Christopherson
@ 2024-07-24 18:01       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 18:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 12:20 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > +static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
> > > +					    unsigned int x86_feature)
> > >  {
> > >  	const struct cpuid_reg cpuid = x86_feature_cpuid(x86_feature);
> > >  	struct kvm_cpuid_entry2 *entry;
> > > +	u32 *reg;
> > > +
> > > +	/*
> > > +	 * XSAVES is a special snowflake.  Due to lack of a dedicated intercept
> > > +	 * on SVM, KVM must assume that XSAVES (and thus XRSTORS) is usable by
> > > +	 * the guest if the host supports XSAVES and *XSAVE* is exposed to the
> > > +	 * guest.  Although the guest can read/write XSS via XSAVES/XRSTORS, to
> > > +	 * minimize the virtualization hole, KVM rejects attempts to read/write
> > > +	 * XSS via RDMSR/WRMSR.  To make that work, KVM needs to check the raw
> > > +	 * guest CPUID, not KVM's view of guest capabilities.
> > 
> > Hi,
> > 
> > I think that this comment is wrong:
> > 
> > The guest can't read/write XSS via XSAVES/XRSTORS. It can only use XSAVES/XRSTORS
> > to save/restore features that are enabled in XSS, and thus if there are none enabled,
> > XSAVES/XRSTORS acts more or less as XSAVEOPTC/XRSTOR (except working only when CPL=0).
> 
> Doh, right you are.
> 
> > So I don't think that there is a virtualization hole except the fact that VMM can't
> > really disable XSAVES if it chooses to.
> 
> There is still a hole.  If XSAVES is not supported, KVM runs the guest with the
> host XSS.  See the conditional switching in kvm_load_{guest,host}_xsave_state().
> Not treating XSAVES as being available to the guest would allow the guest to read
> and write host supervisor state.
Makes sense. The remaining virtualization hole is indeed that we can't disable XSAVES,
even if userspace chooses to.


> 
> I'll rewrite the comment to call that out.
> 
> > Another "half virtualization hole" is that since we have chosen to not
> > intercept XSAVES at all, (AMD can't do this at all, and it's slow anyway) we
> > instead opted to never support some XSS bits (so far all of them, only
> > upcoming CET will add a few supported bits).
> > 
> > This creates an unexpected situation for the guest - enabled feature (e.g PT)
> > but no XSS bit supported to context switch it. x86 arch does allow this
> > though.


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps
  2024-07-09 19:15     ` Sean Christopherson
@ 2024-07-24 18:02       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 18:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 12:15 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > Drop the manual boot_cpu_has() checks on XSAVE when adjusting the guest's
> > > XSAVES capabilities now that guest cpu_caps incorporates KVM's support.
> > > The guest's cpu_caps are initialized from kvm_cpu_caps, which are in turn
> > > initialized from boot_cpu_data, i.e. checking guest_cpu_cap_has() also
> > > checks host/KVM capabilities (which is the entire point of cpu_caps).
> > > 
> > > Cc: Maxim Levitsky <mlevitsk@redhat.com>
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> > >  arch/x86/kvm/svm/svm.c | 1 -
> > >  arch/x86/kvm/vmx/vmx.c | 3 +--
> > >  2 files changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index 06770b60c0ba..4aaffbf22531 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -4340,7 +4340,6 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >  	 * the guest read/write access to the host's XSS.
> > >  	 */
> > >  	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
> > > -			     boot_cpu_has(X86_FEATURE_XSAVE) &&
> > >  			     boot_cpu_has(X86_FEATURE_XSAVES) &&
> > >  			     guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE));
> > >  
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 741961a1edcc..6fbdf520c58b 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -7833,8 +7833,7 @@ void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > >  	 * to the guest.  XSAVES depends on CR4.OSXSAVE, and CR4.OSXSAVE can be
> > >  	 * set if and only if XSAVE is supported.
> > >  	 */
> > > -	if (!boot_cpu_has(X86_FEATURE_XSAVE) ||
> > > -	    !guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
> > > +	if (!guest_cpu_cap_has(vcpu, X86_FEATURE_XSAVE))
> > >  		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
> > 
> > Hi,
> > 
> > I have a question about this code, even before the patch was applied:
> > 
> > While it is obviously correct to disable XSAVES when XSAVE not supported, I
> > wonder: There are a lot more cases like that and KVM explicitly doesn't
> > bother checking them, e.g all of the AVX family also depends on XSAVE due to
> > XCR0.
> > 
> > What makes XSAVES/XSAVE dependency special here? Maybe we can remove this
> > code to be consistent?
> 
> Because that would result in VMX and SVM behavior diverging with respect to
> whether guest_cpu_cap_has(X86_FEATURE_XSAVES).  E.g. for AMD it would be 100%
> accurate, but for Intel it would be accurate if and only if XSAVE is supported.
This is a good justification, and IMHO it is worth a comment in the VMX code,
so that this question I had won't be raised again.


> In practice that isn't truly problematic, because checks on XSAVES from common
> code are gated on guest CR4.OSXSAVE=1, i.e. implicitly check XSAVE support.  But
> the potential danger of subtly divergent behavior between VMX and SVM isn't worth
> making AVX vs. XSAVES consistent within VMX, especially since VMX vs. SVM would
> still be inconsistent.
> 
> > The AMD portion of this patch, on the other hand, does make sense, due to a lack
> > of a separate XSAVES intercept.
> 
> FWIW, AMD also needs precise tracking in order to passthrough XSS for SEV-ES.
Makes sense too.

> 

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data
  2024-07-09 21:13     ` Sean Christopherson
@ 2024-07-24 18:04       ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-07-24 18:04 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, 2024-07-09 at 14:13 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > Add yet another CPUID macro, this time for features that the host kernel
> > > synthesizes into boot_cpu_data, i.e. that the kernel force sets even in
> > > situations where the feature isn't reported by CPUID.  Thanks to the
> > > macro shenanigans of kvm_cpu_cap_init(), such features can now be handled
> > > in the core CPUID framework, i.e. don't need to be handled out-of-band and
> > > thus without as many guardrails.
> > > 
> > > Adding a dedicated macro also helps document what's going on, e.g. the
> > > calls to kvm_cpu_cap_check_and_set() are very confusing unless the reader
> > > knows exactly how kvm_cpu_cap_init() generates kvm_cpu_caps (and even
> > > then, it's far from obvious).
> > > 
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > ---
> 
> ...
> 
> > Now that you added the final F_* macro, let's list all of them:
> > 
> > #define F(name)							\
> > 
> > /* Scattered Flag - For features that are scattered by cpufeatures.h. */
> > #define SF(name)						\
> > 
> > /* Features that KVM supports only on 64-bit kernels. */
> > #define X86_64_F(name)						\
> > 
> > /*
> >  * Raw Feature - For features that KVM supports based purely on raw host CPUID,
> >  * i.e. that KVM virtualizes even if the host kernel doesn't use the feature.
> >  * Simply force set the feature in KVM's capabilities, raw CPUID support will
> >  * be factored in by __kvm_cpu_cap_mask().
> >  */
> > #define RAW_F(name)						\
> > 
> > /*
> >  * Emulated Feature - For features that KVM emulates in software irrespective
> >  * of host CPU/kernel support.
> >  */
> > #define EMUL_F(name)						\
> > 
> > /*
> >  * Synthesized Feature - For features that are synthesized into boot_cpu_data,
> >  * i.e. may not be present in the raw CPUID, but can still be advertised to
> >  * userspace.  Primarily used for mitigation related feature flags.
> >  */
> > #define SYN_F(name)						\
> > 
> > /*
> >  * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
> >  * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> >  */
> > #define AF(name)								\
> > 
> > /*
> >  * VMM Features - For features that KVM "supports" in some capacity, i.e. that
> >  * KVM may query, but that are never advertised to userspace.  E.g. KVM allows
> >  * userspace to enumerate MONITOR+MWAIT support to the guest, but the MWAIT
> >  * feature flag is never advertised to userspace because MONITOR+MWAIT aren't
> >  * virtualized by hardware, can't be faithfully emulated in software (KVM
> >  * emulates them as NOPs), and allowing the guest to execute them natively
> >  * requires enabling a per-VM capability.
> >  */
> > #define VMM_F(name)								\
> > 
> > 
> > Honestly, I am already somewhat lost in what each of those macros means even
> > when reading the comments, which might indicate that a future reader might
> > also have a hard time understanding those.
> > 
> > I now support even more the case of setting each feature bit in a separate
> > statement as I explained in an earlier patch.
> > 
> > What do you think?
> 
> I completely agree that there are an absurd number of flavors of features, but
> I don't see how using separate statements eliminates any of that complexity.  The
> complexity comes from the fact that KVM actually has that many different ways and
> combinations for advertising and enumerating CPUID-based features.
> 
> Ignoring for the moment that "vmm" and "aliased" could be avoided for any approach,
> if we go with statements, we'll still have
> 
>   kvm_cpu_cap_init{,passthrough,emulated,synthesized,aliased,vmm,only64}()
> 
> or if the flavor is an input/enum,
> 
>   enum kvm_cpuid_feature_type {
>   	NORMAL,
> 	PASSTHROUGH,
> 	EMULATED,
> 	SYNTHESIZED,
> 	ALIASED,
> 	VMM,
> 	ONLY_64,
>   }

It doesn't have to be like that - something more compact can be done,
plus bitmask of various flags can be used.
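
As a rough illustration of the bitmask idea, something along these lines
would give one feature per statement with the modifiers spelled out.  This
is a sketch only: the flag names, the struct, and feat_init() are all
hypothetical and do not exist in KVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical modifier flags; none of these names exist in KVM. */
enum feat_flags {
	FEAT_RAW	= 1u << 0,	/* trust raw host CPUID, ignore kernel view */
	FEAT_EMULATED	= 1u << 1,	/* supported regardless of host */
	FEAT_ONLY_64	= 1u << 2,	/* requires a 64-bit kernel */
};

struct host_view {
	uint32_t kernel_caps;	/* stand-in for a boot_cpu_data word */
	uint32_t raw_cpuid;	/* stand-in for the raw CPUID register */
	int is_64bit;
};

/* Set or clear one feature bit in *word according to its modifier flags. */
static void feat_init(uint32_t *word, const struct host_view *h,
		      unsigned int bit, unsigned int flags)
{
	uint32_t mask = 1u << bit;
	int on;

	if ((flags & FEAT_ONLY_64) && !h->is_64bit)
		on = 0;
	else if (flags & FEAT_EMULATED)
		on = 1;
	else if (flags & FEAT_RAW)
		on = !!(h->raw_cpuid & mask);
	else
		on = !!(h->kernel_caps & h->raw_cpuid & mask);

	*word = on ? (*word | mask) : (*word & ~mask);
}
```

A caller would then read as a list of statements such as
feat_init(&word, &host, bit, FEAT_EMULATED | FEAT_ONLY_64), and the result
does not depend on anything being 'or'ed together.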

> 
> I.e. we'll still need the same functionality and comments, it would simply be
> dressed up differently.

> 
> If the underlying concern is that the macro names are too terse, and/or getting
> one feature per line is desirable, 

I indeed have these concerns, and more:

1. Macro names are indeed too terse, and hard to figure out, even after looking
at the macro source.
This wasn't a problem before this patch series.

2. One feature per line would be very nice, it is much more readable, especially
when features have various 'modifiers'.
This wasn't such a problem before this patch series, because we just had features 'or'ed,
but having one feature per line would be a good thing to have even before this patch series.

3. 'Or'ing the macros' output into the feature bitmap became very confusing after this
patch series, now that the macros have various side effects.

In fact VMM_F confuses the user even more, because it doesn't even contribute to the
feature mask at all.

It was OK before the patch series.

Technically, of course, I am not opposed to 'kvm_cpu_cap_init' (or whatever we name
it) remaining a macro; that is probably even desirable, as long as it is just a
macro that doesn't evaluate to anything and thus looks like a function call.

Best regards,
	Maxim Levitsky


> then I'm definitely open to exploring alternative
> formatting options.  But that's largely orthogonal to using macros instead of
> individual function calls.
> 



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after vCPU creation
  2024-07-24 17:31       ` Maxim Levitsky
@ 2024-07-25 18:07         ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-07-25 18:07 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 19:43 +0000, Sean Christopherson wrote:
> > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > > Reject KVM_CAP_X86_DISABLE_EXITS if vCPUs have been created, as disabling
> > > > PAUSE/MWAIT/HLT exits after vCPUs have been created is broken and useless,
> > > > e.g. except for PAUSE on SVM, the relevant intercepts aren't updated after
> > > > vCPU creation.  vCPUs may also end up with an inconsistent configuration
> > > > if exits are disabled between creation of multiple vCPUs.
> > > 
> > > Hi,
> > > 
> > > > I am not sure that PAUSE intercepts are updated either, I wasn't able to find any code
> > > that does this.
> > > 
> > > I agree with this change, but note that there was some talk on the mailing
> > > list to allow to selectively disable VM exits (e.g PAUSE, MWAIT, ...) only on
> > > some vCPUs, based on the claim that some vCPUs might run RT tasks, while some
> > > might be housekeeping.  I haven't followed those discussions closely.
> > 
> > This change is actually pulled from that series[*].  IIRC, v1 of that series
> > didn't close the VM-scoped hole, and the overall code was much more complex as
> > a result.
> > 
> > [*] https://lore.kernel.org/all/20230121020738.2973-2-kechenl@nvidia.com
> > 
> 
> Hi,
> Thanks for the pointer, I searched for this patch series in many places but I
> couldn't find it.
> Any idea what happened with this patch series btw?

Nope.  IIRC, v6 was close to being ready and only had a few cosmetic issues, but
the author never posted a v7.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-07-24 17:46       ` Maxim Levitsky
@ 2024-07-25 18:39         ` Sean Christopherson
  2024-08-05 11:06           ` mlevitsk
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-25 18:39 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 14:08 -0700, Sean Christopherson wrote:
> > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > > Add a macro to precisely handle CPUID features that AMD duplicated from
> > > > CPUID.0x1.EDX into CPUID.0x8000_0001.EDX.  This will allow adding an
> > > > assert that all features passed to kvm_cpu_cap_init() match the word being
> > > > processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1.
> > > > 
> > > > Because the kernel simply reuses the X86_FEATURE_* definitions from
> > > > CPUID.0x1.EDX, KVM's use of the aliased features would result in false
> > > > positives from such an assert.
> > > > 
> > > > No functional change intended.
> > > > 
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > >  arch/x86/kvm/cpuid.c | 24 +++++++++++++++++-------
> > > >  1 file changed, 17 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > index 5e3b97d06374..f2bd2f5c4ea3 100644
> > > > --- a/arch/x86/kvm/cpuid.c
> > > > +++ b/arch/x86/kvm/cpuid.c
> > > > @@ -88,6 +88,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > > >  	F(name);						\
> > > >  })
> > > >  
> > > > +/*
> > > > + * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
> > > > + * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> > > > + */
> > > > +#define AF(name)								\
> > > > +({										\
> > > > +	BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);	\
> > > > +	feature_bit(name);							\
> > > > +})
> > > > +
> > > >  /*
> > > >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> > > >   * doesn't care about the CPIUD index because the index of the function in
> > > > @@ -758,13 +768,13 @@ void kvm_set_cpu_caps(void)
> > > >  	);
> > > >  
> > > >  	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
> > > > -		F(FPU) | F(VME) | F(DE) | F(PSE) |
> > > > -		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > > > -		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > > > -		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > > > -		F(PAT) | F(PSE36) | 0 /* Reserved */ |
> > > > -		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > > > -		F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> > > > +		AF(FPU) | AF(VME) | AF(DE) | AF(PSE) |
> > > > +		AF(TSC) | AF(MSR) | AF(PAE) | AF(MCE) |
> > > > +		AF(CX8) | AF(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > > > +		AF(MTRR) | AF(PGE) | AF(MCA) | AF(CMOV) |
> > > > +		AF(PAT) | AF(PSE36) | 0 /* Reserved */ |
> > > > +		F(NX) | 0 /* Reserved */ | F(MMXEXT) | AF(MMX) |
> > > > +		AF(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> > > >  		0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
> > > >  	);
> > > >  
> > > 
> > > Hi,
> > > 
> > > What if we defined the aliased features instead.
> > > Something like this:
> > > 
> > > #define __X86_FEATURE_8000_0001_ALIAS(feature) \
> > > 	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)
> > > 
> > > #define KVM_X86_FEATURE_FPU_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_FPU)
> > > #define KVM_X86_FEATURE_VME_ALIAS	__X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_VME)
> > > 
> > > And then just use for example the 'F(FPU_ALIAS)' in the CPUID_8000_0001_EDX
> > 
> > At first glance, I really liked this idea, but after working through the
> > ramifications, I think I prefer "converting" the flag when passing it to
> > kvm_cpu_cap_init().  In-place conversion makes it all but impossible for KVM to
> > check the alias, e.g. via guest_cpu_cap_has(), especially since the AF() macro
> > doesn't set the bits in kvm_known_cpu_caps (if/when a non-hacky validation of
> > usage becomes reality).
> 
> Could you elaborate on this as well?
> 
> My suggestion was that we can just treat aliases as completely independent
> and dummy features, say KVM_X86_FEATURE_FPU_ALIAS, and pass them as is to the
> guest, which means that if an alias is present in host cpuid, it appears in
> kvm caps, and thus qemu can then set it in guest cpuid.
> 
> I don't think that we need any special treatment for them if you look at it
> this way.  If you don't agree, can you give me an example?

KVM doesn't honor the aliases beyond telling userspace they can be set (see below
for all the aliased features that KVM _should_ be checking).  The APM clearly
states that the features are the same as their CPUID.0x1 counterparts, but Intel
CPUs don't support the aliases.  So, as you also note below, I think we could
unequivocally say that enumerating the aliases but not the "real" features is a
bogus CPUID model, but we can't say the opposite, i.e. the real features can
exist without the aliases.

And that means that KVM must never query the aliases, e.g. should never do
guest_cpu_cap_has(KVM_X86_FEATURE_FPU_ALIAS), because the result is essentially
meaningless.  It's a small thing, but if KVM_X86_FEATURE_FPU_ALIAS simply doesn't
exist, i.e. we do in-place conversion, then it's impossible to feed the aliases
into things like guest_cpu_cap_has().
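
A stripped-down model of the in-place conversion, for illustration.  It
assumes the usual "word * 32 + bit" feature encoding; AF_BIT() and the
helper macros are stand-ins written for this example, not the kernel's
actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Word indices, mirroring the kernel's enum cpuid_leafs for these two words. */
enum { CPUID_1_EDX = 0, CPUID_8000_0001_EDX = 1 };

/* A feature is encoded as word * 32 + bit. */
#define FEATURE(word, bit)	((word) * 32 + (bit))
#define FEATURE_WORD(f)		((f) / 32)
#define FEATURE_BIT(f)		(1u << ((f) % 32))

#define X86_FEATURE_FPU		FEATURE(CPUID_1_EDX, 0)
#define X86_FEATURE_VME		FEATURE(CPUID_1_EDX, 1)

/*
 * In-place conversion: check at compile time that the feature lives in
 * 0x1.EDX (a negative array size breaks the build otherwise), then yield
 * only its bit for use in the 0x8000_0001.EDX mask.
 */
#define AF_BIT(f)							\
	((uint32_t)(sizeof(char[FEATURE_WORD(f) == CPUID_1_EDX ? 1 : -1]) * 0) | \
	 FEATURE_BIT(f))

/* Mask of aliased bits to apply to the 0x8000_0001.EDX word. */
static uint32_t aliased_mask(void)
{
	return AF_BIT(X86_FEATURE_FPU) | AF_BIT(X86_FEATURE_VME);
}
```

Because no separate feature number is ever defined for the 0x8000_0001 copy,
there is simply nothing that could be passed to a query helper by mistake.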

Heh, on a related topic, __cr4_reserved_bits() fails to account for any of the
aliased features.  Unless I'm missing something, VME, DE, TSC, PSE, PAE, PGE and
MCE, all need to be handled in __cr4_reserved_bits().  Amusingly, 
nested_vmx_cr_fixed1_bits_update() handles the aliased legacy features.  I don't
see any reason for nested_vmx_cr_fixed1_bits_update() to manually query guest
CPUID, it should be able to use cr4_guest_rsvd_bits verbatim.

> > Side topic, if it's not already documented somewhere else, kvm/x86/cpuid.rst
> > should call out that KVM only honors the features in CPUID.0x1, i.e. that setting
> > aliased bits in CPUID.0x8000_0001 is supported if and only if the bit(s) is also
> > set in CPUID.0x1.
> 
> To be honest if KVM enforces this, such enforcement can be removed IMHO:

There's no enforcement, and as above I agree that this would be a bogus CPUID
model.  I was thinking that it could be helpful to document that KVM never checks
the aliases, but on second thought, it's probably unnecessary because the APM does
say

  Same as CPUID Fn0000_0001_EDX[...]

for all the bits, i.e. setting the aliases without the real bits is an
architectural violation.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper
  2024-07-24 17:51       ` Maxim Levitsky
@ 2024-07-25 19:18         ` Sean Christopherson
  2024-08-05 11:07           ` mlevitsk
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-25 19:18 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 14:18 -0700, Sean Christopherson wrote:
> >  And in the unlikely case that we royally screw up and fail
> > to call kvm_cpu_cap_init() on a word, starting with 0xff would result in all
> > features in the uninitialized word being treated as supported.
> Yes, but IMHO the chances of this happening are very low.
> 
> I understand your concerns though, but then IMHO it's better to keep the
> kvm_cpu_cap_init_kvm_defined, because this way at least the function name
> cleanly describes the difference instead of the difference being buried in the function
> itself (the comment helps but still it is less noticeable than a function name). 
> 
> I don't have a very strong opinion on this though, 
> because IMHO the kvm_cpu_cap_init_kvm_defined is also not very user friendly, 
> so if you really think that the new code is more readable, let it be.

Hmm, the main motivation of this patch was to avoid duplicate code in later
patches, but looking at the end result, I don't think that eliminating the
KVM-defined variants is necessary, e.g. ending up with this should work, too.

#define __kvm_cpu_cap_init(leaf)					\
do {									\
	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
	u32 kvm_cpu_cap_emulated = 0;					\
	u32 kvm_cpu_cap_synthesized = 0;				\
									\
	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
			       kvm_cpu_cap_synthesized);		\
	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
} while (0)

/* For kernel-defined leafs, mask the boot CPU's pre-populated value. */
#define kvm_cpu_cap_init(leaf, mask)					\
do {									\
	BUILD_BUG_ON(leaf >= NCAPINTS);					\
	kvm_cpu_caps[leaf] &= (mask);					\
									\
	__kvm_cpu_cap_init(leaf);					\
} while (0)

/* For KVM-defined leafs, explicitly set the leaf, KVM is the sole authority. */
#define kvm_cpu_cap_init_kvm_defined(leaf, mask)			\
do {									\
	BUILD_BUG_ON(leaf < NCAPINTS);					\
	kvm_cpu_caps[leaf] = (mask);					\
									\
	__kvm_cpu_cap_init(leaf);					\
} while (0)

That said, unless someone really likes kvm_cpu_cap_init_kvm_defined(), I am
leaning toward keeping this patch (but rewriting the changelog).  IMO, whether a
leaf is KVM-only or known to the kernel is a plumbing detail that really shouldn't
affect anything in kvm_set_cpu_caps().  Literally the only difference is whether
or not there are kernel capabilities to account for.  The "types" of features aren't
restricted in any way, e.g. CPUID_12_EAX is KVM-only and contains only scattered
features, but CPUID_7_1_EDX is KVM-only and contains only "regular" features.

And if a feature changes from KVM-only to kernel-managed, we'd need to update the
caller.  This is unlikely, but it seems like an unnecessary maintenance burden.

Ooh, and thinking more on that and on the argument against initializing the KVM-
only leafs to all ones, I think we should remove this:

	memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
	       sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));

and instead explicitly mask the boot_cpu_data.x86_capability[leaf].  It's _way_
more likely that the kernel adds a leaf without updating KVM, in which case
copying the kernel capabilities without masking them against KVM's capabilities
would over-report the set of supported features.  The odds of over-reporting are
still low, as KVM limits the max leaf in __do_cpuid_func(), but unless I'm missing
something, the memcpy() trick adds no value in the current code base.

E.g. 

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index dbc3f6ce9203..593de2c1811b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -730,18 +730,20 @@ do {                                                                      \
 } while (0)
 
 /*
- * For kernel-defined leafs, mask the boot CPU's pre-populated value.  For KVM-
- * defined leafs, explicitly set the leaf, as KVM is the one and only authority.
+ * For leafs that are managed by the kernel, mask the boot CPU's capabilities,
+ * which are populated by the kernel.  For KVM-only leafs, explicitly set the
+ * leaf, as KVM is the one and only authority.
  */
 #define kvm_cpu_cap_init(leaf, mask)                                   \
 do {                                                                   \
        const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);    \
        const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;   \
+       const u32 kernel_cpu_caps = boot_cpu_data.x86_capability[leaf]; \
        u32 kvm_cpu_cap_emulated = 0;                                   \
        u32 kvm_cpu_cap_synthesized = 0;                                \
                                                                        \
        if (leaf < NCAPINTS)                                            \
-               kvm_cpu_caps[leaf] &= (mask);                           \
+               kvm_cpu_caps[leaf] = kernel_cpu_caps & (mask);          \
        else                                                            \
                kvm_cpu_caps[leaf] = (mask);                            \
                                                                        \
@@ -763,9 +765,6 @@ void kvm_set_cpu_caps(void)
        BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
                     sizeof(boot_cpu_data.x86_capability));
 
-       memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
-              sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));
-
        kvm_cpu_cap_init(CPUID_1_ECX,
                /*
                 * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
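
For illustration only, here is a minimal user-space sketch of the over-reporting
concern raised above.  The arrays, leaf names, and mask values are hypothetical
stand-ins, not the kernel's boot_cpu_data.x86_capability or KVM's kvm_cpu_caps;
the sketch only assumes that a bulk copy followed by per-leaf masking leaves any
forgotten leaf fully populated, whereas explicit per-leaf masking leaves it zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical leaf indices; LEAF_B models a leaf the kernel added later. */
enum { LEAF_A, LEAF_B, NR_KERNEL_LEAFS };

/* Stand-in for the kernel's view of the boot CPU's capabilities. */
const uint32_t kernel_caps[NR_KERNEL_LEAFS] = {
	[LEAF_A] = 0x0000000fu,
	[LEAF_B] = 0x000000f0u,
};

/* Stand-in for KVM's view of what it supports. */
uint32_t kvm_caps[NR_KERNEL_LEAFS];

/* Bulk copy first, then mask only the leafs KVM knows about. */
void init_with_memcpy(void)
{
	memcpy(kvm_caps, kernel_caps, sizeof(kvm_caps));
	kvm_caps[LEAF_A] &= 0x00000003u;
	/* LEAF_B was never masked, so every kernel bit leaks through. */
}

/* No bulk copy: each leaf is assigned from the kernel value and a mask. */
void init_with_explicit_mask(void)
{
	memset(kvm_caps, 0, sizeof(kvm_caps));
	kvm_caps[LEAF_A] = kernel_caps[LEAF_A] & 0x00000003u;
	/* LEAF_B was never initialized, so it stays zero (under-reported). */
}
```

With the bulk copy, forgetting a leaf fails open (all kernel bits are reported);
with explicit masking, forgetting a leaf fails closed.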

^ permalink raw reply related	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-07-24 17:54       ` Maxim Levitsky
@ 2024-07-26 23:34         ` Sean Christopherson
  2024-08-05 11:11           ` mlevitsk
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-26 23:34 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 14:29 -0700, Sean Christopherson wrote:
> > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > Maybe we should instead rename the SPEC_CTRL_SSBD to
> > > 'MSR_IA32_SPEC_CTRL_SSBD' and together with it, other fields of this msr.  It
> > > seems that at least some msrs in this file do this.
> > 
> > Yeah, the #undef hack is quite ugly.  But I didn't (and still don't) want to
> > introduce all the renaming churn in the middle of this already too-big series,
> > especially since it would require touching quite a bit of code outside of KVM.
>
> > 
> > I'm also not sure that's the right thing to do; I kinda feel like KVM is the one
> > that's being silly here.
> 
> I don't think that KVM is silly here. I think that hardware definitions like
> MSRs, register names, register bit fields, etc, *must* come with a unique
> prefix, it's not an issue of breaking some deeply nested macro, but rather an
> issue of readability.

For the MSR names themselves, yes, I agree 100%.  But for the bits and mask, I
disagree.  It's simply too verbose, especially given that in the vast majority
of cases simply looking at the surrounding code will provide enough context to
glean an understanding of what's going on.  E.g. even for SPEC_CTRL_SSBD, where
there's an absurd amount of magic and layering, looking at the #define makes
it fairly obvious that it belongs to MSR_IA32_SPEC_CTRL.

And for us x86 folks, who obviously look at this code far more often than non-x86
folks, I find it valuable to know that a bit/mask is exactly that, and _not_ an
MSR index.  E.g. VMX_BASIC_TRUE_CTLS is a good example, where renaming that to
MSR_VMX_BASIC_TRUE_CTLS would make it look too much like MSR_IA32_VMX_TRUE_ENTRY_CTLS
and all the other "true" VMX MSRs.

> SPEC_CTRL_SSBD for example won't mean much to someone who only knows ARM, while
> MSR_SPEC_CTRL_SSBD, or even better IA32_MSR_SPEC_CTRL_SSBD, lets you instantly know
> that this is a MSR, and anyone with even a bit of x86 knowledge should at least have
> heard about what a MSR is.
> 
> In regard to X86_FEATURE_INTEL_SSBD, I don't oppose this idea, because we have
> X86_FEATURE_AMD_SSBD, but in general I do oppose the idea of adding 'INTEL' prefix,

Ya, those are my feelings exactly.  And in this case, since we already have an
AMD variant, I think it's actually a net positive to add an INTEL variant so that
it's clear that Intel and AMD ended up defining separate CPUID bits to enumerate the
same basic info.

> because it sets a not that good precedent, because most of the features on x86
> are first done by Intel, but then are also implemented by AMD, and thus an intel-only
> feature name can stick after it becomes a general x86 feature.
> 
> In case of X86_FEATURE_INTEL_SSBD, we already have sadly different CPUID bits for
> each vendor (although I wonder if AMD also sets the X86_FEATURE_INTEL_SSBD).
> 
> I vote to rename 'SPEC_CTRL_SSBD', it can be done as a standalone patch, and can
> be accepted right now, even before this patch series is accepted.

If we go that route, then we also need to rename nearly every bit/mask definition
in msr-index.h, otherwise SPEC_CTRL_* will be the odd ones out.  And as above, I
don't think this is the right direction.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-07-24 17:58       ` Maxim Levitsky
@ 2024-07-27  0:06         ` Sean Christopherson
  2024-08-05 11:16           ` mlevitsk
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-27  0:06 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 15:30 -0700, Sean Christopherson wrote:
> > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > There are several advantages to this:
> > > 
> > > - more readability, plus if needed each statement can be amended with a comment.
> > > - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
> > >   which is confusing.
> > >   In fact no need to even have them at all.
> > > - No need to verify that bitmask belongs to a feature word.
> > 
> > Yes, but the downside is that there is no enforcement of features in a word being
> > bundled together.
> 
> As I explained earlier, this is not an issue in principle, even if the caps are not
> grouped together, the code will still work just fine.

I agree that functionally it'll all be fine, but I also want the code to bunch
things together for readers.  We can force that with functions, though it means
passing in more state to kvm_cpu_cap_init_{begin,end}().

> kvm_cpu_cap_init_begin(CPUID_1_ECX);
>                                 /* TMA is not passed though because: xyz*/
> kvm_cpu_cap_init(TMA,           0);
> kvm_cpu_cap_init(SSSE3,         CAP_PASSTHOUGH);
>                                 /* CNXT_ID is not passed though because: xyz*/
> kvm_cpu_cap_init(CNXT_ID,       0);
> kvm_cpu_cap_init(RESERVED,      0);
> kvm_cpu_cap_init(FMA,           CAP_PASSTHOUGH);
> ...
>                                 /* KVM always emulates TSC_ADJUST */
> kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> 
> kvm_cpu_cap_init_end(CPUID_1_ECX);
> 
> ...
> 
> ...
> 
> And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.

Ya, but then compile-time asserts become run-time asserts.

> > > - Merge friendly - each capability has its own line.
> > 
> > That's almost entirely convention though.  Other than inertia, nothing is stopping
> > us from doing:
> > 
> > 	kvm_cpu_cap_init(CPUID_12_EAX,
> > 		SF(SGX1) |
> > 		SF(SGX2) |
> > 		SF(SGX_EDECCSSA)
> 
> That trivial change is already an improvement, although it still leaves the problem
> of thinking that this is one big 'or', which was reasonable before this patch series,
> because it was indeed one big 'or' but now there is lots of things going on behind
> the scenes and that violates the principle of the least surprise.
> 
> My suggestion fixes this, because when the user sees a series of function calls,
> and nobody will assume anything about these functions calls in contrast with series
> of 'ors'. It's just how I look at it.

If it's the macro styling that's misleading, we could do what we did for the
static_call() wrappers and make them look like functions.  E.g.

	kvm_cpu_cap_init(CPUID_12_EAX,
		scattered_f(SGX1) |
		scattered_f(SGX2) |
		scattered_f(SGX_EDECCSSA)
	);

though that probably doesn't help much and is misleading in its own right.  Does
it help if the names are more verbose? 
 
> > 	);
> > 
> > I don't see a clean way of avoiding the addition of " |" on the last existing
> > line, but in practice I highly doubt that will ever be a source of meaningful pain.
> > 
> > Same goes for the point about adding comments.  We could do that with either
> > approach, we just don't do so today.
> 
> Yes, from the syntax POV there is indeed no problem, and I do agree that putting
> each feature on its own line, together with comments for the features that need it
> is a win-win improvement over what we have after this patch series.
> 
> > 
> > > Disadvantages:
> > > 
> > > - Longer list - IMHO not a problem, since it is very easy to read / search
> > >   and can have as much comments as needed.
> > >   For example this is how the kernel lists the CPUID features and this list IMHO
> > >   is very manageable.
> > 
> > There's one big difference: KVM would need to have a line for every feature that
> > KVM _doesn't_ support.
> 
> Could you elaborate on why?
> If we zero the whole leaf and then set specific bits there, one bit per kvm_cpu_cap_init.

Ah, if we move the handling of boot_cpu_data[*] into the helpers, then yes,
there's no need to explicitly initialize features that aren't supported by KVM.

That said, I still don't like using functions instead of macros, mainly because
a number of compile-time assertions become run-time assertions.  To provide equivalent
functionality, we also would need to pass in extra state to begin/end() (as
mentioned earlier).  Getting compile-time assertions on usage, e.g. via
guest_cpu_cap_has(), would also be trickier, though still doable, I think.
Lastly, it adds an extra step (calling _end()) to each flow, i.e. adds one more
thing for developers to mess up.  But that's a very minor concern and definitely
not a sticking point.

I agree that the macro shenanigans are aggressively clever, but for me, the
benefits of compile-time asserts make it worth dealing with the cleverness.

[*] https://lore.kernel.org/all/ZqKlDC11gItH1uj9@google.com
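
To make the compile-time vs. run-time trade-off concrete, here is a rough
user-space sketch.  All names (WORD_A, FEAT_FOO, CAP_INIT_CHECKED,
cap_init_runtime) are hypothetical and are not KVM's macros; the sketch only
assumes the usual "feature = word * 32 + bit" encoding and uses C11
static_assert in place of BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: feature = word * 32 + bit. */
#define WORD_A		0
#define WORD_B		1
#define FEAT_FOO	(WORD_A * 32 + 3)
#define FEAT_BAR	(WORD_B * 32 + 5)

/* Macro flavor: a feature from the wrong word breaks the build. */
#define CAP_INIT_CHECKED(caps, word, feat)				\
do {									\
	static_assert((feat) / 32 == (word), "feature not in word");	\
	(caps)[word] |= UINT32_C(1) << ((feat) % 32);			\
} while (0)

/* Function flavor: the same mistake is only caught when the code runs. */
int cap_init_runtime(uint32_t *caps, unsigned int word_in_progress,
		     unsigned int feat)
{
	if (feat / 32 != word_in_progress)
		return -1;
	caps[word_in_progress] |= UINT32_C(1) << (feat % 32);
	return 0;
}

uint32_t init_word_a_checked(void)
{
	uint32_t caps[2] = { 0, 0 };

	CAP_INIT_CHECKED(caps, WORD_A, FEAT_FOO);
	/* CAP_INIT_CHECKED(caps, WORD_A, FEAT_BAR) would fail to compile. */
	return caps[WORD_A];
}
```

The function variant needs the "word in progress" passed in as state, and a
misplaced feature goes unnoticed until that path actually executes.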

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support
  2024-07-24 18:01       ` Maxim Levitsky
@ 2024-07-29 15:34         ` Sean Christopherson
  2024-08-05 11:16           ` mlevitsk
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-07-29 15:34 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 17:10 -0700, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 0e64a6332052..dbc3f6ce9203 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -448,7 +448,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> >                 if (!entry)
> >                         continue;
> >  
> > -               cpuid_func_emulated(&emulated, cpuid.function);
> > +               cpuid_func_emulated(&emulated, cpuid.function, false);
> >  
> >                 /*
> >                  * A vCPU has a feature if it's supported by KVM and is enabled
> > @@ -1034,7 +1034,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
> >         return entry;
> >  }
> >  
> > -static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
> > +static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
> > +                              bool only_advertised)
> 
> I'll say, lets call this boolean, 'include_partially_emulated', 
> (basically features that kvm emulates but only partially,
> and thus doesn't advertise, aka mwait)
> 
> and then it doesn't look that bad, assuming that comes with a comment.

Works for me.  I was trying to figure out a way to say "emulated_on_ud", but I
can't get the polarity right, at least not without ridiculous verbosity.  E.g.
include_not_emulated_on_ud is awful.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-07-25 18:39         ` Sean Christopherson
@ 2024-08-05 11:06           ` mlevitsk
  2024-08-05 22:00             ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: mlevitsk @ 2024-08-05 11:06 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, 2024-07-25 at 11:39 -0700, Sean Christopherson wrote:
> > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > On Mon, 2024-07-08 at 14:08 -0700, Sean Christopherson wrote:
> > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > > > > > > > > Add a macro to precisely handle CPUID features that AMD duplicated from
> > > > > > > > > > CPUID.0x1.EDX into CPUID.0x8000_0001.EDX.  This will allow adding an
> > > > > > > > > > assert that all features passed to kvm_cpu_cap_init() match the word being
> > > > > > > > > > processed, e.g. to prevent passing a feature from CPUID 0x7 to CPUID 0x1.
> > > > > > > > > > 
> > > > > > > > > > Because the kernel simply reuses the X86_FEATURE_* definitions from
> > > > > > > > > > CPUID.0x1.EDX, KVM's use of the aliased features would result in false
> > > > > > > > > > positives from such an assert.
> > > > > > > > > > 
> > > > > > > > > > No functional change intended.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > > > > > > > ---
> > > > > > > > > >  arch/x86/kvm/cpuid.c | 24 +++++++++++++++++-------
> > > > > > > > > >  1 file changed, 17 insertions(+), 7 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > > > > > > > index 5e3b97d06374..f2bd2f5c4ea3 100644
> > > > > > > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > > > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > > > > > > @@ -88,6 +88,16 @@ u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > > > > > > > > >         F(name);                                                \
> > > > > > > > > >  })
> > > > > > > > > >  
> > > > > > > > > > +/*
> > > > > > > > > > + * Aliased Features - For features in 0x8000_0001.EDX that are duplicates of
> > > > > > > > > > + * identical 0x1.EDX features, and thus are aliased from 0x1 to 0x8000_0001.
> > > > > > > > > > + */
> > > > > > > > > > +#define AF(name)                                                               \
> > > > > > > > > > +({                                                                             \
> > > > > > > > > > +       BUILD_BUG_ON(__feature_leaf(X86_FEATURE_##name) != CPUID_1_EDX);        \
> > > > > > > > > > +       feature_bit(name);                                                      \
> > > > > > > > > > +})
> > > > > > > > > > +
> > > > > > > > > >  /*
> > > > > > > > > >   * Magic value used by KVM when querying userspace-provided CPUID entries and
> > > > > > > > > >   * doesn't care about the CPIUD index because the index of the function in
> > > > > > > > > > @@ -758,13 +768,13 @@ void kvm_set_cpu_caps(void)
> > > > > > > > > >         );
> > > > > > > > > >  
> > > > > > > > > >         kvm_cpu_cap_init(CPUID_8000_0001_EDX,
> > > > > > > > > > -               F(FPU) | F(VME) | F(DE) | F(PSE) |
> > > > > > > > > > -               F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > > > > > > > > > -               F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > > > > > > > > > -               F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > > > > > > > > > -               F(PAT) | F(PSE36) | 0 /* Reserved */ |
> > > > > > > > > > -               F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > > > > > > > > > -               F(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> > > > > > > > > > +               AF(FPU) | AF(VME) | AF(DE) | AF(PSE) |
> > > > > > > > > > +               AF(TSC) | AF(MSR) | AF(PAE) | AF(MCE) |
> > > > > > > > > > +               AF(CX8) | AF(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > > > > > > > > > +               AF(MTRR) | AF(PGE) | AF(MCA) | AF(CMOV) |
> > > > > > > > > > +               AF(PAT) | AF(PSE36) | 0 /* Reserved */ |
> > > > > > > > > > +               F(NX) | 0 /* Reserved */ | F(MMXEXT) | AF(MMX) |
> > > > > > > > > > +               AF(FXSR) | F(FXSR_OPT) | X86_64_F(GBPAGES) | F(RDTSCP) |
> > > > > > > > > >                 0 /* Reserved */ | X86_64_F(LM) | F(3DNOWEXT) | F(3DNOW)
> > > > > > > > > >         );
> > > > > > > > > >  
> > > > > > > > 
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > What if we defined the aliased features instead.
> > > > > > > > Something like this:
> > > > > > > > 
> > > > > > > > #define __X86_FEATURE_8000_0001_ALIAS(feature) \
> > > > > > > >         (feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)
> > > > > > > > 
> > > > > > > > #define KVM_X86_FEATURE_FPU_ALIAS       __X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_FPU)
> > > > > > > > #define KVM_X86_FEATURE_VME_ALIAS       __X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_VME)
> > > > > > > > 
> > > > > > > > And then just use for example the 'F(FPU_ALIAS)' in the CPUID_8000_0001_EDX
> > > > > > 
> > > > > > At first glance, I really liked this idea, but after working through the
> > > > > > ramifications, I think I prefer "converting" the flag when passing it to
> > > > > > kvm_cpu_cap_init().  In-place conversion makes it all but impossible for KVM to
> > > > > > check the alias, e.g. via guest_cpu_cap_has(), especially since the AF() macro
> > > > > > doesn't set the bits in kvm_known_cpu_caps (if/when a non-hacky validation of
> > > > > > usage becomes reality).
> > > > 
> > > > Could you elaborate on this as well?
> > > > 
> > > > My suggestion was that we can just treat aliases as completely independent
> > > > and dummy features, say KVM_X86_FEATURE_FPU_ALIAS, and pass them as is to the
> > > > guest, which means that if an alias is present in host cpuid, it appears in
> > > > kvm caps, and thus qemu can then set it in guest cpuid.
> > > > 
> > > > I don't think that we need any special treatment for them if you look at it
> > > > this way.  If you don't agree, can you give me an example?
> > 
> > KVM doesn't honor the aliases beyond telling userspace they can be set (see below
> > for all the aliased features that KVM _should_ be checking).  The APM clearly
> > states that the features are the same as their CPUID.0x1 counterparts, but Intel
> > CPUs don't support the aliases.  So, as you also note below, I think we could
> > unequivocally say that enumerating the aliases but not the "real" features is a
> > bogus CPUID model, but we can't say the opposite, i.e. the real features can
> > exists without the aliases.
> > 
> > And that means that KVM must never query the aliases, e.g. should never do
> > guest_cpu_cap_has(KVM_X86_FEATURE_FPU_ALIAS), because the result is essentially
> > meaningless.  It's a small thing, but if KVM_X86_FEATURE_FPU_ALIAS simply doesn't
> > exist, i.e. we do in-place conversion, then it's impossible to feed the aliases
> > into things like guest_cpu_cap_has().

This only makes my case stronger - treating the aliases as just features will
allow us to avoid adding more logic to code which is already too complex IMHO.

If your concern is that features could be queried by guest_cpu_cap_has()
that is easy to fix, we can (and should) put them into a separate file and
#include them only in cpuid.c.

We can even #undef the __X86_FEATURE_8000_0001_ALIAS macro after the kvm_set_cpu_caps,
then if I understand the macro pre-processor correctly, any use of feature alias
macros will not fully evaluate and cause a compile error.
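
A rough user-space sketch of the idea, for illustration.  The word indices and
feature values below are stand-ins that I am assuming mirror the kernel's
layout (CPUID_1_EDX as word 0, CPUID_8000_0001_EDX as word 1), and the alias
names are hypothetical since no such defines exist today.

```c
#include <assert.h>

/* Assumed word indices, standing in for the kernel's enum cpuid_leafs. */
enum { CPUID_1_EDX = 0, CPUID_8000_0001_EDX = 1 };

/* Stand-ins for the real feature definitions (word * 32 + bit). */
#define X86_FEATURE_FPU	(CPUID_1_EDX * 32 + 0)
#define X86_FEATURE_VME	(CPUID_1_EDX * 32 + 1)

/* Shift a 0x1.EDX feature into the 0x8000_0001.EDX word, same bit. */
#define __X86_FEATURE_8000_0001_ALIAS(feature) \
	((feature) + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)

#define KVM_X86_FEATURE_FPU_ALIAS __X86_FEATURE_8000_0001_ALIAS(X86_FEATURE_FPU)
#define KVM_X86_FEATURE_VME_ALIAS __X86_FEATURE_8000_0001_ALIAS(X86_FEATURE_VME)

/* The alias lands in the 0x8000_0001.EDX word, so a per-word check passes. */
static_assert(KVM_X86_FEATURE_FPU_ALIAS / 32 == CPUID_8000_0001_EDX,
	      "alias must live in the 0x8000_0001.EDX word");

/* Uses placed before the #undef expand normally. */
int fpu_alias(void) { return KVM_X86_FEATURE_FPU_ALIAS; }
int vme_alias(void) { return KVM_X86_FEATURE_VME_ALIAS; }

#undef __X86_FEATURE_8000_0001_ALIAS

/*
 * Past this point KVM_X86_FEATURE_FPU_ALIAS expands to a call to an
 * undeclared function named __X86_FEATURE_8000_0001_ALIAS, which cannot be
 * used as a constant and is rejected by compilers that enforce C99 rules.
 */
```

So the alias macros are only usable between their definition and the #undef,
which is what would confine them to kvm_set_cpu_caps().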



> > 
> > Heh, on a related topic, __cr4_reserved_bits() fails to account for any of the
> > aliased features.  Unless I'm missing something, VME, DE, TSC, PSE, PAE, PGE and
> > MCE, all need to be handled in __cr4_reserved_bits(). 
> >  Amusingly, 
> > nested_vmx_cr_fixed1_bits_update() handles the aliased legacy features.  I don't
> > see any reason for nested_vmx_cr_fixed1_bits_update() to manually query guest
> > CPUID, it should be able to use cr4_guest_rsvd_bits verbatim.

Yep, this should be fixed - this patch series is about to grow even more I guess,
or rather let me suggest that you split it into several patch series, which
can be merged and discussed separately.


> > 
> > > > > > Side topic, if it's not already documented somewhere else, kvm/x86/cpuid.rst
> > > > > > should call out that KVM only honors the features in CPUID.0x1, i.e. that setting
> > > > > > aliased bits in CPUID.0x8000_0001 is supported if and only if the bit(s) is also
> > > > > > set in CPUID.0x1.
> > > > 
> > > > To be honest if KVM enforces this, such enforcement can be removed IMHO:
> > 
> > There's no enforcement, and as above I agree that this would be a bogus CPUID
> > model.  I was thinking that it could be helpful to document that KVM never checks
> > the aliases, but on second thought, it's probably unnecessary because the APM does
> > say
> > 
> >   Same as CPUID Fn0000_0001_EDX[...]
> > 
> > for all the bits, i.e. setting the aliases without the real bits is an
> > architectural violation.

Regardless of whether this is an architectural violation, KVM should allow this
because it allows many architectural violations, like AVX3 with no XSAVE, and such.

IMHO being consistent is more important than being right in only some cases,
and I don't think we want to start enforcing all the CPUID dependencies
(I actually won't object to this).

Best regards,
	Maxim Levitsky


> > 


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper
  2024-07-25 19:18         ` Sean Christopherson
@ 2024-08-05 11:07           ` mlevitsk
  0 siblings, 0 replies; 185+ messages in thread
From: mlevitsk @ 2024-08-05 11:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, 2024-07-25 at 12:18 -0700, Sean Christopherson wrote:
> > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > On Mon, 2024-07-08 at 14:18 -0700, Sean Christopherson wrote:
> > > > > >  And in the unlikely case that we royally screw up and fail
> > > > > > to call kvm_cpu_cap_init() on a word, starting with 0xff would result in all
> > > > > > features in the uninitialized word being treated as supported.
> > > > Yes, but IMHO the chances of this happening are very low.
> > > > 
> > > > I understand your concerns though, but then IMHO it's better to keep the
> > > > kvm_cpu_cap_init_kvm_defined, because this way at least the function name
> > > > cleanly describes the difference instead of the difference being buried in the function
> > > > itself (the comment helps but still it is less noticeable than a function name). 
> > > > 
> > > > I don't have a very strong opinion on this though, 
> > > > because IMHO the kvm_cpu_cap_init_kvm_defined is also not very user friendly, 
> > > > so if you really think that the new code is more readable, let it be.
> > 
> > Hmm, the main motivation of this patch was to avoid duplicate code in later
> > patches, but looking at the end result, I don't think that eliminating the
> > KVM-defined variants is necessary, e.g. ending up with this should work, too.
> > 
> > #define __kvm_cpu_cap_init(leaf)                                        \
> > do {                                                                    \
> >         const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);    \
> >         const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;   \
> >         u32 kvm_cpu_cap_emulated = 0;                                   \
> >         u32 kvm_cpu_cap_synthesized = 0;                                \
> >                                                                         \
> >         kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |                   \
> >                                kvm_cpu_cap_synthesized);                \
> >         kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;                     \
> > } while (0)
> > 
> > /* For kernel-defined leafs, mask the boot CPU's pre-populated value. */
> > #define kvm_cpu_cap_init(leaf, mask)                                    \
> > do {                                                                    \
> >         BUILD_BUG_ON(leaf >= NCAPINTS);                                 \
> >         kvm_cpu_caps[leaf] &= (mask);                                   \
> >                                                                         \
> >         __kvm_cpu_cap_init(leaf);                                       \
> > } while (0)
> > 
> > /* For KVM-defined leafs, explicitly set the leaf, KVM is the sole authority. */
> > #define kvm_cpu_cap_init_kvm_defined(leaf, mask)                        \
> > do {                                                                    \
> >         BUILD_BUG_ON(leaf < NCAPINTS);                                  \
> >         kvm_cpu_caps[leaf] = (mask);                                    \
> >                                                                         \
> >         __kvm_cpu_cap_init(leaf);                                       \
> > } while (0)
> > 
> > That said, unless someone really likes kvm_cpu_cap_init_kvm_defined(), I am
> > leaning toward keeping this patch (but rewriting the changelog).  IMO, whether a
> > leaf is KVM-only or known to the kernel is a plumbing detail that really shouldn't
> > affect anything in kvm_set_cpu_caps().  Literally the only difference is whether
> > or not there are kernel capabilities to account for.  The "types" of features isn't
> > restricted in any way, e.g. CPUID_12_EAX is KVM-only and contains only scattered
> > features, but CPUID_7_1_EDX is KVM-only and contains only "regular" features.
> > 
> > And if a feature changes from KVM-only to kernel-managed, we'd need to update the
> > caller.  This is unlikely, but it seems like an unnecessary maintenance burden.
> > 
> > Ooh, and thinking more on that and on the argument against initializing the KVM-
> > only leafs to all ones, I think we should remove this:
> > 
> >         memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
> >                sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));
> > 
> > and instead explicitly mask the boot_cpu_data.x86_capability[leaf].  It's _way_
> > more likely that the kernel adds a leaf without updating KVM, in which case
> > copying the kernel capabilities without masking them against KVM's capabilities
> > would over-report the set of supported features.  The odds of over-reporting are
> > still low, as KVM limits the max leaf in __do_cpuid_func(), but unless I'm missing
> > something, the memcpy() trick adds no value in the current code base.

Nothing against this.


> > 
> > E.g. 
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index dbc3f6ce9203..593de2c1811b 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -730,18 +730,20 @@ do {                                                                      \
> >  } while (0)
> >  
> >  /*
> > - * For kernel-defined leafs, mask the boot CPU's pre-populated value.  For KVM-
> > - * defined leafs, explicitly set the leaf, as KVM is the one and only authority.
> > + * For leafs that are managed by the kernel, mask the boot CPU's capabilities,
> > + * which are populated by the kernel.  For KVM-only leafs, explicitly set the
> > + * leaf, as KVM is the one and only authority.
> >   */
> >  #define kvm_cpu_cap_init(leaf, mask)                                   \
> >  do {                                                                   \
> >         const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);    \
> >         const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;   \
> > +       const u32 kernel_cpu_caps = boot_cpu_data.x86_capability[leaf]; \
> >         u32 kvm_cpu_cap_emulated = 0;                                   \
> >         u32 kvm_cpu_cap_synthesized = 0;                                \
> >                                                                         \
> >         if (leaf < NCAPINTS)                                            \
> > -               kvm_cpu_caps[leaf] &= (mask);                           \
> > +               kvm_cpu_caps[leaf] = kernel_cpu_caps & (mask);          \
> >         else                                                            \
> >                 kvm_cpu_caps[leaf] = (mask);                            \

I am not going to argue much about this, I still think that assigning directly
using a mask is confusing.

Using kernel_cpu_caps is indeed better regardless of other issues.

Best regards,
	Maxim Levitsky


> >                                                                         \
> > @@ -763,9 +765,6 @@ void kvm_set_cpu_caps(void)
> >         BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
> >                      sizeof(boot_cpu_data.x86_capability));
> >  
> > -       memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
> > -              sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));
> > -
> >         kvm_cpu_cap_init(CPUID_1_ECX,
> >                 /*
> >                  * NOTE: MONITOR (and MWAIT) are emulated as NOP, but *not*
> > 


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-07-26 23:34         ` Sean Christopherson
@ 2024-08-05 11:11           ` mlevitsk
  2024-08-05 21:35             ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: mlevitsk @ 2024-08-05 11:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Fri, 2024-07-26 at 16:34 -0700, Sean Christopherson wrote:
> > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > On Mon, 2024-07-08 at 14:29 -0700, Sean Christopherson wrote:
> > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > Maybe we should instead rename the SPEC_CTRL_SSBD to
> > > > > > > > 'MSR_IA32_SPEC_CTRL_SSBD' and together with it, other fields of this msr.  It
> > > > > > > > seems that at least some msrs in this file do this.
> > > > > > 
> > > > > > Yeah, the #undef hack is quite ugly.  But I didn't (and still don't) want to
> > > > > > introduce all the renaming churn in the middle of this already too-big series,
> > > > > > especially since it would require touching quite a bit of code outside of KVM.
> > > > 
> > > > > > 
> > > > > > I'm also not sure that's the right thing to do; I kinda feel like KVM is the one
> > > > > > that's being silly here.
> > > > 
> > > > I don't think that KVM is silly here. I think that hardware definitions like
> > > > MSRs, register names, register bit fields, etc, *must* come with a unique
> > > > prefix, it's not an issue of breaking some deeply nested macro, but rather an
> > > > issue of readability.
> > 
> > For the MSR names themselves, yes, I agree 100%.  But for the bits and mask, I
> > disagree.  It's simply too verbose, especially given that in the vast majority
> > of cases simply looking at the surrounding code will provide enough context to
> > glean an understanding of what's going on.

I am not that sure about this, especially if someone by mistake uses a flag
that belongs to one MSR in some unrelated place. Verbose code is rarely a bad thing.


> >   E.g. even for SPEC_CTRL_SSBD, where
> > there's an absurd amount of magic and layering, looking at the #define makes
> > it fairly obvious that it belongs to MSR_IA32_SPEC_CTRL.
> > 
> > And for us x86 folks, who obviously look at this code far more often than non-x86
> > folks, I find it valuable to know that a bit/mask is exactly that, and _not_ an
> > MSR index.  E.g. VMX_BASIC_TRUE_CTLS is a good example, where renaming that to
> > MSR_VMX_BASIC_TRUE_CTLS would make it look too much like MSR_IA32_VMX_TRUE_ENTRY_CTLS
> > and all the other "true" VMX MSRs.
> > 
> > > > SPEC_CTRL_SSBD for example won't mean much to someone who only knows ARM, while
> > > > MSR_SPEC_CTRL_SSBD, or even better IA32_MSR_SPEC_CTRL_SSBD, lets you instantly know
> > > > that this is a MSR, and anyone with even a bit of x86 knowledge should at least have
> > > > heard about what a MSR is.
> > > > 
> > > > In regard to X86_FEATURE_INTEL_SSBD, I don't oppose this idea, because we have
> > > > X86_FEATURE_AMD_SSBD, but in general I do oppose the idea of adding 'INTEL' prefix,
> > 
> > Ya, those are my feelings exactly.  And in this case, since we already have an
> > AMD variant, I think it's actually a net positive to add an INTEL variant so that
> > it's clear that Intel and AMD ended up defining separate CPUID to enumerate the
> > same basic info.
> > 
> > > > because it sets a not that good precedent, because most of the features on x86
> > > > are first done by Intel, but then are also implemented by AMD, and thus an intel-only
> > > > feature name can stick after it becomes a general x86 feature.
> > > > 
> > > > In the case of X86_FEATURE_INTEL_SSBD, we sadly already have different CPUID bits for
> > > > each vendor (although I wonder if AMD also sets the X86_FEATURE_INTEL_SSBD).
> > > > 
> > > > I vote to rename 'SPEC_CTRL_SSBD', it can be done as a standalone patch, and can
> > > > be accepted right now, even before this patch series is accepted.
> > 
> > If we go that route, then we also need to rename nearly every bit/mask definition
> > in msr-index.h, otherwise SPEC_CTRL_* will be the odd ones out.  And as above, I
> > don't think this is the right direction.

Honestly not really. If you look carefully at the file, many bits are already defined
in the way I suggest, for example:

MSR_PLATFORM_INFO_CPUID_FAULT_BIT
MSR_IA32_POWER_CTL_BIT_EE
MSR_INTEGRITY_CAPS_ARRAY_BIST_BIT
MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT


This file has all kinds of names for both MSRs and flags. There is not much order,
so renaming the bit definitions of IA32_SPEC_CTRL won't increase the level of disorder
in this file IMHO.


Best regards,
	Maxim Levitsky



> > 



* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-07-27  0:06         ` Sean Christopherson
@ 2024-08-05 11:16           ` mlevitsk
  2024-08-05 19:59             ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: mlevitsk @ 2024-08-05 11:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

У пт, 2024-07-26 у 17:06 -0700, Sean Christopherson пише:
> > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > On Mon, 2024-07-08 at 15:30 -0700, Sean Christopherson wrote:
> > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > > > > > > There are several advantages to this:
> > > > > > > > 
> > > > > > > > - more readability, plus if needed each statement can be amended with a comment.
> > > > > > > > - No weird hacks in 'F*' macros, which additionally eventually evaluate into a bit,
> > > > > > > >   which is confusing.
> > > > > > > >   In fact no need to even have them at all.
> > > > > > > > - No need to verify that bitmask belongs to a feature word.
> > > > > > 
> > > > > > Yes, but the downside is that there is no enforcement of features in a word being
> > > > > > bundled together.
> > > > 
> > > > As I explained earlier, this is not an issue in principle, even if the caps are not
> > > > grouped together, the code will still work just fine.
> > 
> > I agree that functionally it'll all be fine, but I also want the code to bunch
> > things together for readers.  We can force that with functions, though it means
> > passing in more state to kvm_cpu_cap_init_{begin,end}().
> > 
> > > > kvm_cpu_cap_init_begin(CPUID_1_ECX);
> > > >                                 /* TMA is not passed through because: xyz */
> > > > kvm_cpu_cap_init(TMA,           0);
> > > > kvm_cpu_cap_init(SSSE3,         CAP_PASSTHROUGH);
> > > >                                 /* CNXT_ID is not passed through because: xyz */
> > > > kvm_cpu_cap_init(CNXT_ID,       0);
> > > > kvm_cpu_cap_init(RESERVED,      0);
> > > > kvm_cpu_cap_init(FMA,           CAP_PASSTHROUGH);
> > > > ...
> > > >                                 /* KVM always emulates TSC_ADJUST */
> > > > kvm_cpu_cap_init(TSC_ADJUST,    CAP_EMULATED | CAP_SCATTERED);
> > > > 
> > > > kvm_cpu_cap_init_end(CPUID_1_ECX);
> > > > 
> > > > ...
> > > > 
> > > > ...
> > > > 
> > > > And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.
> > 
> > Ya, but then compile-time asserts become run-time asserts.

Not really, it all can be done with macros, in exactly the same way IMHO,
we do have BUILD_BUG_ON after all.

I am not against using macros, I am only against collecting a bitmask
while applying various side effects, and then passing the bitmask to
the kvm_cpu_cap_init.

> > 
> > > > > > > > - Merge friendly - each capability has its own line.
> > > > > > 
> > > > > > That's almost entirely convention though.  Other than inertia, nothing is stopping
> > > > > > us from doing:
> > > > > > 
> > > > > >         kvm_cpu_cap_init(CPUID_12_EAX,
> > > > > >                 SF(SGX1) |
> > > > > >                 SF(SGX2) |
> > > > > >                 SF(SGX_EDECCSSA)
> > > > 
> > > > That trivial change is already an improvement, although it still leaves the problem
> > > > of thinking that this is one big 'or', which was reasonable before this patch series,
> > > > because it was indeed one big 'or', but now there are lots of things going on behind
> > > > the scenes, and that violates the principle of least surprise.
> > > > 
> > > > My suggestion fixes this, because when the user sees a series of function calls,
> > > > nobody will assume anything about those calls, in contrast with a series
> > > > of 'ors'. It's just how I look at it.
> > 
> > If it's the macro styling that's misleading, we could do what we did for the
> > static_call() wrappers and make them look like functions.  E.g.
> > 
> >         kvm_cpu_cap_init(CPUID_12_EAX,
> >                 scattered_f(SGX1) |
> >                 scattered_f(SGX2) |
> >                 scattered_f(SGX_EDECCSSA)
> >         );



> > 
> > though that probably doesn't help much and is misleading in its own right.  Does
> > it help if the names are more verbose? 

Verbose names are a good thing, I already mentioned this.

> >  
> > > > > >         );
> > > > > > 
> > > > > > I don't see a clean way of avoiding the addition of " |" on the last existing
> > > > > > line, but in practice I highly doubt that will ever be a source of meaningful pain.
> > > > > > 
> > > > > > Same goes for the point about adding comments.  We could do that with either
> > > > > > approach, we just don't do so today.
> > > > 
> > > > Yes, from the syntax POV there is indeed no problem, and I do agree that putting
> > > > each feature on its own line, together with comments for the features that need it
> > > > is a win-win improvement over what we have after this patch series.
> > > > 
> > > > > > 
> > > > > > > > Disadvantages:
> > > > > > > > 
> > > > > > > > - Longer list - IMHO not a problem, since it is very easy to read / search
> > > > > > > >   and can have as many comments as needed.
> > > > > > > >   For example this is how the kernel lists the CPUID features and this list IMHO
> > > > > > > >   is very manageable.
> > > > > > 
> > > > > > There's one big difference: KVM would need to have a line for every feature that
> > > > > > KVM _doesn't_ support.
> > > > 
> > > > Could you elaborate on why?
> > > > If we zero the whole leaf and then set specific bits there, one bit per kvm_cpu_cap_init.
> > 
> > Ah, if we move the handling of boot_cpu_data[*] into the helpers, then yes,
> > there's no need to explicitly initialize features that aren't supported by KVM.
> > 
> > That said, I still don't like using functions instead of macros, mainly because
> > a number of compile-assertions become run-time assertions.

I'm almost sure that we can do everything with compile-time asserts with a series of functions.




> >   To provide equivalent
> > functionality, we also would need to pass in extra state to begin/end() (as
> > mentioned earlier).

Besides the index of the leaf currently being initialized, I don't see which other extra state we need.

In fact I can prove that this is possible:

Roughly like this:

#define kvm_cpu_cap_init_begin(leaf)							\
do {											\
	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;			\
	u32 kvm_cpu_cap_emulated = 0;							\
	u32 kvm_cpu_cap_synthesized = 0;						\
	u32 kvm_cpu_cap_regular = 0;


#define feature_scattered(name)								\
	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);				\
	KVM_VALIDATE_CPU_CAP_USAGE(name);						\
											\
	if (boot_cpu_has(X86_FEATURE_##name))						\
		kvm_cpu_cap_regular |= feature_bit(name);


#define kvm_cpu_cap_init_end()								\
	const struct cpuid_reg cpuid =							\
		x86_feature_cpuid(kvm_cpu_cap_init_in_progress * 32);			\
											\
	if (kvm_cpu_cap_init_in_progress < NCAPINTS)					\
		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= kvm_cpu_cap_regular;	\
	else										\
		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] = kvm_cpu_cap_regular;	\
											\
	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= (raw_cpuid_get(cpuid) |		\
						       kvm_cpu_cap_synthesized);	\
	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] |= kvm_cpu_cap_emulated;		\
} while (0)


And now we have:

kvm_cpu_cap_init_begin(CPUID_12_EAX);
 feature_scattered(SGX1);
 feature_scattered(SGX2);
 feature_scattered(SGX_EDECCSSA);
kvm_cpu_cap_init_end();

In my book this looks much less misleading than the current version - I didn't put
much effort into naming variables though, the kvm_cpu_cap_regular name can be better IMHO.
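
The begin()/end() pairing above can be tried outside the kernel. What follows is only a userspace sketch under stated assumptions: every sketch_* name, the fake host capability table, and the bit numbers are hypothetical stand-ins, and the BUILD_BUG_ON() and raw CPUID handling are left out. It shows only that a scope opened by one macro and closed by another can carry the "in progress" leaf and the accumulated bits between the per-feature statements.

```c
#include <stdint.h>

/* Hypothetical stand-ins for boot_cpu_data and kvm_cpu_caps. */
enum { SKETCH_LEAF_A, SKETCH_NR_LEAFS };
static const uint32_t sketch_host_caps[SKETCH_NR_LEAFS] = { 0x5 }; /* bits 0 and 2 */
static uint32_t sketch_kvm_caps[SKETCH_NR_LEAFS];

/* Opens a scope that stays open until sketch_cap_init_end(). */
#define sketch_cap_init_begin(leaf)					\
do {									\
	const uint32_t sketch_in_progress = (leaf);			\
	uint32_t sketch_regular = 0

/* Accumulates one feature bit if the (fake) host reports it. */
#define sketch_feature(bit)						\
	do {								\
		if (sketch_host_caps[sketch_in_progress] & (1u << (bit)))	\
			sketch_regular |= 1u << (bit);			\
	} while (0)

/* Commits the accumulated bits and closes the scope. */
#define sketch_cap_init_end()						\
	sketch_kvm_caps[sketch_in_progress] = sketch_regular;		\
} while (0)

static uint32_t sketch_init_leaf_a(void)
{
	sketch_cap_init_begin(SKETCH_LEAF_A);
	sketch_feature(0);	/* present on the fake host */
	sketch_feature(1);	/* absent on the fake host, so dropped */
	sketch_feature(2);	/* present on the fake host */
	sketch_cap_init_end();

	return sketch_kvm_caps[SKETCH_LEAF_A];
}
```

Forgetting sketch_cap_init_end() leaves the do-block unterminated, so the mistake surfaces as a build error rather than as silently wrong caps.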


Best regards,
	Maxim Levitsky


> >   Getting compile-time assertions on usage, e.g. via
> > guest_cpu_cap_has(), would also be trickier, though still doable, I think.
> > Lastly, it adds an extra step (calling _end()) to each flow, i.e. adds one more
> > thing for developers to mess up.  But that's a very minor concern and definitely
> > not a sticking point.
> > 
> > I agree that the macro shenanigans are aggressively clever, but for me, the
> > benefits of compile-time asserts make it worth dealing with the cleverness.
> > 
> > [*] https://lore.kernel.org/all/ZqKlDC11gItH1uj9@google.com
> > 



* Re: [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support
  2024-07-29 15:34         ` Sean Christopherson
@ 2024-08-05 11:16           ` mlevitsk
  0 siblings, 0 replies; 185+ messages in thread
From: mlevitsk @ 2024-08-05 11:16 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-29 at 08:34 -0700, Sean Christopherson wrote:
> > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > On Mon, 2024-07-08 at 17:10 -0700, Sean Christopherson wrote:
> > > > > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > > > > > index 0e64a6332052..dbc3f6ce9203 100644
> > > > > > --- a/arch/x86/kvm/cpuid.c
> > > > > > +++ b/arch/x86/kvm/cpuid.c
> > > > > > @@ -448,7 +448,7 @@ void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > > > > >                 if (!entry)
> > > > > >                         continue;
> > > > > >  
> > > > > > -               cpuid_func_emulated(&emulated, cpuid.function);
> > > > > > +               cpuid_func_emulated(&emulated, cpuid.function, false);
> > > > > >  
> > > > > >                 /*
> > > > > >                  * A vCPU has a feature if it's supported by KVM and is enabled
> > > > > > @@ -1034,7 +1034,8 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
> > > > > >         return entry;
> > > > > >  }
> > > > > >  
> > > > > > -static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func)
> > > > > > +static int cpuid_func_emulated(struct kvm_cpuid_entry2 *entry, u32 func,
> > > > > > +                              bool only_advertised)
> > > > 
> > > > I'd say, let's call this boolean 'include_partially_emulated'
> > > > (basically features that kvm emulates but only partially,
> > > > and thus doesn't advertise, aka mwait)
> > > > 
> > > > and then it doesn't look that bad, assuming that comes with a comment.
> > 
> > Works for me.  I was trying to figure out a way to say "emulated_on_ud", but I
> > can't get the polarity right, at least not without ridiculous verbosity.  E.g.
> > include_not_emulated_on_ud is awful.
> > 

Thanks,
Best regards,
	Maxim Levitsky



* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-08-05 11:16           ` mlevitsk
@ 2024-08-05 19:59             ` Sean Christopherson
  2024-09-10 20:41               ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-08-05 19:59 UTC (permalink / raw)
  To: mlevitsk
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> On Fri, 2024-07-26 at 17:06 -0700, Sean Christopherson wrote:
> > > > > And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.
> > > 
> > > Ya, but then compile-time asserts become run-time asserts.
> 
> Not really, it all can be done with macros, in exactly the same way IMHO,
> we do have BUILD_BUG_ON after all.
> 
> I am not against using macros, I am only against collecting a bitmask
> while applying various side effects, and then passing the bitmask to
> the kvm_cpu_cap_init.

Gah, I wasn't grokking that, obviously.  Sorry for not catching on earlier.

> > > To provide equivalent functionality, we also would need to pass in extra
> > > state to begin/end() (as mentioned earlier).
> 
> Besides the number of leaf currently initialized, I don't see which other
> extra state we need.
> 
> In fact I can prove that this is possible:
> 
> Roughly like this:
> 
> #define kvm_cpu_cap_init_begin(leaf)							\
> do {											\
>  const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; 				\
>  u32 kvm_cpu_cap_emulated = 0; 								\
>  u32 kvm_cpu_cap_synthesized = 0; 							\
> 	u32 kvm_cpu_cap_regular = 0;

Maybe "virtualized" instead of "regular"?

> #define feature_scattered(name)							\
> 	BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES);				\
> 	KVM_VALIDATE_CPU_CAP_USAGE(name);						\
> 											\
> 	if (boot_cpu_has(X86_FEATURE_##name))						\
> 		kvm_cpu_cap_regular |= feature_bit(name);
> 
> 
> #define kvm_cpu_cap_init_end()								\
> 	const struct cpuid_reg cpuid =							\
> 		x86_feature_cpuid(kvm_cpu_cap_init_in_progress * 32);			\
> 											\
> 	if (kvm_cpu_cap_init_in_progress < NCAPINTS)					\
> 		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= kvm_cpu_cap_regular;	\
> 	else										\
> 		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] = kvm_cpu_cap_regular;	\
> 											\
> 	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= (raw_cpuid_get(cpuid) |		\
> 						       kvm_cpu_cap_synthesized);	\
> 	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] |= kvm_cpu_cap_emulated;		\
> } while (0)
> 
> 
> And now we have:
> 
> kvm_cpu_cap_init_begin(CPUID_12_EAX);
>  feature_scattered(SGX1);
>  feature_scattered(SGX2);
>  feature_scattered(SGX_EDECCSSA);
> kvm_cpu_cap_init_end();

I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
getting rid of the @mask param/input.

What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
"unpacked" in the context of the macro?  That would avoid the need for a trailing
macro, and would provide a clear indication of when/where the set of features is
"initialized".

The biggest downside I see is that the last entry can't have a trailing comma,
i.e. adding a new feature would require updating the previous feature too.

#define kvm_cpu_cap_init(leaf, init_features...)			\
do {									\
	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
	u32 kvm_cpu_cap_virtualized = 0;				\
	u32 kvm_cpu_cap_emulated = 0;					\
	u32 kvm_cpu_cap_synthesized = 0;				\
									\
	init_features;							\
									\
	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
			       kvm_cpu_cap_synthesized);		\
	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
} while (0)

	kvm_cpu_cap_init(CPUID_1_ECX,
		VIRTUALIZED_F(XMM3),
		VIRTUALIZED_F(PCLMULQDQ),
		VIRTUALIZED_F(SSSE3),
		VIRTUALIZED_F(FMA),
		VIRTUALIZED_F(CX16),
		VIRTUALIZED_F(PDCM),
		VIRTUALIZED_F(PCID),
		VIRTUALIZED_F(XMM4_1),
		VIRTUALIZED_F(XMM4_2),
		EMULATED_F(X2APIC),
		VIRTUALIZED_F(MOVBE),
		VIRTUALIZED_F(POPCNT),
		EMULATED_F(TSC_DEADLINE_TIMER),
		VIRTUALIZED_F(AES),
		VIRTUALIZED_F(XSAVE),
		// DYNAMIC_F(OSXSAVE),
		VIRTUALIZED_F(AVX),
		VIRTUALIZED_F(F16C),
		VIRTUALIZED_F(RDRAND),
		EMULATED_F(HYPERVISOR)
	);


Alternatively, we could force a trailing comma by omitting the semicolon after
init_features, but that looks weird for the macro itself, and arguably a bit
weird for the users too.
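
For anyone wanting to poke at the variadic idea outside the kernel, here is a minimal userspace sketch, not the proposed KVM code. All sketch_* names, the fake host capability word, and the bit numbers are hypothetical, and the sketch uses standard __VA_ARGS__ rather than the GNU named-variadic form (init_features...) shown above. Each feature macro is an expression with a side effect, so the comma-separated arguments collapse into a single comma-operator statement inside the macro body.

```c
#include <stdint.h>

/* Hypothetical stand-ins for host CPUID state and kvm_cpu_caps. */
enum { SKETCH_LEAF_A, SKETCH_NR_LEAFS };
static const uint32_t sketch_host_caps[SKETCH_NR_LEAFS] = { 0xb }; /* bits 0, 1, 3 */
static uint32_t sketch_caps[SKETCH_NR_LEAFS];

/* Each macro is an expression that updates a local of the enclosing init. */
#define SKETCH_VIRTUALIZED_F(bit)	(sketch_virtualized |= 1u << (bit))
#define SKETCH_EMULATED_F(bit)		(sketch_emulated |= 1u << (bit))

#define sketch_cap_init(leaf, ...)					\
do {									\
	uint32_t sketch_virtualized = 0;				\
	uint32_t sketch_emulated = 0;					\
									\
	/* Expands to "expr, expr, ...;", i.e. one comma expression. */	\
	__VA_ARGS__;							\
									\
	/* Virtualized bits need host support; emulated bits do not. */	\
	sketch_caps[leaf] = (sketch_virtualized & sketch_host_caps[leaf]) | \
			    sketch_emulated;				\
} while (0)

static uint32_t sketch_init_leaf_a(void)
{
	sketch_cap_init(SKETCH_LEAF_A,
		SKETCH_VIRTUALIZED_F(0),	/* host has it: kept */
		SKETCH_VIRTUALIZED_F(2),	/* host lacks it: masked off */
		SKETCH_EMULATED_F(5)		/* set regardless of the host */
	);

	return sketch_caps[SKETCH_LEAF_A];
}
```

As noted above for the real macro, the last entry cannot carry a trailing comma, because an empty final operand would not form a valid comma expression.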


* Re: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-08-05 11:11           ` mlevitsk
@ 2024-08-05 21:35             ` Sean Christopherson
  2024-09-10 20:37               ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-08-05 21:35 UTC (permalink / raw)
  To: mlevitsk
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo,
	Borislav Petkov

+Boris

On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> On Fri, 2024-07-26 at 16:34 -0700, Sean Christopherson wrote:
> > > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > > On Mon, 2024-07-08 at 14:29 -0700, Sean Christopherson wrote:
> > > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > > Maybe we should instead rename the SPEC_CTRL_SSBD to
> > > > > > > > > 'MSR_IA32_SPEC_CTRL_SSBD' and together with it, other fields of this msr.  It
> > > > > > > > > seems that at least some msrs in this file do this.
> > > > > > > 
> > > > > > > Yeah, the #undef hack is quite ugly.  But I didn't (and still don't) want to
> > > > > > > introduce all the renaming churn in the middle of this already too-big series,
> > > > > > > especially since it would require touching quite a bit of code outside of KVM.
> > > > > 
> > > > > > > 
> > > > > > > I'm also not sure that's the right thing to do; I kinda feel like KVM is the one
> > > > > > > that's being silly here.
> > > > > 
> > > > > I don't think that KVM is silly here. I think that hardware definitions like
> > > > > MSRs, register names, register bit fields, etc, *must* come with a unique
> > > > > prefix, it's not an issue of breaking some deeply nested macro, but rather an
> > > > > issue of readability.
> > > 
> > > For the MSR names themselves, yes, I agree 100%.  But for the bits and mask, I
> > > disagree.  It's simply too verbose, especially given that in the vast majority
> > > of cases simply looking at the surrounding code will provide enough context to
> > > glean an understanding of what's going on.
> 
> I am not that sure about this, especially if someone by mistake uses a flag
> that belongs to one MSR in some unrelated place. Verbose code is rarely a bad thing.
> 
> 
> > >   E.g. even for SPEC_CTRL_SSBD, where
> > > there's an absurd amount of magic and layering, looking at the #define makes
> > > it fairly obvious that it belongs to MSR_IA32_SPEC_CTRL.
> > > 
> > > And for us x86 folks, who obviously look at this code far more often than non-x86
> > > folks, I find it valuable to know that a bit/mask is exactly that, and _not_ an
> > > MSR index.  E.g. VMX_BASIC_TRUE_CTLS is a good example, where renaming that to
> > > MSR_VMX_BASIC_TRUE_CTLS would make it look too much like MSR_IA32_VMX_TRUE_ENTRY_CTLS
> > > and all the other "true" VMX MSRs.
> > > 
> > > > > SPEC_CTRL_SSBD for example won't mean much to someone who only knows ARM, while
> > > > > MSR_SPEC_CTRL_SSBD, or even better IA32_MSR_SPEC_CTRL_SSBD, lets you instantly know
> > > > > that this is a MSR, and anyone with even a bit of x86 knowledge should at least have
> > > > > heard about what a MSR is.
> > > > > 
> > > > > In regard to X86_FEATURE_INTEL_SSBD, I don't oppose this idea, because we have
> > > > > X86_FEATURE_AMD_SSBD, but in general I do oppose the idea of adding 'INTEL' prefix,
> > > 
> > > Ya, those are my feelings exactly.  And in this case, since we already have an
> > > AMD variant, I think it's actually a net positive to add an INTEL variant so that
> > > it's clear that Intel and AMD ended up defining separate CPUID to enumerate the
> > > same basic info.
> > > 
> > > > > because it sets a not that good precedent, because most of the features on x86
> > > > > are first done by Intel, but then are also implemented by AMD, and thus an intel-only
> > > > > feature name can stick after it becomes a general x86 feature.
> > > > > 
> > > > > In the case of X86_FEATURE_INTEL_SSBD, we sadly already have different CPUID bits for
> > > > > each vendor (although I wonder if AMD also sets the X86_FEATURE_INTEL_SSBD).
> > > > > 
> > > > > I vote to rename 'SPEC_CTRL_SSBD', it can be done as a standalone patch, and can
> > > > > be accepted right now, even before this patch series is accepted.
> > > 
> > > If we go that route, then we also need to rename nearly every bit/mask definition
> > > in msr-index.h, otherwise SPEC_CTRL_* will be the odd ones out.  And as above, I
> > > don't think this is the right direction.
> 
> Honestly not really. If you look carefully at the file, many bits are already defined
> in the way I suggest, for example:
> 
> MSR_PLATFORM_INFO_CPUID_FAULT_BIT
> MSR_IA32_POWER_CTL_BIT_EE
> MSR_INTEGRITY_CAPS_ARRAY_BIST_BIT
> MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT

Heh, I know there are some bits that have an "MSR" prefix, hence "nearly every".

> This file has all kinds of names for both MSRs and flags. There is not much
> order, so renaming the bit definitions of IA32_SPEC_CTRL won't increase the
> level of disorder in this file IMHO.

It depends on what direction msr-index.h is headed.  If the long-term preference
is to have bits/masks namespaced with only their associated MSR name, i.e. no
explicit MSR_, then renaming the bits is counter-productive.

I added Boris, who I believe was the most opinionated about the MSR bit names,
i.e. who can most likely give us the closest thing to an authoritative answer as
to the preferred style.

Boris, we're debating about the best way to solve a weird collision between:

  #define SPEC_CTRL_SSBD

and

  #define X86_FEATURE_SPEC_CTRL_SSBD

KVM wants to use its CPUID macros to essentially do:

  #define F(name) (X86_FEATURE_##name)

as a shorthand for X86_FEATURE_SPEC_CTRL_SSBD, but that can cause build failures
depending on how KVM's macros are layered.  E.g. SPEC_CTRL_SSBD can get resolved
to its value prior to token concatenation and result in KVM effectively generating
X86_FEATURE_BIT(SPEC_CTRL_SSBD_SHIFT).

One of the proposed solutions is to rename all of the SPEC_CTRL_* bit definitions
to add a MSR_ prefix, e.g. to generate MSR_SPEC_CTRL_SSBD and avoid the conflict.
My recollection from the IA32_FEATURE_CONTROL rework a few years back is that you
wanted to prioritize shorter names over having everything namespaced with MSR_,
i.e. that this approach is a non-starter.
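
The expansion-order problem described above can be reproduced in a few lines of plain C. This is a sketch under stated assumptions: the SPEC_CTRL_SSBD values are illustrative rather than copied from the kernel headers, and the SKETCH_* macros and helper functions are hypothetical. Stringification stands in for token pasting, because the preprocessor applies the same rule to operands of # and ##: they are not macro-expanded, whereas an argument that is merely forwarded to another macro is fully expanded first.

```c
#include <string.h>

/* Illustrative values only; the real definitions live in kernel headers. */
#define SPEC_CTRL_SSBD_SHIFT	2
#define SPEC_CTRL_SSBD		(1UL << SPEC_CTRL_SSBD_SHIFT)

/* 'name' is an operand of #, so it is used as written, unexpanded. */
#define SKETCH_OPERAND_DIRECT(name)	#name

/*
 * 'name' is only forwarded here, so it is fully macro-expanded before
 * SKETCH_OPERAND_DIRECT() ever sees it.  A layered F() macro has the
 * same effect on the operand of X86_FEATURE_##name.
 */
#define SKETCH_OPERAND_LAYERED(name)	SKETCH_OPERAND_DIRECT(name)

static const char *sketch_direct_operand(void)
{
	return SKETCH_OPERAND_DIRECT(SPEC_CTRL_SSBD);
}

static const char *sketch_layered_operand(void)
{
	return SKETCH_OPERAND_LAYERED(SPEC_CTRL_SSBD);
}

/* Returns 1 if the identifier reached the operator unexpanded. */
static int sketch_name_survived(const char *operand)
{
	return strcmp(operand, "SPEC_CTRL_SSBD") == 0;
}
```

In the direct case the identifier survives, so pasting would yield X86_FEATURE_SPEC_CTRL_SSBD; in the layered case the operand has already become the text of the bit definition, which is the failure mode the #undef works around.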


* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-08-05 11:06           ` mlevitsk
@ 2024-08-05 22:00             ` Sean Christopherson
  2024-09-10 20:37               ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-08-05 22:00 UTC (permalink / raw)
  To: mlevitsk
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> On Thu, 2024-07-25 at 11:39 -0700, Sean Christopherson wrote:
> > > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > > On Mon, 2024-07-08 at 14:08 -0700, Sean Christopherson wrote:
> > > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > > What if we defined the aliased features instead.
> > > > > > > > > Something like this:
> > > > > > > > > 
> > > > > > > > > #define __X86_FEATURE_8000_0001_ALIAS(feature) \
> > > > > > > > >         (feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)
> > > > > > > > > 
> > > > > > > > > #define KVM_X86_FEATURE_FPU_ALIAS       __X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_FPU)
> > > > > > > > > #define KVM_X86_FEATURE_VME_ALIAS       __X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_VME)
> > > > > > > > > 
> > > > > > > > > And then just use for example the 'F(FPU_ALIAS)' in the CPUID_8000_0001_EDX
> > > > > > > 
> > > > > > > At first glance, I really liked this idea, but after working through the
> > > > > > > ramifications, I think I prefer "converting" the flag when passing it to
> > > > > > > kvm_cpu_cap_init().  In-place conversion makes it all but impossible for KVM to
> > > > > > > check the alias, e.g. via guest_cpu_cap_has(), especially since the AF() macro
> > > > > > > doesn't set the bits in kvm_known_cpu_caps (if/when a non-hacky validation of
> > > > > > > usage becomes reality).
> > > > > 
> > > > > Could you elaborate on this as well?
> > > > > 
> > > > > My suggestion was that we can just treat aliases as completely independent
> > > > > and dummy features, say KVM_X86_FEATURE_FPU_ALIAS, and pass them as is to the
> > > > > guest, which means that if an alias is present in host cpuid, it appears in
> > > > > kvm caps, and thus qemu can then set it in guest cpuid.
> > > > > 
> > > > > I don't think that we need any special treatment for them if you look at it
> > > > > this way.  If you don't agree, can you give me an example?
> > > 
> > > KVM doesn't honor the aliases beyond telling userspace they can be set (see below
> > > for all the aliased features that KVM _should_ be checking).  The APM clearly
> > > states that the features are the same as their CPUID.0x1 counterparts, but Intel
> > > CPUs don't support the aliases.  So, as you also note below, I think we could
> > > unequivocally say that enumerating the aliases but not the "real" features is a
> > > bogus CPUID model, but we can't say the opposite, i.e. the real features can
> > > exist without the aliases.
> > > 
> > > And that means that KVM must never query the aliases, e.g. should never do
> > > guest_cpu_cap_has(KVM_X86_FEATURE_FPU_ALIAS), because the result is essentially
> > > meaningless.  It's a small thing, but if KVM_X86_FEATURE_FPU_ALIAS simply doesn't
> > > exist, i.e. we do in-place conversion, then it's impossible to feed the aliases
> > > into things like guest_cpu_cap_has().
> 
> This only makes my case stronger - treating the aliases as just features will
> allow us to avoid adding more logic to code which is already too complex IMHO.
> 
> If your concern is that features could be queried by guest_cpu_cap_has()
> that is easy to fix, we can (and should) put them into a separate file and
> #include them only in cpuid.c.
> 
> We can even #undef the __X86_FEATURE_8000_0001_ALIAS macro after the kvm_set_cpu_caps,
> then if I understand the macro pre-processor correctly, any use of feature alias
> macros will not fully evaluate and cause a compile error.

I don't see how that's less code.  Either way, KVM needs a macro to handle aliases,
e.g. either we end up with ALIAS_F() or __X86_FEATURE_8000_0001_ALIAS().  For the
macros themselves, IMO they carry the same amount of complexity.

If we go with ALIASED_F() (or ALIASED_8000_0001_F()), then that macro is all that
is needed, and it's bulletproof.  E.g. there is no KVM_X86_FEATURE_FPU_ALIAS that
can be queried, and thus no need to ensure it's defined in cpuid.c and #undef'd
after its use.

Hmm, I suppose we could harden the aliased feature usage in the same way as the
ALIASED_F(), e.g.

  #define __X86_FEATURE_8000_0001_ALIAS(feature)				\
  ({										\
	BUILD_BUG_ON(__feature_leaf(feature) != CPUID_1_EDX);			\
	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32);			\
  })

If something tries to use an X86_FEATURE_*_ALIAS outside of kvm_cpu_cap_init(),
it would need to define and set kvm_cpu_cap_init_in_progress, i.e. would really
have to try to mess up.

Effectively the only differences are that KVM would have ~10 or so more lines of
code to define the X86_FEATURE_*_ALIAS macros, and that the usage would look like:

	VIRTUALIZED_F(FPU_ALIAS)

versus

	ALIASED_F(FPU)

At that point, I'm ok with defining each alias, though I honestly still don't
understand the motivation for defining single-use macros.
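
For reference, the alias arithmetic discussed above can be sketched in plain
userspace C.  This is only an illustration under stated assumptions: the enum
values, FEATURE(), FEATURE_LEAF(), FEATURE_BIT() and ALIAS_8000_0001() are
hypothetical stand-ins, not the kernel's definitions.  It shows that the offset
moves a 0x1.EDX feature to the same bit position in the 0x80000001.EDX word.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's leaf indices (assumed values). */
enum cpuid_leaf { CPUID_1_EDX = 0, CPUID_8000_0001_EDX = 1 };

/* A feature is encoded as word * 32 + bit, as in the thread's arithmetic. */
#define FEATURE(leaf, bit)	((leaf) * 32 + (bit))
#define FEATURE_LEAF(f)		((f) / 32)
#define FEATURE_BIT(f)		((f) % 32)

#define X86_FEATURE_FPU		FEATURE(CPUID_1_EDX, 0)
#define X86_FEATURE_VME		FEATURE(CPUID_1_EDX, 1)

/* Shift the feature to the aliased leaf while keeping its bit position. */
#define ALIAS_8000_0001(feature) \
	((feature) + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)

/* Compile-time guard in the spirit of BUILD_BUG_ON(): only 0x1.EDX is aliased. */
static_assert(FEATURE_LEAF(X86_FEATURE_FPU) == CPUID_1_EDX,
	      "alias source must live in CPUID.0x1.EDX");

static unsigned int alias_leaf(unsigned int feature)
{
	return FEATURE_LEAF(ALIAS_8000_0001(feature));
}

static unsigned int alias_bit(unsigned int feature)
{
	return FEATURE_BIT(ALIAS_8000_0001(feature));
}
```

With these assumed values, alias_leaf(X86_FEATURE_FPU) lands in the
CPUID_8000_0001_EDX word and alias_bit(X86_FEATURE_VME) stays at bit 1.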

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-08-05 22:00             ` Sean Christopherson
@ 2024-09-10 20:37               ` Maxim Levitsky
  2024-09-11 15:37                 ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-09-10 20:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-08-05 at 15:00 -0700, Sean Christopherson wrote:
> On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> > On Thu, 2024-07-25 at 11:39 -0700, Sean Christopherson wrote:
> > > > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > > > On Mon, 2024-07-08 at 14:08 -0700, Sean Christopherson wrote:
> > > > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > > > What if we defined the aliased features instead.
> > > > > > > > > > Something like this:
> > > > > > > > > > 
> > > > > > > > > > #define __X86_FEATURE_8000_0001_ALIAS(feature) \
> > > > > > > > > >         (feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)
> > > > > > > > > > 
> > > > > > > > > > #define KVM_X86_FEATURE_FPU_ALIAS       __X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_FPU)
> > > > > > > > > > #define KVM_X86_FEATURE_VME_ALIAS       __X86_FEATURE_8000_0001_ALIAS(KVM_X86_FEATURE_VME)
> > > > > > > > > > 
> > > > > > > > > > And then just use for example the 'F(FPU_ALIAS)' in the CPUID_8000_0001_EDX
> > > > > > > > 
> > > > > > > > At first glance, I really liked this idea, but after working through the
> > > > > > > > ramifications, I think I prefer "converting" the flag when passing it to
> > > > > > > > kvm_cpu_cap_init().  In-place conversion makes it all but impossible for KVM to
> > > > > > > > check the alias, e.g. via guest_cpu_cap_has(), especially since the AF() macro
> > > > > > > > doesn't set the bits in kvm_known_cpu_caps (if/when a non-hacky validation of
> > > > > > > > usage becomes reality).
> > > > > > 
> > > > > > Could you elaborate on this as well?
> > > > > > 
> > > > > > My suggestion was that we can just treat aliases as completely independent
> > > > > > and dummy features, say KVM_X86_FEATURE_FPU_ALIAS, and pass them as is to the
> > > > > > guest, which means that if an alias is present in host cpuid, it appears in
> > > > > > kvm caps, and thus qemu can then set it in guest cpuid.
> > > > > > 
> > > > > > I don't think that we need any special treatment for them if you look at it
> > > > > > this way.  If you don't agree, can you give me an example?
> > > > 
> > > > KVM doesn't honor the aliases beyond telling userspace they can be set (see below
> > > > for all the aliased features that KVM _should_ be checking).  The APM clearly
> > > > states that the features are the same as their CPUID.0x1 counterparts, but Intel
> > > > CPUs don't support the aliases.  So, as you also note below, I think we could
> > > > unequivocally say that enumerating the aliases but not the "real" features is a
> > > > bogus CPUID model, but we can't say the opposite, i.e. the real features can
> > > > exists without the aliases.
> > > > 
> > > > And that means that KVM must never query the aliases, e.g. should never do
> > > > guest_cpu_cap_has(KVM_X86_FEATURE_FPU_ALIAS), because the result is essentially
> > > > meaningless.  It's a small thing, but if KVM_X86_FEATURE_FPU_ALIAS simply doesn't
> > > > exist, i.e. we do in-place conversion, then it's impossible to feed the aliases
> > > > into things like guest_cpu_cap_has().
> > 
> > This only makes my case stronger - treating the aliases as just features will
> > allow us to avoid adding more logic to code which is already too complex IMHO.
> > 
> > If your concern is that features could be queried by guest_cpu_cap_has()
> > that is easy to fix, we can (and should) put them into a separate file and
> > #include them only in cpuid.c.
> > 
> > We can even #undef the __X86_FEATURE_8000_0001_ALIAS macro after the kvm_set_cpu_caps,
> > then if I understand the macro pre-processor correctly, any use of feature alias
> > macros will not fully evaluate and cause a compile error.
> 
> I don't see how that's less code.  Either way, KVM needs a macro to handle aliases,
> e.g. either we end up with ALIAS_F() or __X86_FEATURE_8000_0001_ALIAS().  For the
> macros themselves, IMO they carry the same amount of complexity.
> 
> If we go with ALIASED_F() (or ALIASED_8000_0001_F()), then that macro is all that
> is needed, and it's bulletproof.  E.g. there is no KVM_X86_FEATURE_FPU_ALIAS that
> can be queried, and thus no need to ensure it's defined in cpuid.c and #undef'd
> after its use.
> 
> Hmm, I suppose we could harden the aliased feature usage in the same way as the
> ALIASED_F(), e.g.
> 
>   #define __X86_FEATURE_8000_0001_ALIAS(feature)				\
>   ({										\
> 	BUILD_BUG_ON(__feature_leaf(feature) != CPUID_1_EDX);			\
> 	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
> 	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32);			\
>   })
> 
> If something tries to use an X86_FEATURE_*_ALIAS outside of kvm_cpu_cap_init(),
> it would need to define and set kvm_cpu_cap_init_in_progress, i.e. would really
> have to try to mess up.
> 
> Effectively the only differences are that KVM would have ~10 or so more lines of
> code to define the X86_FEATURE_*_ALIAS macros, and that the usage would look like:
> 
> 	VIRTUALIZED_F(FPU_ALIAS)
> 
> versus
> 
> 	ALIASED_F(FPU)


This is exactly my point. I want to avoid proliferation of the _F macros, because
later, we will need to figure out what each of them (e.g. ALIASED_F) does.

A whole-leaf alias is a once-in-a-lifetime x86 architectural misfeature, and it is
very likely that Intel/AMD won't add more such aliases.

Why VIRTUALIZED_F though, it wasn't in the patch series? Normal F() should be enough
IMHO.


> 
> At that point, I'm ok with defining each alias, though I honestly still don't
> understand the motivation for defining single-use macros.
> 

The idea is that nobody will need to look at these macros (e.g. __X86_FEATURE_8000_0001_ALIAS()
and its usages), because it's clear what they do: they just define a few extra CPUID
features that nobody really cares about.

ALIASED_F(), on the other hand, is yet another _F() macro, and we will need
to figure out, again and again, why it is there, what it does, etc.

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions
  2024-08-05 21:35             ` Sean Christopherson
@ 2024-09-10 20:37               ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-09-10 20:37 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo,
	Borislav Petkov

On Mon, 2024-08-05 at 14:35 -0700, Sean Christopherson wrote:
> +Boris
> 
> On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> > On Fri, 2024-07-26 at 16:34 -0700, Sean Christopherson wrote:
> > > > On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> > > > > > On Mon, 2024-07-08 at 14:29 -0700, Sean Christopherson wrote:
> > > > > > > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > > > > > > > Maybe we should instead rename the SPEC_CTRL_SSBD to
> > > > > > > > > > 'MSR_IA32_SPEC_CTRL_SSBD' and together with it, other fields of this msr.  It
> > > > > > > > > > seems that at least some msrs in this file do this.
> > > > > > > > 
> > > > > > > > Yeah, the #undef hack is quite ugly.  But I didn't (and still don't) want to
> > > > > > > > introduce all the renaming churn in the middle of this already too-big series,
> > > > > > > > especially since it would require touching quite a bit of code outside of KVM.
> > > > > > > > I'm also not sure that's the right thing to do; I kinda feel like KVM is the one
> > > > > > > > that's being silly here.
> > > > > > 
> > > > > > I don't think that KVM is silly here. I think that hardware definitions like
> > > > > > MSRs, register names, register bit fields, etc, *must* come with a unique
> > > > > > prefix, it's not an issue of breaking some deeply nested macro, but rather an
> > > > > > issue of readability.
> > > > 
> > > > For the MSR names themselves, yes, I agree 100%.  But for the bits and mask, I
> > > > disagree.  It's simply too verbose, especially given that in the vast majority
> > > > of cases simply looking at the surrounding code will provide enough context to
> > > > glean an understanding of what's going on.
> > 
> > I am not that sure about this, especially if someone by mistake uses a flag
> > that belongs to one MSR, in some unrelated place. Verbose code is rarely a bad thing.
> > 
> > 
> > > >   E.g. even for SPEC_CTRL_SSBD, where
> > > > there's an absurd amount of magic and layering, looking at the #define makes
> > > > it fairly obvious that it belongs to MSR_IA32_SPEC_CTRL.
> > > > 
> > > > And for us x86 folks, who obviously look at this code far more often than non-x86
> > > > folks, I find it valuable to know that a bit/mask is exactly that, and _not_ an
> > > > MSR index.  E.g. VMX_BASIC_TRUE_CTLS is a good example, where renaming that to
> > > > MSR_VMX_BASIC_TRUE_CTLS would make it look too much like MSR_IA32_VMX_TRUE_ENTRY_CTLS
> > > > and all the other "true" VMX MSRs.
> > > > 
> > > > > > SPEC_CTRL_SSBD for example won't mean much to someone who only knows ARM, while
> > > > > > MSR_SPEC_CTRL_SSBD, or even better IA32_MSR_SPEC_CTRL_SSBD, lets you instantly know
> > > > > > that this is a MSR, and anyone with even a bit of x86 knowledge should at least have
> > > > > > heard about what a MSR is.
> > > > > > 
> > > > > > In regard to X86_FEATURE_INTEL_SSBD, I don't oppose this idea, because we have
> > > > > > X86_FEATURE_AMD_SSBD, but in general I do oppose the idea of adding 'INTEL' prefix,
> > > > 
> > > > Ya, those are my feelings exactly.  And in this case, since we already have an
> > > > AMD variant, I think it's actually a net positive to add an INTEL variant so that
> > > > it's clear that Intel and AMD ended up defining separate CPUID to enumerate the
> > > > same basic info.
> > > > 
> > > > > > because it sets a not that good precedent, because most of the features on x86
> > > > > > are first done by Intel, but then are also implemented by AMD, and thus an intel-only
> > > > > > feature name can stick after it becomes a general x86 feature.
> > > > > > 
> > > > > > In the case of X86_FEATURE_INTEL_SSBD, we sadly already have different CPUID bits for
> > > > > > each vendor (although I wonder if AMD also sets the X86_FEATURE_INTEL_SSBD).
> > > > > > 
> > > > > > I vote to rename 'SPEC_CTRL_SSBD', it can be done as a standalone patch, and can
> > > > > > be accepted right now, even before this patch series is accepted.
> > > > 
> > > > If we go that route, then we also need to rename nearly every bit/mask definition
> > > > in msr-index.h, otherwise SPEC_CTRL_* will be the odd ones out.  And as above, I
> > > > don't think this is the right direction.
> > 
> > Honestly not really. If you look carefully at the file, many bits are already defined
> > in the way I suggest, for example:
> > 
> > MSR_PLATFORM_INFO_CPUID_FAULT_BIT
> > MSR_IA32_POWER_CTL_BIT_EE
> > MSR_INTEGRITY_CAPS_ARRAY_BIST_BIT
> > MSR_AMD64_DE_CFG_LFENCE_SERIALIZE_BIT
> 
> Heh, I know there are some bits that have an "MSR" prefix, hence "nearly every".
> 
> > This file has all kind of names for both msrs and flags. There is not much
> > order, so renaming the bit definitions of IA32_SPEC_CTRL won't increase the
> > level of disorder in this file IMHO.
> 
> It depends on what direction msr-index.h is headed.  If the long-term preference
> is to have bits/masks namespaced with only their associated MSR name, i.e. no
> explicit MSR_, then renaming the bits is counter-productive.
> 
> I added Boris, who I believe was the most opinionated about the MSR bit names,
> i.e. who can most likely give us the closest thing to an authoritative answer as
> to the preferred style.
> 
> Boris, we're debating about the best way to solve a weird collision between:
> 
>   #define SPEC_CTRL_SSBD
> 
> and
> 
>   #define X86_FEATURE_SPEC_CTRL_SSBD
> 
> KVM wants to use its CPUID macros to essentially do:
> 
>   #define F(name) (X86_FEATURE_##name)
> 
> as a shorthand for X86_FEATURE_SPEC_CTRL_SSBD, but that can cause build failures
> depending on how KVM's macros are layered.  E.g. SPEC_CTRL_SSBD can get resolved
> to its value prior to token concatenation and result in KVM effectively generating
> X86_FEATURE_BIT(SPEC_CTRL_SSBD_SHIFT).
> 
> One of the proposed solutions is to rename all of the SPEC_CTRL_* bit definitions
> to add a MSR_ prefix, e.g. to generate MSR_SPEC_CTRL_SSBD and avoid the conflict.
> My recollection from the IA32_FEATURE_CONTROL rework a few years back is that you
> wanted to prioritize shorter names over having everything namespaced with MSR_,
> i.e. that this approach is a non-starter.
> 
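
The collision described above can be reproduced outside the kernel.  The sketch
below is an illustration only: STRINGIFY(), F_NAME() and F_NAME_LAYERED() are
hypothetical helpers, and SPEC_CTRL_SSBD is given an assumed definition with
BIT() deliberately left undefined so that the expansion stays visible as text.
An argument that is a direct operand of ## is pasted unexpanded, whereas an
argument forwarded through another macro layer is fully expanded first.

```c
#include <string.h>

#define STRINGIFY(x)		#x

/* Assumed definition; BIT() is intentionally not a macro in this sketch. */
#define SPEC_CTRL_SSBD		BIT(SPEC_CTRL_SSBD_SHIFT)

/* 'name' is an operand of ##, so it is pasted without being expanded. */
#define F_NAME(name)		STRINGIFY(X86_FEATURE_##name)

/* 'name' is forwarded as a plain argument, so it is expanded before F_NAME(). */
#define F_NAME_LAYERED(name)	F_NAME(name)

static const char *direct_name(void)
{
	return F_NAME(SPEC_CTRL_SSBD);
}

static const char *layered_name(void)
{
	return F_NAME_LAYERED(SPEC_CTRL_SSBD);
}

/* Non-zero when the extra macro layer changed the generated identifier. */
static int layered_name_is_mangled(void)
{
	return strcmp(direct_name(), layered_name()) != 0;
}
```

The direct form yields the string "X86_FEATURE_SPEC_CTRL_SSBD", while the
layered form yields a string that starts with "X86_FEATURE_BIT", which is the
failure mode the #undef works around.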

Hi,

Any update on this?

Best regards,
	Maxim Levitsky



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-08-05 19:59             ` Sean Christopherson
@ 2024-09-10 20:41               ` Maxim Levitsky
  2024-09-11 16:03                 ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-09-10 20:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-08-05 at 12:59 -0700, Sean Christopherson wrote:
> On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> > On Fri, 2024-07-26 at 17:06 -0700, Sean Christopherson wrote:
> > > > > > And kvm_cpu_cap_init_begin, can set some cap_in_progress variable.
> > > > 
> > > > Ya, but then compile-time asserts become run-time asserts.
> > 
> > Not really, it all can be done with macros, in exactly the same way IMHO,
> > we do have BUILD_BUG_ON after all.
> > 
> > I am not against using macros, I am only against collecting a bitmask
> > while applying various side effects, and then passing the bitmask to
> > the kvm_cpu_cap_init.
> 
> Gah, I wasn't grokking that, obviously.  Sorry for not catching on earlier.
> 
> > > > To provide equivalent functionality, we also would need to pass in extra
> > > > state to begin/end() (as mentioned earlier).
> > 
> > Besides the number of leaf currently initialized, I don't see which other
> > extra state we need.
> > 
> > In fact I can prove that this is possible:
> > 
> > Roughly like this:
> > 
> > #define kvm_cpu_cap_init_begin(leaf)							\
> > do {											\
> >  const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf; 				\
> >  u32 kvm_cpu_cap_emulated = 0; 								\
> >  u32 kvm_cpu_cap_synthesized = 0; 							\
> > 	u32 kvm_cpu_cap_regular = 0;
> 
> Maybe "virtualized" instead of "regular"?
> 
> > #define feature_scattered(name) 							\
> >  BUILD_BUG_ON(X86_FEATURE_##name >= MAX_CPU_FEATURES); 					\
> >  KVM_VALIDATE_CPU_CAP_USAGE(name); 							\
> > 											\
> > 	if (boot_cpu_has(X86_FEATURE_##name)) 						\
> > 		kvm_cpu_cap_regular |= feature_bit(name);
> > 
> > 
> > #define kvm_cpu_cap_init_end() 								\
> > 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);			\
> > 											\
> > 	if (kvm_cpu_cap_init_in_progress < NCAPINTS) 					\
> >  		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= kvm_cpu_cap_regular; 	\
> >  	else 										\
> >  		kvm_cpu_caps[kvm_cpu_cap_init_in_progress] = kvm_cpu_cap_regular; 	\
> >  											\
> >  	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] &= (raw_cpuid_get(cpuid) | 		\
> >  	kvm_cpu_cap_synthesized); 							\
> >  	kvm_cpu_caps[kvm_cpu_cap_init_in_progress] |= kvm_cpu_cap_emulated; 		\
> > } while(0);
> > 
> > 
> > And now we have:
> > 
> > kvm_cpu_cap_init_begin(CPUID_12_EAX);
> >  feature_scattered(SGX1);
> >  feature_scattered(SGX2);
> >  feature_scattered(SGX_EDECCSSA);
> > kvm_cpu_cap_init_end();
> 
> I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
> getting rid of the @mask param/input.
> 
> What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
> "unpacked" in the context of the macro?  That would avoid the need for a trailing
> macro, and would provide a clear indication of when/where the set of features is
> "initialized".
> 
> The biggest downside I see is that the last entry can't have a trailing comma,
> i.e. adding a new feature would require updating the previous feature too.
> 
> #define kvm_cpu_cap_init(leaf, init_features...)			\
> do {									\
> 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> 	u32 kvm_cpu_cap_virtualized = 0;				\
> 	u32 kvm_cpu_cap_emulated = 0;					\
> 	u32 kvm_cpu_cap_synthesized = 0;				\
> 									\
> 	init_features;							\
> 									\
> 	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
> 	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
> 			       kvm_cpu_cap_synthesized);		\
> 	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
> } while (0)
> 
> 	kvm_cpu_cap_init(CPUID_1_ECX,
> 		VIRTUALIZED_F(XMM3),
> 		VIRTUALIZED_F(PCLMULQDQ),
> 		VIRTUALIZED_F(SSSE3),
> 		VIRTUALIZED_F(FMA),
> 		VIRTUALIZED_F(CX16),
> 		VIRTUALIZED_F(PDCM),
> 		VIRTUALIZED_F(PCID),
> 		VIRTUALIZED_F(XMM4_1),
> 		VIRTUALIZED_F(XMM4_2),
> 		EMULATED_F(X2APIC),
> 		VIRTUALIZED_F(MOVBE),
> 		VIRTUALIZED_F(POPCNT),
> 		EMULATED_F(TSC_DEADLINE_TIMER),
> 		VIRTUALIZED_F(AES),
> 		VIRTUALIZED_F(XSAVE),
> 		// DYNAMIC_F(OSXSAVE),
> 		VIRTUALIZED_F(AVX),
> 		VIRTUALIZED_F(F16C),
> 		VIRTUALIZED_F(RDRAND),
> 		EMULATED_F(HYPERVISOR)
> 	);

Hi,

This is no doubt better than using '|'.

I still strongly prefer my version, because I don't really like the fact that the _F
macros have side effects and yet are passed as parameters to the kvm_cpu_cap_init function/macro.

There is an unwritten rule, which I consider very important and which is the reason
I raised my concerns over this patch series: if a function has side effects,
it should not be used as a parameter to another function; instead, it should be
called explicitly on its own.

If you strongly prefer the variadic macro over my begin/end API, I can live with
that though, it is still better than '|'ing a mask with functions that have side
effects.
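
To make the begin/end shape concrete, here is a minimal userspace sketch of it.
All names (caps, host_cpuid, cap_init_begin(), feature_regular(),
feature_emulated(), cap_init_end()) are hypothetical and the "host CPUID" value
is made up; the only point is that each feature line is a standalone statement
between an opening and a closing macro.

```c
#include <stdint.h>

enum { LEAF_A, NR_LEAVES };			/* hypothetical leaf index */

static uint32_t caps[NR_LEAVES];		/* stand-in for kvm_cpu_caps[] */
static const uint32_t host_cpuid[NR_LEAVES] = { 0x5 };	/* assumed: bits 0 and 2 */

/* Opens a scope and declares the accumulators used by the feature macros. */
#define cap_init_begin(leaf)						\
do {									\
	const unsigned int cap_init_in_progress = (leaf);		\
	uint32_t cap_regular = 0;					\
	uint32_t cap_emulated = 0;

#define feature_regular(bit)	cap_regular |= 1u << (bit)
#define feature_emulated(bit)	cap_emulated |= 1u << (bit)

/* Masks regular features with the host's CPUID, then adds emulated ones. */
#define cap_init_end()							\
	caps[cap_init_in_progress] = cap_regular;			\
	caps[cap_init_in_progress] &= host_cpuid[cap_init_in_progress];\
	caps[cap_init_in_progress] |= cap_emulated;			\
} while (0)

static void init_leaf_a(void)
{
	cap_init_begin(LEAF_A);
	feature_regular(0);
	feature_regular(1);	/* absent from host_cpuid, so it is masked off */
	feature_emulated(3);	/* emulated, so it survives the masking */
	cap_init_end();
}
```

With the assumed host value, init_leaf_a() leaves bits 0 and 3 set in
caps[LEAF_A].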

Best regards,
	Maxim Levitsky


> 
> 
> Alternatively, we could force a trailing comma by omitting the semicolon after
> init_features, but that looks weird for the macro itself, and arguably a bit
> weird for the users too.
> 



^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features
  2024-07-09  0:24     ` Sean Christopherson
@ 2024-09-10 20:41       ` Maxim Levitsky
  2024-09-11 15:41         ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-09-10 20:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Mon, 2024-07-08 at 17:24 -0700, Sean Christopherson wrote:
> On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > -		cpuid_entry_change(best, X86_FEATURE_OSPKE,
> > > -				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > > +		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE,
> > > +					   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > > +
> > >  
> > >  	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
> > >  	if (best)
> > 
> > I am not 100% sure that we need to do this.
> > 
> > > Runtime cpuid changes are a hack that Intel did back then, due to various
> > > reasons. These changes don't really change the feature set that the CPU supports,
> > > but merely, as you like to say, 'massage' the output of the CPUID instruction,
> > > usually to make an unmodified OS happy.
> > > 
> > > Thus it feels to me that CPU caps should not include the dynamic features,
> > > nor should KVM use the value of these as a source of truth, but
> > > rather the underlying source of the truth (e.g. CR4).
> > 
> > But if you insist, I don't really have a very strong reason to object this.
> 
> FWIW, I think I agree that CR4 should be the source of truth, but it's largely a
> moot point because KVM doesn't actually check OSXSAVE or OSPKE, as KVM never
> emulates the relevant instructions.  So for those, it's indeed not strictly
> necessary.
> 
> Unfortunately, KVM has established ABI for checking X86_FEATURE_MWAIT when
> "emulating" MONITOR and MWAIT, i.e. KVM can't use vcpu->arch.ia32_misc_enable_msr
> as the source of truth.

Can you elaborate on this? Can you give me an example of the ABI?


>   So for MWAIT, KVM does need to update CPU caps (or carry
> even more awful MWAIT code), at which point extending the behavior to the CR4
> features (and to X86_FEATURE_APIC) is practically free.
> 


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-09-10 20:37               ` Maxim Levitsky
@ 2024-09-11 15:37                 ` Sean Christopherson
  2024-11-22  3:17                   ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-09-11 15:37 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> On Mon, 2024-08-05 at 15:00 -0700, Sean Christopherson wrote:
> > If we go with ALIASED_F() (or ALIASED_8000_0001_F()), then that macro is all that
> > is needed, and it's bulletproof.  E.g. there is no KVM_X86_FEATURE_FPU_ALIAS that
> > can be queried, and thus no need to ensure it's defined in cpuid.c and #undef'd
> > after its use.
> > 
> > Hmm, I suppose we could harden the aliased feature usage in the same way as the
> > ALIASED_F(), e.g.
> > 
> >   #define __X86_FEATURE_8000_0001_ALIAS(feature)				\
> >   ({										\
> > 	BUILD_BUG_ON(__feature_leaf(feature) != CPUID_1_EDX);			\
> > 	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
> > 	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32);			\
> >   })
> > 
> > If something tries to use an X86_FEATURE_*_ALIAS outside of kvm_cpu_cap_init(),
> > it would need to define and set kvm_cpu_cap_init_in_progress, i.e. would really
> > have to try to mess up.
> > 
> > Effectively the only differences are that KVM would have ~10 or so more lines of
> > code to define the X86_FEATURE_*_ALIAS macros, and that the usage would look like:
> > 
> > 	VIRTUALIZED_F(FPU_ALIAS)
> > 
> > versus
> > 
> > 	ALIASED_F(FPU)
> 
> 
> This is exactly my point. I want to avoid proliferation of the _F macros, because
> later, we will need to figure out what each of them (e.g. ALIASED_F) does.
> 
> A whole-leaf alias is a once-in-a-lifetime x86 architectural misfeature, and it is
> very likely that Intel/AMD won't add more such aliases.
> 
> Why VIRTUALIZED_F though, it wasn't in the patch series? Normal F() should be enough
> IMHO.

I'm a-ok with F(), I simply thought there was a desire for more verbosity across
the board.

> > At that point, I'm ok with defining each alias, though I honestly still don't
> > understand the motivation for defining single-use macros.
> > 
> 
> The idea is that nobody will need to look at these macros
> (e.g. __X86_FEATURE_8000_0001_ALIAS() and its usages), because it's clear what
> they do: they just define a few extra CPUID features that nobody really cares
> about.
> 
> ALIASED_F(), on the other hand, is yet another _F() macro, and we will need
> to figure out, again and again, why it is there, what it does, etc.

That seems easily solved by naming the macro ALIASED_8000_0001_F().  I don't see
how that's any less clear than __X86_FEATURE_8000_0001_ALIAS(), and as above,
there are several advantages to defining the alias in the context of the leaf
builder.

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features
  2024-09-10 20:41       ` Maxim Levitsky
@ 2024-09-11 15:41         ` Sean Christopherson
  2024-11-22  2:11           ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-09-11 15:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 17:24 -0700, Sean Christopherson wrote:
> > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > > -		cpuid_entry_change(best, X86_FEATURE_OSPKE,
> > > > -				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > > > +		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE,
> > > > +					   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > > > +
> > > >  
> > > >  	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
> > > >  	if (best)
> > > 
> > > I am not 100% sure that we need to do this.
> > > 
> > > > Runtime cpuid changes are a hack that Intel did back then, due to various
> > > > reasons. These changes don't really change the feature set that the CPU supports,
> > > > but merely, as you like to say, 'massage' the output of the CPUID instruction,
> > > > usually to make an unmodified OS happy.
> > > > 
> > > > Thus it feels to me that CPU caps should not include the dynamic features,
> > > > nor should KVM use the value of these as a source of truth, but
> > > > rather the underlying source of the truth (e.g. CR4).
> > > 
> > > But if you insist, I don't really have a very strong reason to object this.
> > 
> > FWIW, I think I agree that CR4 should be the source of truth, but it's largely a
> > moot point because KVM doesn't actually check OSXSAVE or OSPKE, as KVM never
> > emulates the relevant instructions.  So for those, it's indeed not strictly
> > necessary.
> > 
> > Unfortunately, KVM has established ABI for checking X86_FEATURE_MWAIT when
> > "emulating" MONITOR and MWAIT, i.e. KVM can't use vcpu->arch.ia32_misc_enable_msr
> > as the source of truth.
> 
> Can you elaborate on this? Can you give me an example of the ABI?

Writes to MSR_IA32_MISC_ENABLE are guarded with a quirk:

		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
		    ((old_val ^ data)  & MSR_IA32_MISC_ENABLE_MWAIT)) {
			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
				return 1;
			vcpu->arch.ia32_misc_enable_msr = data;
			kvm_update_cpuid_runtime(vcpu);
		} else {
			vcpu->arch.ia32_misc_enable_msr = data;
		}

as is enforcement of #UD on MONITOR/MWAIT.

  static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
  {
	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS) &&
	    !guest_cpuid_has(vcpu, X86_FEATURE_MWAIT))
		return kvm_handle_invalid_op(vcpu);

	pr_warn_once("%s instruction emulated as NOP!\n", insn);
	return kvm_emulate_as_nop(vcpu);
  }

If KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is enabled but KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS
is _disabled_, then KVM's ABI is to honor X86_FEATURE_MWAIT regardless of what
is in vcpu->arch.ia32_misc_enable_msr (because userspace owns X86_FEATURE_MWAIT
in that scenario).
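
The interaction between the two quirks can be modeled in a few lines of
userspace C.  This is a deliberately simplified sketch, not KVM code: struct
vcpu_model and both helpers are hypothetical, and the XMM3 check on the MSR
write is omitted.  It only captures who owns the MWAIT CPUID bit and when
MONITOR/MWAIT take a #UD.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the state discussed above. */
struct vcpu_model {
	bool quirk_misc_enable_no_mwait;  /* KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT enabled */
	bool quirk_mwait_never_ud;        /* KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS enabled */
	bool cpuid_mwait;                 /* X86_FEATURE_MWAIT in guest CPUID */
	bool misc_enable_mwait;           /* MWAIT bit in the guest's MISC_ENABLE MSR */
};

/*
 * Guest write to the MWAIT bit of MISC_ENABLE.  The MSR bit is mirrored into
 * guest CPUID only when the NO_MWAIT quirk is disabled; otherwise userspace
 * owns the CPUID bit and the write leaves it alone.
 */
static void write_misc_enable_mwait(struct vcpu_model *v, bool enable)
{
	v->misc_enable_mwait = enable;
	if (!v->quirk_misc_enable_no_mwait)
		v->cpuid_mwait = enable;
}

/* MONITOR/MWAIT take #UD only if the never-#UD quirk is off and CPUID lacks MWAIT. */
static bool mwait_takes_ud(const struct vcpu_model *v)
{
	return !v->quirk_mwait_never_ud && !v->cpuid_mwait;
}
```

In the scenario from the last paragraph (NO_MWAIT quirk enabled, never-#UD
quirk disabled), clearing the MSR bit does not change the outcome: only the
CPUID bit set by userspace decides whether MWAIT faults.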

^ permalink raw reply	[flat|nested] 185+ messages in thread

* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-09-10 20:41               ` Maxim Levitsky
@ 2024-09-11 16:03                 ` Sean Christopherson
  2024-11-22  3:28                   ` Maxim Levitsky
  0 siblings, 1 reply; 185+ messages in thread
From: Sean Christopherson @ 2024-09-11 16:03 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> On Mon, 2024-08-05 at 12:59 -0700, Sean Christopherson wrote:
> > > And now we have:
> > > 
> > > kvm_cpu_cap_init_begin(CPUID_12_EAX);
> > >  feature_scattered(SGX1);
> > >  feature_scattered(SGX2);
> > >  feature_scattered(SGX_EDECCSSA);
> > > kvm_cpu_cap_init_end();
> > 
> > I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
> > getting rid of the @mask param/input.
> > 
> > What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
> > "unpacked" in the context of the macro?  That would avoid the need for a trailing
> > macro, and would provide a clear indication of when/where the set of features is
> > "initialized".
> > 
> > The biggest downside I see is that the last entry can't have a trailing comma,
> > i.e. adding a new feature would require updating the previous feature too.
> > 
> > #define kvm_cpu_cap_init(leaf, init_features...)			\
> > do {									\
> > 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> > 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> > 	u32 kvm_cpu_cap_virtualized = 0;				\
> > 	u32 kvm_cpu_cap_emulated = 0;					\
> > 	u32 kvm_cpu_cap_synthesized = 0;				\
> > 									\
> > 	init_features;							\
> > 									\
> > 	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
> > 	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
> > 			       kvm_cpu_cap_synthesized);		\
> > 	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
> > } while (0)
> > 
> > 	kvm_cpu_cap_init(CPUID_1_ECX,
> > 		VIRTUALIZED_F(XMM3),
> > 		VIRTUALIZED_F(PCLMULQDQ),
> > 		VIRTUALIZED_F(SSSE3),
> > 		VIRTUALIZED_F(FMA),
> > 		VIRTUALIZED_F(CX16),
> > 		VIRTUALIZED_F(PDCM),
> > 		VIRTUALIZED_F(PCID),
> > 		VIRTUALIZED_F(XMM4_1),
> > 		VIRTUALIZED_F(XMM4_2),
> > 		EMULATED_F(X2APIC),
> > 		VIRTUALIZED_F(MOVBE),
> > 		VIRTUALIZED_F(POPCNT),
> > 		EMULATED_F(TSC_DEADLINE_TIMER),
> > 		VIRTUALIZED_F(AES),
> > 		VIRTUALIZED_F(XSAVE),
> > 		// DYNAMIC_F(OSXSAVE),
> > 		VIRTUALIZED_F(AVX),
> > 		VIRTUALIZED_F(F16C),
> > 		VIRTUALIZED_F(RDRAND),
> > 		EMULATED_F(HYPERVISOR)
> > 	);
> 
> Hi,
> 
> This is no doubt better than using '|'.
> 
> I still strongly prefer my version, because I don't really like the fact that
> the _F macros have side effects and yet are passed as parameters to the
> kvm_cpu_cap_init function/macro.
> 
> Basically, there is an unwritten rule, which I consider very important and because
> of which I raised my concerns over this patch series: if a function has side
> effects, it should not be used as a parameter to another function; instead, it
> should be called explicitly on its own.

Splitting hairs to some degree, but the above suggestion is distinctly different
than passing the _result_ of a function call as a parameter to another function.
The actual "call" happens within the body of kvm_cpu_cap_init().  

This is effectively the same as passing a function pointer to a helper, and that
function pointer implementation having side effects, which is quite common in the
kernel and KVM, e.g. msr_access_t, rmap_handler_t, tdp_handler_t, gfn_handler_t,
on_lock_fn_t, etc.

I 100% agree that it's unusual and subtle to essentially have a variable number
of function pointers, but I don't see it as being an inherently bad pattern,
especially since it is practically impossible to misuse _because_ the macro
unpacks the "calls" at compile time.
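
FWIW, here's a stand-alone userspace sketch of what I mean by the macro
unpacking the "calls" at compile time.  This is not KVM code: cap_init(), the
FEAT_* bits and the simplified final computation are made-up stand-ins, the
synthesized bucket is left out, and it uses the ISO C "..." spelling instead of
the GNU named-variadic form from the proposal above.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up feature bit positions, purely for illustration. */
enum { FEAT_A = 0, FEAT_B = 1, FEAT_C = 2 };

/*
 * Each _F() "call" is just text until cap_init() pastes it into its own
 * body, where the locals being OR'd into are in scope.
 */
#define VIRTUALIZED_F(bit)	(cap_virtualized |= UINT32_C(1) << (bit))
#define EMULATED_F(bit)		(cap_emulated |= UINT32_C(1) << (bit))

/*
 * ISO C spelling (... and __VA_ARGS__) of the GNU named-variadic form.  The
 * arguments expand into one comma expression, so they run left to right.
 */
#define cap_init(dst, raw, ...)						\
do {									\
	uint32_t cap_virtualized = 0;					\
	uint32_t cap_emulated = 0;					\
									\
	__VA_ARGS__;							\
									\
	(dst) = (cap_virtualized & (raw)) | cap_emulated;		\
} while (0)

/* Virtualized bits are masked by "hardware" (raw); emulated bits are not. */
static uint32_t demo_cap_init(uint32_t raw)
{
	uint32_t caps;

	cap_init(caps, raw,
		VIRTUALIZED_F(FEAT_A),
		VIRTUALIZED_F(FEAT_B),
		EMULATED_F(FEAT_C)
	);
	return caps;
}
```

E.g. demo_cap_init(0x1) yields 0x5: FEAT_A survives the raw mask, FEAT_B does
not, and the emulated FEAT_C is set regardless of raw.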

IMO, the part that is most gross is the macros operating on local variables, but
that behavior exists in all ideas we've discussed, probably because I'm pretty
sure it's unavoidable unless we do something even worse (way, waaaaay worse).

E.g. we could add 32 versions of kvm_cpu_cap_init() that invoke pairs of parameters
and pass in the variables

  fn1(f1, virtualized, emulated, synthesized)
  fn2(f2, virtualized, emulated, synthesized)
  fn3(f3, virtualized, emulated, synthesized)
  ...
  fnN(fN, virtualized, emulated, synthesized)

and

  kvm_cpu_cap_init19(CPUID_1_ECX,
	F, XMM3,
	F, PCLMULQDQ,
	F, SSSE3,
	...
	EMULATED_F, HYPERVISOR
  );

But that's beyond horrific :-)

> If you strongly prefer the variadic macro over my begin/end API, I can live with
> that though, it is still better than '|'ing a mask with functions that have side
> effects.


* Re: [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL
  2024-07-24 17:28       ` Maxim Levitsky
@ 2024-11-21 18:57         ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-11-21 18:57 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, Jul 24, 2024, Maxim Levitsky wrote:
> On Mon, 2024-07-08 at 19:33 +0000, Sean Christopherson wrote:
> > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > On Fri, 2024-05-17 at 10:38 -0700, Sean Christopherson wrote:
> > > > Add a sanity check in get_cpuid_entry() to provide a friendlier error than
> > > > a segfault when a test developer tries to use a vCPU CPUID helper on a
> > > > barebones vCPU.
> > > > 
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > > ---
> > > >  tools/testing/selftests/kvm/lib/x86_64/processor.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > > 
> > > > diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > > > index c664e446136b..f0f3434d767e 100644
> > > > --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > > > +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> > > > @@ -1141,6 +1141,8 @@ const struct kvm_cpuid_entry2 *get_cpuid_entry(const struct kvm_cpuid2 *cpuid,
> > > >  {
> > > >  	int i;
> > > >  
> > > > +	TEST_ASSERT(cpuid, "Must do vcpu_init_cpuid() first (or equivalent)");
> > > > +
> > > >  	for (i = 0; i < cpuid->nent; i++) {
> > > >  		if (cpuid->entries[i].function == function &&
> > > >  		    cpuid->entries[i].index == index)
> > > 
> > > Hi,
> > > 
> > > Maybe it is better to do this assert in __vcpu_get_cpuid_entry() because the
> > > assert might confuse the reader, since it just tests for NULL but when it
> > > fails, it complains that you need to call some function.
> > 
> > IIRC, I originally added the assert in __vcpu_get_cpuid_entry(), but I didn't
> > like leaving get_cpuid_entry() unprotected.  What if I add an assert in both?
> > E.g. have __vcpu_get_cpuid_entry() assert with the (hopefully) helpful message,
> > and have get_cpuid_entry() do a simple TEST_ASSERT_NE()?
> > 
> 
> This looks like a great idea.

Circling back to this, I actually like your initial suggestion better.  Asserting
in get_cpuid_entry() is unnecessary paranoia, e.g. it's roughly equivalent to
asserting that any and all pointers are non-NULL.   The __vcpu_get_cpuid_entry()
assert though makes a lot more sense, because it's not all that obvious that
vcpu->cpuid is (usually) initialized elsewhere.
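
Roughly what I have in mind, as a userspace sketch.  Everything here is a
hypothetical stand-in (types, names, plain assert() instead of TEST_ASSERT()),
and the lookup returns NULL on a miss only to keep the sketch short:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the selftest types, not the real structs. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax;
};

struct cpuid_table {
	uint32_t nent;
	const struct cpuid_entry *entries;
};

struct vcpu {
	const struct cpuid_table *cpuid;	/* NULL until initialized */
};

/* Low-level lookup: no NULL check, same as for any other pointer param. */
static const struct cpuid_entry *find_entry(const struct cpuid_table *table,
					    uint32_t function, uint32_t index)
{
	uint32_t i;

	for (i = 0; i < table->nent; i++) {
		if (table->entries[i].function == function &&
		    table->entries[i].index == index)
			return &table->entries[i];
	}
	return NULL;
}

/* vCPU-level wrapper: the one place where "not initialized yet" is likely. */
static const struct cpuid_entry *vcpu_find_entry(const struct vcpu *vcpu,
						 uint32_t function,
						 uint32_t index)
{
	assert(vcpu->cpuid && "Must do vcpu_init_cpuid() first (or equivalent)");
	return find_entry(vcpu->cpuid, function, index);
}

/* Builds a two-entry table and looks up leaf 0xD, index 1. */
static uint32_t demo_lookup(void)
{
	static const struct cpuid_entry entries[] = {
		{ .function = 0x1, .index = 0, .eax = 0xabc },
		{ .function = 0xd, .index = 1, .eax = 0x123 },
	};
	const struct cpuid_table table = { .nent = 2, .entries = entries };
	const struct vcpu vcpu = { .cpuid = &table };

	return vcpu_find_entry(&vcpu, 0xd, 1)->eax;
}
```

The message about initialization lives next to the only pointer whose
lifecycle isn't obvious to the caller.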


* Re: [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features
  2024-09-11 15:41         ` Sean Christopherson
@ 2024-11-22  2:11           ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-11-22  2:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, 2024-09-11 at 08:41 -0700, Sean Christopherson wrote:
> On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> > On Mon, 2024-07-08 at 17:24 -0700, Sean Christopherson wrote:
> > > On Thu, Jul 04, 2024, Maxim Levitsky wrote:
> > > > On Fri, 2024-05-17 at 10:39 -0700, Sean Christopherson wrote:
> > > > > -		cpuid_entry_change(best, X86_FEATURE_OSPKE,
> > > > > -				   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > > > > +		kvm_update_feature_runtime(vcpu, best, X86_FEATURE_OSPKE,
> > > > > +					   kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE));
> > > > > +
> > > > >  
> > > > >  	best = kvm_find_cpuid_entry_index(vcpu, 0xD, 0);
> > > > >  	if (best)
> > > > 
> > > > I am not 100% sure that we need to do this.
> > > > 
> > > > Runtime CPUID changes are a hack that Intel did back then, for various
> > > > reasons.  These changes don't really change the feature set that the CPU
> > > > supports, but merely, as you like to say, 'massage' the output of the CPUID
> > > > instruction, usually to keep an unmodified OS happy.
> > > > 
> > > > Thus it feels to me that CPU caps should not include the dynamic features,
> > > > and that KVM should not use their values as a source of truth, but rather
> > > > the underlying source of truth (e.g. CR4).
> > > > 
> > > > But if you insist, I don't really have a very strong reason to object to this.
> > > 
> > > FWIW, I think I agree that CR4 should be the source of truth, but it's largely a
> > > moot point because KVM doesn't actually check OSXSAVE or OSPKE, as KVM never
> > > emulates the relevant instructions.  So for those, it's indeed not strictly
> > > necessary.
> > > 
> > > Unfortunately, KVM has established ABI for checking X86_FEATURE_MWAIT when
> > > "emulating" MONITOR and MWAIT, i.e. KVM can't use vcpu->arch.ia32_misc_enable_msr
> > > as the source of truth.
> > 
> > Can you elaborate on this? Can you give me an example of the ABI?
> 
> Writes to MSR_IA32_MISC_ENABLE are guarded with a quirk:
> 
> 		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
> 		    ((old_val ^ data)  & MSR_IA32_MISC_ENABLE_MWAIT)) {
> 			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> 				return 1;
> 			vcpu->arch.ia32_misc_enable_msr = data;
> 			kvm_update_cpuid_runtime(vcpu);
> 		} else {
> 			vcpu->arch.ia32_misc_enable_msr = data;
> 		}
> 
> as is enforcement of #UD on MONITOR/MWAIT.
> 
>   static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
>   {
> 	if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS) &&
> 	    !guest_cpuid_has(vcpu, X86_FEATURE_MWAIT))
> 		return kvm_handle_invalid_op(vcpu);
> 
> 	pr_warn_once("%s instruction emulated as NOP!\n", insn);
> 	return kvm_emulate_as_nop(vcpu);
>   }
> 
> If KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is enabled but KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS
> is _disabled_, then KVM's ABI is to honor X86_FEATURE_MWAIT regardless of what
> is in vcpu->arch.ia32_misc_enable_msr (because userspace owns X86_FEATURE_MWAIT
> in that scenario).
> 

OK, makes sense.
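
To check my understanding, the #UD decision in kvm_emulate_monitor_mwait()
quoted above reduces to the following.  This is only a userspace sketch: the
quirk and CPUID checks are replaced by hypothetical booleans, and
mwait_takes_ud() is a made-up name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * never_ud_quirk_enabled: stands in for KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS
 * guest_cpuid_has_mwait:  stands in for guest CPUID reporting MWAIT
 *
 * Note that MSR_IA32_MISC_ENABLE is not an input at all; only the quirk and
 * the guest's CPUID bit decide whether MONITOR/MWAIT take a #UD.
 */
static bool mwait_takes_ud(bool never_ud_quirk_enabled,
			   bool guest_cpuid_has_mwait)
{
	return !never_ud_quirk_enabled && !guest_cpuid_has_mwait;
}
```

So with the NEVER_UD quirk disabled, the CPUID bit (owned by userspace when the
NO_MWAIT quirk is enabled) is the only thing that matters.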
Best regards,
	Maxim Levitsky




* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-09-11 15:37                 ` Sean Christopherson
@ 2024-11-22  3:17                   ` Maxim Levitsky
  2024-11-27 14:38                     ` Sean Christopherson
  0 siblings, 1 reply; 185+ messages in thread
From: Maxim Levitsky @ 2024-11-22  3:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, 2024-09-11 at 08:37 -0700, Sean Christopherson wrote:
> On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> > On Mon, 2024-08-05 at 15:00 -0700, Sean Christopherson wrote:
> > > If we go with ALIASED_F() (or ALIASED_8000_0001_F()), then that macro is all that
> > > is needed, and it's bulletproof.  E.g. there is no KVM_X86_FEATURE_FPU_ALIAS that
> > > can be queried, and thus no need to ensure it's defined in cpuid.c and #undef'd
> > > after its use.
> > > 
> > > Hmm, I suppose we could harden the aliased feature usage in the same way as the
> > > ALIASED_F(), e.g.
> > > 
> > >   #define __X86_FEATURE_8000_0001_ALIAS(feature)				\
> > >   ({										\
> > > 	BUILD_BUG_ON(__feature_leaf(feature) != CPUID_1_EDX);			\
> > > 	BUILD_BUG_ON(kvm_cpu_cap_init_in_progress != CPUID_8000_0001_EDX);	\
> > > 	(feature + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32);			\
> > >   })
> > > 
> > > If something tries to use an X86_FEATURE_*_ALIAS outside of kvm_cpu_cap_init(),
> > > it would need to define and set kvm_cpu_cap_init_in_progress, i.e. would really
> > > have to try to mess up.
> > > 
> > > Effectively the only differences are that KVM would have ~10 or so more lines of
> > > code to define the X86_FEATURE_*_ALIAS macros, and that the usage would look like:
> > > 
> > > 	VIRTUALIZED_F(FPU_ALIAS)
> > > 
> > > versus
> > > 
> > > 	ALIASED_F(FPU)
> > 
> > This is exactly my point. I want to avoid proliferation of the _F macros, because
> > later, we will need to figure out what each of them (e.g. ALIASED_F) does.
> > 
> > A whole-leaf alias is a once-in-the-life-of-x86 misfeature, and it is very likely
> > that Intel/AMD won't add more such aliases.
> > 
> > Why VIRTUALIZED_F though, it wasn't in the patch series? Normal F() should be enough
> > IMHO.
> 
> I'm a-ok with F(), I simply thought there was a desire for more verbosity across
> the board.
> 
> > > At that point, I'm ok with defining each alias, though I honestly still don't
> > > understand the motivation for defining single-use macros.
> > > 
> > 
> > The idea is that nobody will need to look at these macros
> > (e.g. __X86_FEATURE_8000_0001_ALIAS() and its usages), because it's clear what
> > they do: they just define a few extra CPUID features that nobody really cares
> > about.
> > 
> > ALIASED_F(), on the other hand, is yet another _F() macro, and we will need,
> > again and again, to figure out why it is there, what it does, etc.
> 
> That seems easily solved by naming the macro ALIASED_8000_0001_F().  I don't see
> how that's any less clear than __X86_FEATURE_8000_0001_ALIAS(), and as above,
> there are several advantages to defining the alias in the context of the leaf
> builder.
> 

Hi!

I am stating my point again: treating the 8000_0001 leaf aliases as regular CPUID features means
that we don't need common code to deal with them, and thus when someone reads the common code
(and this is the thing I care about the most), that someone won't need to dig up the info
about what these aliases are.

I, for example, didn't know about them, because these aliases are basically a result of AMD redoing
some things in the spec their way when they first released the 64-bit extensions.
I didn't follow the x86 ISA closely back then (I only had 32-bit systems to play with).

Best regards,
	Maxim Levitsky




* Re: [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software
  2024-09-11 16:03                 ` Sean Christopherson
@ 2024-11-22  3:28                   ` Maxim Levitsky
  0 siblings, 0 replies; 185+ messages in thread
From: Maxim Levitsky @ 2024-11-22  3:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Wed, 2024-09-11 at 09:03 -0700, Sean Christopherson wrote:
> On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> > On Mon, 2024-08-05 at 12:59 -0700, Sean Christopherson wrote:
> > > > And now we have:
> > > > 
> > > > kvm_cpu_cap_init_begin(CPUID_12_EAX);
> > > >  feature_scattered(SGX1);
> > > >  feature_scattered(SGX2);
> > > >  feature_scattered(SGX_EDECCSSA);
> > > > kvm_cpu_cap_init_end();
> > > 
> > > I don't love the syntax (mainly the need for a begin()+end()), but I'm a-ok
> > > getting rid of the @mask param/input.
> > > 
> > > What about making kvm_cpu_cap_init() a variadic macro, with the relevant features
> > > "unpacked" in the context of the macro?  That would avoid the need for a trailing
> > > macro, and would provide a clear indication of when/where the set of features is
> > > "initialized".
> > > 
> > > The biggest downside I see is that the last entry can't have a trailing comma,
> > > i.e. adding a new feature would require updating the previous feature too.
> > > 
> > > #define kvm_cpu_cap_init(leaf, init_features...)			\
> > > do {									\
> > > 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);	\
> > > 	const u32 __maybe_unused kvm_cpu_cap_init_in_progress = leaf;	\
> > > 	u32 kvm_cpu_cap_virtualized = 0;				\
> > > 	u32 kvm_cpu_cap_emulated = 0;					\
> > > 	u32 kvm_cpu_cap_synthesized = 0;				\
> > > 									\
> > > 	init_features;							\
> > > 									\
> > > 	kvm_cpu_caps[leaf] = kvm_cpu_cap_virtualized;			\
> > > 	kvm_cpu_caps[leaf] &= (raw_cpuid_get(cpuid) |			\
> > > 			       kvm_cpu_cap_synthesized);		\
> > > 	kvm_cpu_caps[leaf] |= kvm_cpu_cap_emulated;			\
> > > } while (0)
> > > 
> > > 	kvm_cpu_cap_init(CPUID_1_ECX,
> > > 		VIRTUALIZED_F(XMM3),
> > > 		VIRTUALIZED_F(PCLMULQDQ),
> > > 		VIRTUALIZED_F(SSSE3),
> > > 		VIRTUALIZED_F(FMA),
> > > 		VIRTUALIZED_F(CX16),
> > > 		VIRTUALIZED_F(PDCM),
> > > 		VIRTUALIZED_F(PCID),
> > > 		VIRTUALIZED_F(XMM4_1),
> > > 		VIRTUALIZED_F(XMM4_2),
> > > 		EMULATED_F(X2APIC),
> > > 		VIRTUALIZED_F(MOVBE),
> > > 		VIRTUALIZED_F(POPCNT),
> > > 		EMULATED_F(TSC_DEADLINE_TIMER),
> > > 		VIRTUALIZED_F(AES),
> > > 		VIRTUALIZED_F(XSAVE),
> > > 		// DYNAMIC_F(OSXSAVE),
> > > 		VIRTUALIZED_F(AVX),
> > > 		VIRTUALIZED_F(F16C),
> > > 		VIRTUALIZED_F(RDRAND),
> > > 		EMULATED_F(HYPERVISOR)
> > > 	);
> > 
> > Hi,
> > 
> > This is no doubt better than using '|'.
> > 
> > I still strongly prefer my version, because I don't really like the fact that
> > the _F macros have side effects and yet are passed as parameters to the
> > kvm_cpu_cap_init function/macro.
> > 
> > Basically, there is an unwritten rule, which I consider very important and because
> > of which I raised my concerns over this patch series: if a function has side
> > effects, it should not be used as a parameter to another function; instead, it
> > should be called explicitly on its own.
> 
> Splitting hairs to some degree, but the above suggestion is distinctly different
> than passing the _result_ of a function call as a parameter to another function.
> The actual "call" happens within the body of kvm_cpu_cap_init().

You are technically right, but you are using the wrong point of view: you know the implementation,
and I pretend that I don't know it, and try to look at this from the point of view
of someone who looks at the code for the first time, e.g. to fix some bugs.

Someone who doesn't know anything about this won't know that these are macros, cleverly
passed to another variadic macro (which is itself a feature that is not often used).

I am just stating a fact: a function, or what looks like a function, whose result
is evaluated in an expression or passed to another function (within a single statement)
should not have side effects.
Only top-level function/procedure calls should be allowed to have side effects;
otherwise this is just confusing.

Let me explain this again with code:

When I see for the first time this:

result = foo(x) | bar(x);

I strongly expect both foo and bar to be pure functions with no side effects.

Or if I see this for the first time:

err = somefunc(foo(x), bar(x));

I also expect that foo and bar are pure functions,
but 'somefunc' might not be because it only returns an error code,
and it is a top level statement.

And I don't care if this is implemented with functions or macros, because it
looks the same.

This is just how my common sense works.
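
A small stand-alone sketch of the difference (plain userspace C, all names are
made up for illustration).  Note that C also leaves the order in which the two
operands of '|' are evaluated unspecified, so a reader can't even tell which
helper touches the hidden state first:

```c
#include <assert.h>

/* Hidden state that only the impure helpers touch. */
static unsigned int calls;

/* Pure: the result depends only on the argument. */
static unsigned int pure_low(unsigned int x)  { return x & 0x0f; }
static unsigned int pure_high(unsigned int x) { return x & 0xf0; }

/* Impure: same return values, but each call also bumps the counter. */
static unsigned int impure_low(unsigned int x)  { calls++; return x & 0x0f; }
static unsigned int impure_high(unsigned int x) { calls++; return x & 0xf0; }

/* What the reader expects: nothing changes except the returned value. */
static unsigned int demo_pure(unsigned int x)
{
	return pure_low(x) | pure_high(x);
}

/* Looks identical at the call site, yet modifies 'calls' twice. */
static unsigned int demo_impure(unsigned int x)
{
	return impure_low(x) | impure_high(x);
}
```

Both demo functions return the same value for the same input; only the second
one changes state behind the reader's back.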

I won't argue more about this, though; I don't want to bikeshed this and block this patch series.
If you insist, let it be, but please at least use the variadic macro.


> 
> This is effectively the same as passing a function pointer to a helper, and that
> function pointer implementation having side effects, which is quite common in the
> kernel and KVM, e.g. msr_access_t, rmap_handler_t, tdp_handler_t, gfn_handler_t,
> on_lock_fn_t, etc.
> 
> I 100% agree that it's unusual and subtle to essentially have a variable number
> of function pointers, but I don't see it as being an inherently bad pattern,
> especially since it is practically impossible to misuse _because_ the macro
> unpacks the "calls" at compile time.
> 
> IMO, the part that is most gross is the macros operating on local variables, but
> that behavior exists in all ideas we've discussed, probably because I'm pretty
> sure it's unavoidable unless we do something even worse (way, waaaaay worse).
> 
> E.g. we could add 32 versions of kvm_cpu_cap_init() that invoke pairs of parameters
> and pass in the variables
> 
>   fn1(f1, virtualized, emulated, synthesized)
>   fn2(f2, virtualized, emulated, synthesized)
>   fn3(f3, virtualized, emulated, synthesized)
>   ...
>   fnN(fN, virtualized, emulated, synthesized)
> 
> and
> 
>   kvm_cpu_cap_init19(CPUID_1_ECX,
> 	F, XMM3,
> 	F, PCLMULQDQ,
> 	F, SSSE3,
> 	...
> 	EMULATED_F, HYPERVISOR
>   );

I don't think that this change is worth it, but this is still better in some sense,
because at least the user won't be able to make any assumptions about the above,
and instead will have to read the code and figure out what was done here.
It won't be easy though.

> 
> But that's beyond horrific :-)
> 
> > If you strongly prefer the variadic macro over my begin/end API, I can live with
> > that though, it is still better than '|'ing a mask with functions that have side
> > effects.



Best regards,
	Maxim Levitsky



* Re: [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features
  2024-11-22  3:17                   ` Maxim Levitsky
@ 2024-11-27 14:38                     ` Sean Christopherson
  0 siblings, 0 replies; 185+ messages in thread
From: Sean Christopherson @ 2024-11-27 14:38 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Paolo Bonzini, Vitaly Kuznetsov, kvm, linux-kernel, Hou Wenlong,
	Kechen Lu, Oliver Upton, Binbin Wu, Yang Weijiang, Robert Hoo

On Thu, Nov 21, 2024, Maxim Levitsky wrote:
> On Wed, 2024-09-11 at 08:37 -0700, Sean Christopherson wrote:
> > On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> > > On Mon, 2024-08-05 at 15:00 -0700, Sean Christopherson wrote:
> > > > At that point, I'm ok with defining each alias, though I honestly still don't
> > > > understand the motivation for defining single-use macros.
> > > > 
> > > 
> > > The idea is that nobody will need to look at these macros
> > > (e.g. __X86_FEATURE_8000_0001_ALIAS() and its usages), because it's clear what
> > > they do: they just define a few extra CPUID features that nobody really cares
> > > about.
> > > 
> > > ALIASED_F(), on the other hand, is yet another _F() macro, and we will need,
> > > again and again, to figure out why it is there, what it does, etc.
> > 
> > That seems easily solved by naming the macro ALIASED_8000_0001_F().  I don't see
> > how that's any less clear than __X86_FEATURE_8000_0001_ALIAS(), and as above,
> > there are several advantages to defining the alias in the context of the leaf
> > builder.
> > 
> 
> Hi!
> 
> I am stating my point again: Treating 8000_0001 leaf aliases as regular CPUID
> features means that we don't need common code to deal with this, and thus
> when someone reads the common code (and this is the thing I care about the
> most) that someone won't need to dig up the info about what these aliases
> are. 

Ah, this is where we disagree, I think.  I feel quite strongly that oddities such
as aliased/duplicate CPUID feature bits need to be made as visible as possible,
and well documented.  Hiding architectural quirks might save some readers a few
seconds of their time, but it can also confuse others, and more importantly, makes
it more difficult for new readers/developers to learn about the quirks.

This code _looks_ wrong, as there's no indication that CPUID_8000_0001_EDX is
unique.  I too wasn't aware of the aliases until this series, and I was very
confused by KVM's code.  The only clue that I was given was the "Don't duplicate
feature flags which are redundant with Intel!" comment in cpufeatures.h; I still
ended up digging through the APM to understand what was going on.

	kvm_cpu_cap_mask(CPUID_1_EDX,
		F(FPU) | F(VME) | F(DE) | F(PSE) |
		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
		F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLUSH) |
		0 /* Reserved, DS, ACPI */ | F(MMX) |
		F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
		0 /* HTT, TM, Reserved, PBE */
	);

	kvm_cpu_cap_mask(CPUID_8000_0001_EDX,
		F(FPU) | F(VME) | F(DE) | F(PSE) |
		F(TSC) | F(MSR) | F(PAE) | F(MCE) |
		F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
		F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
		F(PAT) | F(PSE36) | 0 /* Reserved */ |
		F(NX) | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
		F(FXSR) | F(FXSR_OPT) | f_gbpages | F(RDTSCP) |
		0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW)
	);

Versus this code, which hopefully elicits a "huh!?" and prompts curious readers
to go look at the definition of ALIASED_1_EDX_F() to understand why KVM is being
weird.  And if readers can't figure things out purely from ALIASED_1_EDX_F()'s
comment, then that's effectively a KVM documentation issue and should be fixed.

In other words, I want to make things like this stick out so that more developers
are aware of such quirks, i.e. to minimize the probability of such knowledge
being lost.  I don't want the next generation of KVM developers to have to
re-discover things that can be solved by a moderately verbose comment.

	kvm_cpu_cap_init(CPUID_1_EDX,
		F(FPU),
		F(VME),
		F(DE),
		F(PSE),
		F(TSC),
		F(MSR),
		F(PAE),
		F(MCE),
		F(CX8),
		F(APIC),
		...
	);

	kvm_cpu_cap_init(CPUID_8000_0001_EDX,
		ALIASED_1_EDX_F(FPU),
		ALIASED_1_EDX_F(VME),
		ALIASED_1_EDX_F(DE),
		ALIASED_1_EDX_F(PSE),
		ALIASED_1_EDX_F(TSC),
		ALIASED_1_EDX_F(MSR),
		ALIASED_1_EDX_F(PAE),
		ALIASED_1_EDX_F(MCE),
		ALIASED_1_EDX_F(CX8),
		ALIASED_1_EDX_F(APIC),
		...
	);
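
For reference, the translation that an ALIASED_1_EDX_F()-style macro has to do
is tiny.  Rough userspace sketch below; the word indices are assumed for the
sketch, the helper names (FEATURE(), ALIAS_8000_0001(), etc.) are made up, and
_Static_assert stands in for BUILD_BUG_ON().  This is not the macro from the
series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed word indices for this sketch (the kernel has its own enum). */
enum { CPUID_1_EDX = 0, CPUID_8000_0001_EDX = 1 };

#define FEATURE(word, bit)	((word) * 32 + (bit))
#define feature_word(f)		((f) / 32)
#define feature_bit(f)		((f) % 32)

#define X86_FEATURE_FPU		FEATURE(CPUID_1_EDX, 0)
#define X86_FEATURE_VME		FEATURE(CPUID_1_EDX, 1)

/* Same bit position, but in the 0x80000001.EDX word instead of 0x1.EDX. */
#define ALIAS_8000_0001(f)	((f) + (CPUID_8000_0001_EDX - CPUID_1_EDX) * 32)

/* Userspace stand-in for BUILD_BUG_ON(): only 0x1.EDX features are aliased. */
_Static_assert(feature_word(X86_FEATURE_FPU) == CPUID_1_EDX,
	       "aliased feature must live in CPUID.0x1.EDX");

/* Word that the aliased VME bit lands in. */
static int demo_alias_word(void)
{
	return feature_word(ALIAS_8000_0001(X86_FEATURE_VME));
}

/* Mask of the aliased FPU and VME bits within the 0x80000001.EDX word. */
static uint32_t demo_alias_mask(void)
{
	uint32_t mask = 0;

	mask |= UINT32_C(1) << feature_bit(ALIAS_8000_0001(X86_FEATURE_FPU));
	mask |= UINT32_C(1) << feature_bit(ALIAS_8000_0001(X86_FEATURE_VME));
	return mask;
}
```

The bit position is unchanged and only the word moves, which is exactly the
quirk that I want the macro name and its comment to advertise.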

> I for example didn't know about them because these aliases are basically a
> result of AMD redoing some things in the spec their way when they first
> released the 64-bit extensions.  I didn't follow the x86 ISA closely back
> then (I only had 32 bit systems to play with).
> 
> Best regards,
> 	Maxim Levitsky
> 
> 


end of thread, other threads:[~2024-11-27 14:38 UTC | newest]

Thread overview: 185+ messages
2024-05-17 17:38 [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Sean Christopherson
2024-05-17 17:38 ` [PATCH v2 01/49] KVM: x86: Do all post-set CPUID processing during vCPU creation Sean Christopherson
2024-07-05  0:48   ` Maxim Levitsky
2024-07-08 18:46     ` Sean Christopherson
2024-07-24 17:24       ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 02/49] KVM: x86: Explicitly do runtime CPUID updates "after" initial setup Sean Christopherson
2024-07-05  0:51   ` Maxim Levitsky
2024-07-09 19:46     ` Sean Christopherson
2024-07-24 17:24       ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 03/49] KVM: x86: Account for KVM-reserved CR4 bits when passing through CR4 on VMX Sean Christopherson
2024-07-05  0:55   ` Maxim Levitsky
2024-07-09 19:58     ` Sean Christopherson
2024-07-24 17:28       ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 04/49] KVM: selftests: Update x86's set_sregs_test to match KVM's CPUID enforcement Sean Christopherson
2024-07-05  0:55   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 05/49] KVM: selftests: Assert that the @cpuid passed to get_cpuid_entry() is non-NULL Sean Christopherson
2024-07-05  0:58   ` Maxim Levitsky
2024-07-08 19:33     ` Sean Christopherson
2024-07-24 17:28       ` Maxim Levitsky
2024-11-21 18:57         ` Sean Christopherson
2024-05-17 17:38 ` [PATCH v2 06/49] KVM: selftests: Refresh vCPU CPUID cache in __vcpu_get_cpuid_entry() Sean Christopherson
2024-07-05  0:59   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 07/49] KVM: selftests: Verify KVM stuffs runtime CPUID OS bits on CR4 writes Sean Christopherson
2024-07-05  1:02   ` Maxim Levitsky
2024-07-08 19:39     ` Sean Christopherson
2024-05-17 17:38 ` [PATCH v2 08/49] KVM: x86: Move __kvm_is_valid_cr4() definition to x86.h Sean Christopherson
2024-07-05  1:02   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 09/49] KVM: x86/pmu: Drop now-redundant refresh() during init() Sean Christopherson
2024-07-05  1:02   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 10/49] KVM: x86: Drop now-redundant MAXPHYADDR and GPA rsvd bits from vCPU creation Sean Christopherson
2024-07-05  1:13   ` Maxim Levitsky
2024-07-08 19:53     ` Sean Christopherson
2024-07-24 17:30       ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 11/49] KVM: x86: Disallow KVM_CAP_X86_DISABLE_EXITS after " Sean Christopherson
2024-07-05  1:17   ` Maxim Levitsky
2024-07-08 19:43     ` Sean Christopherson
2024-07-24 17:31       ` Maxim Levitsky
2024-07-25 18:07         ` Sean Christopherson
2024-07-12  7:42   ` Xiaoyao Li
2024-05-17 17:38 ` [PATCH v2 12/49] KVM: x86: Reject disabling of MWAIT/HLT interception when not allowed Sean Christopherson
2024-05-22  5:09   ` Binbin Wu
2024-05-28 18:56     ` Sean Christopherson
2024-07-05  1:17   ` Maxim Levitsky
2024-07-12  7:51   ` Xiaoyao Li
2024-07-12 13:31     ` Sean Christopherson
2024-05-17 17:38 ` [PATCH v2 13/49] KVM: selftests: Fix a bad TEST_REQUIRE() in x86's KVM PV test Sean Christopherson
2024-07-05  1:17   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 14/49] KVM: selftests: Update x86's KVM PV test to match KVM's disabling exits behavior Sean Christopherson
2024-07-05  1:17   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 15/49] KVM: x86: Zero out PV features cache when the CPUID leaf is not present Sean Christopherson
2024-07-05  1:17   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 16/49] KVM: x86: Don't update PV features caches when enabling enforcement capability Sean Christopherson
2024-07-05  1:17   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 17/49] KVM: x86: Do reverse CPUID sanity checks in __feature_leaf() Sean Christopherson
2024-07-05  1:17   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 18/49] KVM: x86: Account for max supported CPUID leaf when getting raw host CPUID Sean Christopherson
2024-06-19  6:17   ` Yang, Weijiang
2024-06-19  8:07     ` Yang, Weijiang
2024-07-05  1:17   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 19/49] KVM: x86: Add a macro to init CPUID features that ignore host kernel support Sean Christopherson
2024-07-05  1:21   ` Maxim Levitsky
2024-07-08 20:53     ` Sean Christopherson
2024-07-24 17:39       ` Maxim Levitsky
2024-07-08 22:36     ` Sean Christopherson
2024-07-24 17:40       ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 20/49] KVM: x86: Rename kvm_cpu_cap_mask() to kvm_cpu_cap_init() Sean Christopherson
2024-05-22  6:23   ` Binbin Wu
2024-05-28 18:54     ` Sean Christopherson
2024-07-05  1:24   ` Maxim Levitsky
2024-05-17 17:38 ` [PATCH v2 21/49] KVM: x86: Add a macro to init CPUID features that are 64-bit only Sean Christopherson
2024-07-05  1:24   ` Maxim Levitsky
2024-07-17 13:31   ` Xiaoyao Li
2024-05-17 17:38 ` [PATCH v2 22/49] KVM: x86: Add a macro to precisely handle aliased 0x1.EDX CPUID features Sean Christopherson
2024-07-05  1:25   ` Maxim Levitsky
2024-07-08 21:08     ` Sean Christopherson
2024-07-24 17:46       ` Maxim Levitsky
2024-07-25 18:39         ` Sean Christopherson
2024-08-05 11:06           ` mlevitsk
2024-08-05 22:00             ` Sean Christopherson
2024-09-10 20:37               ` Maxim Levitsky
2024-09-11 15:37                 ` Sean Christopherson
2024-11-22  3:17                   ` Maxim Levitsky
2024-11-27 14:38                     ` Sean Christopherson
2024-05-17 17:39 ` [PATCH v2 23/49] KVM: x86: Handle kernel- and KVM-defined CPUID words in a single helper Sean Christopherson
2024-07-05  1:28   ` Maxim Levitsky
2024-07-08 21:18     ` Sean Christopherson
2024-07-17 14:00       ` Xiaoyao Li
2024-07-24 17:51       ` Maxim Levitsky
2024-07-25 19:18         ` Sean Christopherson
2024-08-05 11:07           ` mlevitsk
2024-05-17 17:39 ` [PATCH v2 24/49] KVM: x86: #undef SPEC_CTRL_SSBD in cpuid.c to avoid macro collisions Sean Christopherson
2024-07-05  1:30   ` Maxim Levitsky
2024-07-08 21:29     ` Sean Christopherson
2024-07-24 17:54       ` Maxim Levitsky
2024-07-26 23:34         ` Sean Christopherson
2024-08-05 11:11           ` mlevitsk
2024-08-05 21:35             ` Sean Christopherson
2024-09-10 20:37               ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 25/49] KVM: x86: Harden CPU capabilities processing against out-of-scope features Sean Christopherson
2024-07-05  1:31   ` Maxim Levitsky
2024-07-09 18:11     ` Sean Christopherson
2024-07-24 17:55       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 26/49] KVM: x86: Add a macro to init CPUID features that KVM emulates in software Sean Christopherson
2024-07-05  1:59   ` Maxim Levitsky
2024-07-08 22:30     ` Sean Christopherson
2024-07-24 17:58       ` Maxim Levitsky
2024-07-27  0:06         ` Sean Christopherson
2024-08-05 11:16           ` mlevitsk
2024-08-05 19:59             ` Sean Christopherson
2024-09-10 20:41               ` Maxim Levitsky
2024-09-11 16:03                 ` Sean Christopherson
2024-11-22  3:28                   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 27/49] KVM: x86: Swap incoming guest CPUID into vCPU before massaging in KVM_SET_CPUID2 Sean Christopherson
2024-07-05  1:32   ` Maxim Levitsky
2024-07-08 21:37     ` Sean Christopherson
2024-05-17 17:39 ` [PATCH v2 28/49] KVM: x86: Clear PV_UNHALT for !HLT-exiting only when userspace sets CPUID Sean Christopherson
2024-07-05  1:32   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 29/49] KVM: x86: Remove unnecessary caching of KVM's PV CPUID base Sean Christopherson
2024-07-05  1:51   ` Maxim Levitsky
2024-07-09 19:00     ` Sean Christopherson
2024-07-24 17:59       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 30/49] KVM: x86: Always operate on kvm_vcpu data in cpuid_entry2_find() Sean Christopherson
2024-07-05  1:51   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 31/49] KVM: x86: Move kvm_find_cpuid_entry{,_index}() up near cpuid_entry2_find() Sean Christopherson
2024-07-05  1:51   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 32/49] KVM: x86: Remove all direct usage of cpuid_entry2_find() Sean Christopherson
2024-07-05  1:52   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 33/49] KVM: x86: Advertise TSC_DEADLINE_TIMER in KVM_GET_SUPPORTED_CPUID Sean Christopherson
2024-05-22  9:11   ` Binbin Wu
2024-05-28 15:21     ` Sean Christopherson
2024-07-05  2:04   ` Maxim Levitsky
2024-07-09 19:28     ` Sean Christopherson
2024-07-24 18:00       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 34/49] KVM: x86: Advertise HYPERVISOR " Sean Christopherson
2024-07-05  2:04   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 35/49] KVM: x86: Add a macro to handle features that are fully VMM controlled Sean Christopherson
2024-07-05  2:08   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 36/49] KVM: x86: Rename "governed features" helpers to use "guest_cpu_cap" Sean Christopherson
2024-05-22 14:23   ` Binbin Wu
2024-05-17 17:39 ` [PATCH v2 37/49] KVM: x86: Replace guts of "governed" features with comprehensive cpu_caps Sean Christopherson
2024-06-20  2:20   ` Yang, Weijiang
2024-07-05  2:10   ` Maxim Levitsky
2024-07-09 18:30     ` Sean Christopherson
2024-07-24 18:00       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 38/49] KVM: x86: Initialize guest cpu_caps based on guest CPUID Sean Christopherson
2024-06-20  2:24   ` Yang, Weijiang
2024-07-05  2:13   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 39/49] KVM: x86: Extract code for generating per-entry emulated CPUID information Sean Christopherson
2024-07-05  2:18   ` Maxim Levitsky
2024-07-09  0:13     ` Sean Christopherson
2024-07-24 18:00       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 40/49] KVM: x86: Initialize guest cpu_caps based on KVM support Sean Christopherson
2024-07-05  2:22   ` Maxim Levitsky
2024-07-09  0:10     ` Sean Christopherson
2024-07-24 18:01       ` Maxim Levitsky
2024-07-29 15:34         ` Sean Christopherson
2024-08-05 11:16           ` mlevitsk
2024-05-17 17:39 ` [PATCH v2 41/49] KVM: x86: Avoid double CPUID lookup when updating MWAIT at runtime Sean Christopherson
2024-07-05  2:22   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 42/49] KVM: x86: Drop unnecessary check that cpuid_entry2_find() returns right leaf Sean Christopherson
2024-07-05  2:22   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 43/49] KVM: x86: Update OS{XSAVE,PKE} bits in guest CPUID irrespective of host support Sean Christopherson
2024-07-05  2:22   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 44/49] KVM: x86: Update guest cpu_caps at runtime for dynamic CPUID-based features Sean Christopherson
2024-07-05  2:26   ` Maxim Levitsky
2024-07-09  0:24     ` Sean Christopherson
2024-09-10 20:41       ` Maxim Levitsky
2024-09-11 15:41         ` Sean Christopherson
2024-11-22  2:11           ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 45/49] KVM: x86: Shuffle code to prepare for dropping guest_cpuid_has() Sean Christopherson
2024-07-05  2:26   ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 46/49] KVM: x86: Replace (almost) all guest CPUID feature queries with cpu_caps Sean Christopherson
2024-07-05  2:34   ` Maxim Levitsky
2024-07-09 19:20     ` Sean Christopherson
2024-07-24 18:01       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 47/49] KVM: x86: Drop superfluous host XSAVE check when adjusting guest XSAVES caps Sean Christopherson
2024-07-05  2:36   ` Maxim Levitsky
2024-07-09 19:15     ` Sean Christopherson
2024-07-24 18:02       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 48/49] KVM: x86: Add a macro for features that are synthesized into boot_cpu_data Sean Christopherson
2024-07-05  2:43   ` Maxim Levitsky
2024-07-09 21:13     ` Sean Christopherson
2024-07-24 18:04       ` Maxim Levitsky
2024-05-17 17:39 ` [PATCH v2 49/49] *** DO NOT APPLY *** KVM: x86: Verify KVM initializes all consumed guest caps Sean Christopherson
2024-05-17 17:54 ` [PATCH v2 00/49] KVM: x86: CPUID overhaul, fixes, and caching Paolo Bonzini
