public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
@ 2025-03-01  7:34 Paolo Bonzini
  2025-03-01  7:34 ` [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-01  7:34 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, yan.y.zhao

This series is my evolution of Yan's patches at
https://patchew.org/linux/20250224070716.31360-1-yan.y.zhao@intel.com/.

The implementation of the quirk is unchanged, but the concepts in kvm_caps
are a bit different.  In particular:

- if a quirk is not applicable to some hardware, it is still included
  in KVM_CAP_DISABLE_QUIRKS2.  This way userspace knows that KVM is
  *aware* of a particular issue - even if disabling it has no effect
  because the quirk is not a problem on a specific hardware, userspace
  may want to know that it can rely on the problematic behavior not
  being present.  Therefore, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is
  simply auto-disabled on TDX machines.

- if instead a quirk cannot be disabled due to limitations, for example
  KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT if self-snoop is not present on
  the CPU, the quirk is removed completely from kvm_caps.supported_quirks
  and therefore from KVM_CAP_DISABLE_QUIRKS2.

This series does not introduce a way to query always-disabled quirks,
which could be for example KVM_CAP_DISABLED_QUIRKS.  This could be
added if we wanted for example to get rid of hypercall patching; it's
a trivial addition.

Paolo Bonzini (1):
  KVM: x86: Allow vendor code to disable quirks

Yan Zhao (3):
  KVM: x86: Introduce supported_quirks to block disabling quirks
  KVM: x86: Introduce Intel specific quirk
    KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
  KVM: TDX: Always honor guest PAT on TDX enabled platforms

 Documentation/virt/kvm/api.rst  | 22 ++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/mmu.h              |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 11 +++++----
 arch/x86/kvm/svm/svm.c          |  1 +
 arch/x86/kvm/vmx/tdx.c          |  6 +++++
 arch/x86/kvm/vmx/vmx.c          | 40 +++++++++++++++++++++++++++------
 arch/x86/kvm/x86.c              | 10 +++++----
 arch/x86/kvm/x86.h              | 14 +++++++-----
 9 files changed, 86 insertions(+), 21 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks
  2025-03-01  7:34 [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
@ 2025-03-01  7:34 ` Paolo Bonzini
  2025-03-02 17:11   ` Xiaoyao Li
  2025-03-03  1:15   ` Yan Zhao
  2025-03-01  7:34 ` [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-01  7:34 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, yan.y.zhao

In some cases, the handling of quirks is split between platform-specific
code and generic code, or it is done entirely in generic code, but the
relevant bug does not trigger on some platforms; for example,
KVM_X86_QUIRK_CD_NW_CLEARED is only applicable to AMD systems.  In that
case, allow unaffected vendor modules to disable handling of the quirk.

The quirk remains available in KVM_CAP_DISABLE_QUIRKS2, because that API
tells userspace that KVM *knows* that some of its past behavior was bogus
or just undesirable.  In other words, it's plausible for userspace to
refuse to run if a quirk is not listed by KVM_CAP_DISABLE_QUIRKS2.

In kvm_check_has_quirk(), in addition to checking if a quirk is not
explicitly disabled by the user, also verify if the quirk applies to
the hardware.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx/vmx.c |  1 +
 arch/x86/kvm/x86.c     |  1 +
 arch/x86/kvm/x86.h     | 12 +++++++-----
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 486fbdb4365c..75df4caea2f7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8506,6 +8506,7 @@ __init int vmx_hardware_setup(void)
 
 	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
 
+	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;
 	return r;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 856ceeb4fb35..fd0a44e59314 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
 	}
+	kvm_caps.inapplicable_quirks = 0;
 
 	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 8ce6da98b5a2..9af199c8e5c8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -34,6 +34,7 @@ struct kvm_caps {
 	u64 supported_xcr0;
 	u64 supported_xss;
 	u64 supported_perf_cap;
+	u64 inapplicable_quirks;
 };
 
 struct kvm_host_values {
@@ -354,11 +355,6 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,
 	return kvm_register_write_raw(vcpu, reg, val);
 }
 
-static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
-{
-	return !(kvm->arch.disabled_quirks & quirk);
-}
-
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 
 u64 get_kvmclock_ns(struct kvm *kvm);
@@ -394,6 +390,12 @@ extern struct kvm_host_values kvm_host;
 
 extern bool enable_pmu;
 
+static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
+{
+	u64 disabled_quirks = kvm_caps.inapplicable_quirks | kvm->arch.disabled_quirks;
+	return !(disabled_quirks & quirk);
+}
+
 /*
  * Get a filtered version of KVM's supported XCR0 that strips out dynamic
  * features for which the current process doesn't (yet) have permission to use.
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks
  2025-03-01  7:34 [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
  2025-03-01  7:34 ` [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks Paolo Bonzini
@ 2025-03-01  7:34 ` Paolo Bonzini
  2025-03-02 17:13   ` Xiaoyao Li
  2025-03-03  1:23   ` Yan Zhao
  2025-03-01  7:34 ` [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-01  7:34 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, yan.y.zhao

From: Yan Zhao <yan.y.zhao@intel.com>

Introduce supported_quirks in kvm_caps to store platform-specific force-enabled
quirks.  Any quirk removed from kvm_caps.supported_quirks will never be
included in kvm->arch.disabled_quirks, and will cause the ioctl to fail if
passed to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/x86.c | 7 ++++---
 arch/x86/kvm/x86.h | 2 ++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fd0a44e59314..a97e58916b6a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
 		break;
 	case KVM_CAP_DISABLE_QUIRKS2:
-		r = KVM_X86_VALID_QUIRKS;
+		r = kvm_caps.supported_quirks;
 		break;
 	case KVM_CAP_X86_NOTIFY_VMEXIT:
 		r = kvm_caps.has_notify_vmexit;
@@ -6521,11 +6521,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	switch (cap->cap) {
 	case KVM_CAP_DISABLE_QUIRKS2:
 		r = -EINVAL;
-		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
+		if (cap->args[0] & ~kvm_caps.supported_quirks)
 			break;
 		fallthrough;
 	case KVM_CAP_DISABLE_QUIRKS:
-		kvm->arch.disabled_quirks = cap->args[0];
+		kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;
 		r = 0;
 		break;
 	case KVM_CAP_SPLIT_IRQCHIP: {
@@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
 	}
+	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
 	kvm_caps.inapplicable_quirks = 0;
 
 	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9af199c8e5c8..f2672b14388c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -34,6 +34,8 @@ struct kvm_caps {
 	u64 supported_xcr0;
 	u64 supported_xss;
 	u64 supported_perf_cap;
+
+	u64 supported_quirks;
 	u64 inapplicable_quirks;
 };
 
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
  2025-03-01  7:34 [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
  2025-03-01  7:34 ` [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks Paolo Bonzini
  2025-03-01  7:34 ` [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks Paolo Bonzini
@ 2025-03-01  7:34 ` Paolo Bonzini
  2025-03-02 17:30   ` Xiaoyao Li
  2025-03-03  1:25   ` Yan Zhao
  2025-03-01  7:34 ` [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms Paolo Bonzini
  2025-03-03  1:47 ` [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Yan Zhao
  4 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-01  7:34 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, yan.y.zhao, Kevin Tian

From: Yan Zhao <yan.y.zhao@intel.com>

Introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to have
KVM ignore guest PAT when this quirk is enabled.

KVM is able to safely honor guest PAT on Intel platforms when CPU feature
self-snoop is supported. However, KVM honoring guest PAT was reverted after
commit 9d70f3fec144 ("Revert "KVM: VMX: Always honor guest PAT on CPUs that
support self-snoop""), due to UC access on certain Intel platforms being
very slow [1]. Honoring guest PAT on those platforms may break some old
guests that accidentally specify PAT as UC. Those old guests may never
expect the slowness since KVM always forces WB previously. See [2].

So, introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT.
KVM enables the quirk on all Intel platforms by default to avoid breaking
old unmodifiable guests. Newer userspace can disable this quirk to turn on
honoring guest PAT.

The quirk is only valid on Intel's platforms and is absent on AMD's
platforms as KVM always honors guest PAT when running on AMD.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/Ztl9NWCOupNfVaCA@yzhao56-desk.sh.intel.com # [1]
Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com # [2]
Message-ID: <20250224070946.31482-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h |  1 +
 arch/x86/kvm/mmu.h              |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 11 ++++++----
 arch/x86/kvm/svm/svm.c          |  1 +
 arch/x86/kvm/vmx/vmx.c          | 39 +++++++++++++++++++++++++++------
 arch/x86/kvm/x86.c              |  2 +-
 7 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2d75edc9db4f..1f13e47a65fa 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8157,6 +8157,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS    By default, at vCPU creation, KVM sets the
                                     and 0x489), as KVM does now allow them to
                                     be set by userspace (KVM sets them based on
                                     guest CPUID, for safety purposes).
+
+KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT  By default, on Intel platforms, KVM ignores
+                                    guest PAT and forces the effective memory
+                                    type to WB in EPT.  The quirk is not available
+                                    on Intel platforms which are incapable of
+                                    safely honoring guest PAT (i.e., without CPU
+                                    self-snoop, KVM always ignores guest PAT and
+                                    forces effective memory type to WB).  It is
+                                    also ignored on AMD platforms or, on Intel,
+                                    when a VM has non-coherent DMA devices
+                                    assigned; KVM always honors guest PAT in
+                                    such case. The quirk is needed to avoid
+                                    slowdowns on certain Intel Xeon platforms
+                                    (e.g. ICX, SPR) where self-snoop feature is
+                                    supported but UC is slow enough to cause
+                                    issues with some older guests that use
+                                    UC instead of WC to map the video RAM.
+                                    Userspace can disable the quirk to honor
+                                    guest PAT if it knows that there is no such
+                                    guest software, for example if it does not
+                                    expose a bochs graphics device (which is
+                                    known to have had a buggy driver).
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 89cc7a18ef45..db55a70e173c 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -441,6 +441,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS	(1 << 6)
 #define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS	(1 << 8)
+#define KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT	(1 << 9)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 47e64a3c4ce3..f999c15d8d3e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 	return -(u32)fault & errcode;
 }
 
-bool kvm_mmu_may_ignore_guest_pat(void);
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e6eb3a262f8d..bcf395d7ec53 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4663,17 +4663,20 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 }
 #endif
 
-bool kvm_mmu_may_ignore_guest_pat(void)
+bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm)
 {
 	/*
 	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
 	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
 	 * honor the memtype from the guest's PAT so that guest accesses to
 	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
-	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
-	 * KVM _always_ ignores guest PAT (when EPT is enabled).
+	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA.
+	 * KVM _always_ ignores guest PAT, when EPT is enabled and when quirk
+	 * KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is enabled or the CPU lacks the
+	 * ability to safely honor guest PAT.
 	 */
-	return shadow_memtype_mask;
+	return shadow_memtype_mask &&
+	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT);
 }
 
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ebaa5a41db07..2254dbebddac 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5426,6 +5426,7 @@ static __init int svm_hardware_setup(void)
 	 */
 	allow_smaller_maxphyaddr = !npt_enabled;
 
+	kvm_caps.inapplicable_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
 	return 0;
 
 err:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 75df4caea2f7..5365efb22e96 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7599,6 +7599,33 @@ int vmx_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+/*
+ * Ignore guest PAT when the CPU doesn't support self-snoop to safely honor
+ * guest PAT, or quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is turned on.  Always
+ * honor guest PAT when there's non-coherent DMA device attached.
+ *
+ * Honoring guest PAT means letting the guest control memory types.
+ * - On Intel CPUs that lack self-snoop feature, honoring guest PAT may result
+ *   in unexpected behavior. So always ignore guest PAT on those CPUs.
+ *
+ * - KVM's ABI is to trust the guest for attached non-coherent DMA devices to
+ *   function correctly (non-coherent DMA devices need the guest to flush CPU
+ *   caches properly). So honoring guest PAT to avoid breaking existing ABI.
+ *
+ * - On certain Intel CPUs (e.g. SPR, ICX), though self-snoop feature is
+ *   supported, UC is slow enough to cause issues with some older guests (e.g.
+ *   an old version of bochs driver uses ioremap() instead of ioremap_wc() to
+ *   map the video RAM, causing wayland desktop to fail to get started
+ *   correctly). To avoid breaking those old guests that rely on KVM to force
+ *   memory type to WB, only honoring guest PAT when quirk
+ *   KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is disabled.
+ */
+static inline bool vmx_ignore_guest_pat(struct kvm *kvm)
+{
+	return !kvm_arch_has_noncoherent_dma(kvm) &&
+	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT);
+}
+
 u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 {
 	/*
@@ -7608,13 +7635,8 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
 	if (is_mmio)
 		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
 
-	/*
-	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
-	 * device attached.  Letting the guest control memory types on Intel
-	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
-	 * the guest to behave only as a last resort.
-	 */
-	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
+	/* Force WB if ignoring guest PAT */
+	if (vmx_ignore_guest_pat(vcpu->kvm))
 		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
 
 	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
@@ -8506,6 +8528,9 @@ __init int vmx_hardware_setup(void)
 
 	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
 
+	/* Must use WB if the CPU does not have self-snoop.  */
+	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
+		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
 	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;
 	return r;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a97e58916b6a..b221f273ec77 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13544,7 +13544,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
 	 * (or last) non-coherent device is (un)registered to so that new SPTEs
 	 * with the correct "ignore guest PAT" setting are created.
 	 */
-	if (kvm_mmu_may_ignore_guest_pat())
+	if (kvm_mmu_may_ignore_guest_pat(kvm))
 		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
 }
 
-- 
2.43.5



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms
  2025-03-01  7:34 [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
                   ` (2 preceding siblings ...)
  2025-03-01  7:34 ` [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
@ 2025-03-01  7:34 ` Paolo Bonzini
  2025-03-02 17:03   ` Xiaoyao Li
  2025-03-03  1:30   ` Yan Zhao
  2025-03-03  1:47 ` [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Yan Zhao
  4 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-01  7:34 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, yan.y.zhao

From: Yan Zhao <yan.y.zhao@intel.com>

Always honor guest PAT in KVM-managed EPTs on TDX enabled platforms by
making self-snoop feature a hard dependency for TDX and making quirk
KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT not a valid quirk once TDX is enabled.

The quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT only affects memory type of
KVM-managed EPTs. For the TDX-module-managed private EPT, memory type is
always forced to WB now.

Honoring guest PAT in KVM-managed EPTs ensures KVM does not invoke
kvm_zap_gfn_range() when attaching/detaching non-coherent DMA devices;
this would cause mirrored EPTs for TDs to be zapped, as well as incorrect
zapping of the private EPT that is managed by the TDX module.

As a new platform, TDX always comes with self-snoop feature supported and has
no worry to break old not-well-written yet unmodifiable guests. So, simply
force-disable the KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT quirk for TDX VMs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20250224071039.31511-1-yan.y.zhao@intel.com>
[Use disabled_quirks instead of supported_quirks. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/vmx/tdx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b6f6f6e2f02e..4450fd99cb4c 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -624,6 +624,7 @@ int tdx_vm_init(struct kvm *kvm)
 
 	kvm->arch.has_protected_state = true;
 	kvm->arch.has_private_mem = true;
+	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
 
 	/*
 	 * Because guest TD is protected, VMM can't parse the instruction in TD.
@@ -3470,6 +3471,11 @@ int __init tdx_bringup(void)
 		goto success_disable_tdx;
 	}
 
+	if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) {
+		pr_err("Self-snoop is required for TDX\n");
+		goto success_disable_tdx;
+	}
+
 	if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) {
 		pr_err("tdx: no TDX private KeyIDs available\n");
 		goto success_disable_tdx;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms
  2025-03-01  7:34 ` [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms Paolo Bonzini
@ 2025-03-02 17:03   ` Xiaoyao Li
  2025-03-03  1:30   ` Yan Zhao
  1 sibling, 0 replies; 20+ messages in thread
From: Xiaoyao Li @ 2025-03-02 17:03 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc, yan.y.zhao

On 3/1/2025 3:34 PM, Paolo Bonzini wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Always honor guest PAT in KVM-managed EPTs on TDX enabled platforms by
> making self-snoop feature a hard dependency for TDX and making quirk
> KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT not a valid quirk once TDX is enabled.
> 
> The quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT only affects memory type of
> KVM-managed EPTs. For the TDX-module-managed private EPT, memory type is
> always forced to WB now.
> 
> Honoring guest PAT in KVM-managed EPTs ensures KVM does not invoke
> kvm_zap_gfn_range() when attaching/detaching non-coherent DMA devices;
> this would cause mirrored EPTs for TDs to be zapped, as well as incorrect
> zapping of the private EPT that is managed by the TDX module.
> 
> As a new platform, TDX always comes with self-snoop feature supported and has
> no worry to break old not-well-written yet unmodifiable guests. So, simply
> force-disable the KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT quirk for TDX VMs.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Message-ID: <20250224071039.31511-1-yan.y.zhao@intel.com>
> [Use disabled_quirks instead of supported_quirks. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/vmx/tdx.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b6f6f6e2f02e..4450fd99cb4c 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -624,6 +624,7 @@ int tdx_vm_init(struct kvm *kvm)
>   
>   	kvm->arch.has_protected_state = true;
>   	kvm->arch.has_private_mem = true;
> +	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;

This doesn't present userspace from dropping the 
KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT bit by updating
kvm->arch.disabled_quirk via KVM_CAP_DISABLE_QUIRKS.

I think we can make inapplicable_quirks per VM in Patch 1 and set

     kvm->arch.inapplicable_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;

for TDX VMs.

>   
>   	/*
>   	 * Because guest TD is protected, VMM can't parse the instruction in TD.
> @@ -3470,6 +3471,11 @@ int __init tdx_bringup(void)
>   		goto success_disable_tdx;
>   	}
>   
> +	if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) {
> +		pr_err("Self-snoop is required for TDX\n");
> +		goto success_disable_tdx;
> +	}
> +
>   	if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) {
>   		pr_err("tdx: no TDX private KeyIDs available\n");
>   		goto success_disable_tdx;


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks
  2025-03-01  7:34 ` [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks Paolo Bonzini
@ 2025-03-02 17:11   ` Xiaoyao Li
  2025-03-03  1:15   ` Yan Zhao
  1 sibling, 0 replies; 20+ messages in thread
From: Xiaoyao Li @ 2025-03-02 17:11 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc, yan.y.zhao

On 3/1/2025 3:34 PM, Paolo Bonzini wrote:
> In some cases, the handling of quirks is split between platform-specific
> code and generic code, or it is done entirely in generic code, but the
> relevant bug does not trigger on some platforms; for example,
> KVM_X86_QUIRK_CD_NW_CLEARED is only applicable to AMD systems.  In that
> case, allow unaffected vendor modules to disable handling of the quirk.
> 
> The quirk remains available in KVM_CAP_DISABLE_QUIRKS2, because that API
> tells userspace that KVM *knows* that some of its past behavior was bogus
> or just undesirable.  In other words, it's plausible for userspace to
> refuse to run if a quirk is not listed by KVM_CAP_DISABLE_QUIRKS2.

I think it's just for existing quirks for backwards compatibilities 
reason. For new quirk bit that is vendor specific, 
KVM_CAP_DISABLE_QUIRKS2 is OK to enumerate different value.

> In kvm_check_has_quirk(), in addition to checking if a quirk is not
> explicitly disabled by the user, also verify if the quirk applies to
> the hardware.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>

This is inconsistent with the Author.

> Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/vmx/vmx.c |  1 +
>   arch/x86/kvm/x86.c     |  1 +
>   arch/x86/kvm/x86.h     | 12 +++++++-----
>   3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 486fbdb4365c..75df4caea2f7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8506,6 +8506,7 @@ __init int vmx_hardware_setup(void)
>   
>   	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
>   
> +	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;

Suggest to make inapplicable_quirks per VM, as I comments in patch 4:

https://lore.kernel.org/all/338901b6-4d10-480d-bd0a-0db8ec4afad5@intel.com/https://lore.kernel.org/all/338901b6-4d10-480d-bd0a-0db8ec4afad5@intel.com/

>   	return r;
>   }
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 856ceeb4fb35..fd0a44e59314 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>   		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
>   		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
>   	}
> +	kvm_caps.inapplicable_quirks = 0;
>   
>   	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
>   
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 8ce6da98b5a2..9af199c8e5c8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -34,6 +34,7 @@ struct kvm_caps {
>   	u64 supported_xcr0;
>   	u64 supported_xss;
>   	u64 supported_perf_cap;
> +	u64 inapplicable_quirks;
>   };
>   
>   struct kvm_host_values {
> @@ -354,11 +355,6 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,
>   	return kvm_register_write_raw(vcpu, reg, val);
>   }
>   
> -static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
> -{
> -	return !(kvm->arch.disabled_quirks & quirk);
> -}
> -
>   void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
>   
>   u64 get_kvmclock_ns(struct kvm *kvm);
> @@ -394,6 +390,12 @@ extern struct kvm_host_values kvm_host;
>   
>   extern bool enable_pmu;
>   
> +static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
> +{
> +	u64 disabled_quirks = kvm_caps.inapplicable_quirks | kvm->arch.disabled_quirks;
> +	return !(disabled_quirks & quirk);
> +}
> +
>   /*
>    * Get a filtered version of KVM's supported XCR0 that strips out dynamic
>    * features for which the current process doesn't (yet) have permission to use.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks
  2025-03-01  7:34 ` [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks Paolo Bonzini
@ 2025-03-02 17:13   ` Xiaoyao Li
  2025-03-03  1:23   ` Yan Zhao
  1 sibling, 0 replies; 20+ messages in thread
From: Xiaoyao Li @ 2025-03-02 17:13 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc, yan.y.zhao

On 3/1/2025 3:34 PM, Paolo Bonzini wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Introduce supported_quirks in kvm_caps to store platform-specific force-enabled
> quirks.  Any quirk removed from kvm_caps.supported_quirks will never be
> included in kvm->arch.disabled_quirks, and will cause the ioctl to fail if
> passed to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/x86.c | 7 ++++---
>   arch/x86/kvm/x86.h | 2 ++
>   2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd0a44e59314..a97e58916b6a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>   		r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
>   		break;
>   	case KVM_CAP_DISABLE_QUIRKS2:
> -		r = KVM_X86_VALID_QUIRKS;
> +		r = kvm_caps.supported_quirks;
>   		break;
>   	case KVM_CAP_X86_NOTIFY_VMEXIT:
>   		r = kvm_caps.has_notify_vmexit;
> @@ -6521,11 +6521,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   	switch (cap->cap) {
>   	case KVM_CAP_DISABLE_QUIRKS2:
>   		r = -EINVAL;
> -		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
> +		if (cap->args[0] & ~kvm_caps.supported_quirks)
>   			break;
>   		fallthrough;
>   	case KVM_CAP_DISABLE_QUIRKS:
> -		kvm->arch.disabled_quirks = cap->args[0];
> +		kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;

Don't need this. It's redundant with the above, which ensures
cap->args[0] is the subset of kvm_caps.supported_quirks

Otherwise,

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

>   		r = 0;
>   		break;
>   	case KVM_CAP_SPLIT_IRQCHIP: {
> @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>   		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
>   		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
>   	}
> +	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
>   	kvm_caps.inapplicable_quirks = 0;
>   
>   	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 9af199c8e5c8..f2672b14388c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -34,6 +34,8 @@ struct kvm_caps {
>   	u64 supported_xcr0;
>   	u64 supported_xss;
>   	u64 supported_perf_cap;
> +
> +	u64 supported_quirks;
>   	u64 inapplicable_quirks;
>   };
>   


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
  2025-03-01  7:34 ` [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
@ 2025-03-02 17:30   ` Xiaoyao Li
  2025-03-03  1:25   ` Yan Zhao
  1 sibling, 0 replies; 20+ messages in thread
From: Xiaoyao Li @ 2025-03-02 17:30 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm; +Cc: seanjc, yan.y.zhao, Kevin Tian

On 3/1/2025 3:34 PM, Paolo Bonzini wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to have
> KVM ignore guest PAT when this quirk is enabled.
> 
> KVM is able to safely honor guest PAT on Intel platforms when CPU feature
> self-snoop is supported. However, KVM honoring guest PAT was reverted after
> commit 9d70f3fec144 ("Revert "KVM: VMX: Always honor guest PAT on CPUs that
> support self-snoop""), due to UC access on certain Intel platforms being
> very slow [1]. Honoring guest PAT on those platforms may break some old
> guests that accidentally specify PAT as UC. Those old guests may never
> expect the slowness since KVM always forces WB previously. See [2].
> 
> So, introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT.
> KVM enables the quirk on all Intel platforms by default to avoid breaking
> old unmodifiable guests. Newer userspace can disable this quirk to turn on
> honoring guest PAT.
> 
> The quirk is only valid on Intel's platforms and is absent on AMD's
> platforms as KVM always honors guest PAT when running on AMD.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/all/Ztl9NWCOupNfVaCA@yzhao56-desk.sh.intel.com # [1]
> Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com # [2]
> Message-ID: <20250224070946.31482-1-yan.y.zhao@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++++
>   arch/x86/include/uapi/asm/kvm.h |  1 +
>   arch/x86/kvm/mmu.h              |  2 +-
>   arch/x86/kvm/mmu/mmu.c          | 11 ++++++----
>   arch/x86/kvm/svm/svm.c          |  1 +
>   arch/x86/kvm/vmx/vmx.c          | 39 +++++++++++++++++++++++++++------
>   arch/x86/kvm/x86.c              |  2 +-
>   7 files changed, 65 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 2d75edc9db4f..1f13e47a65fa 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8157,6 +8157,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS    By default, at vCPU creation, KVM sets the
>                                       and 0x489), as KVM does now allow them to
>                                       be set by userspace (KVM sets them based on
>                                       guest CPUID, for safety purposes).
> +
> +KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT  By default, on Intel platforms, KVM ignores
> +                                    guest PAT and forces the effective memory
> +                                    type to WB in EPT.  The quirk is not available
> +                                    on Intel platforms which are incapable of
> +                                    safely honoring guest PAT (i.e., without CPU
> +                                    self-snoop, KVM always ignores guest PAT and
> +                                    forces effective memory type to WB).  It is
> +                                    also ignored on AMD platforms or, on Intel,
> +                                    when a VM has non-coherent DMA devices
> +                                    assigned; KVM always honors guest PAT in
> +                                    such case. The quirk is needed to avoid
> +                                    slowdowns on certain Intel Xeon platforms
> +                                    (e.g. ICX, SPR) where self-snoop feature is
> +                                    supported but UC is slow enough to cause
> +                                    issues with some older guests that use
> +                                    UC instead of WC to map the video RAM.
> +                                    Userspace can disable the quirk to honor
> +                                    guest PAT if it knows that there is no such
> +                                    guest software, for example if it does not
> +                                    expose a bochs graphics device (which is
> +                                    known to have had a buggy driver).
>   =================================== ============================================
>   
>   7.32 KVM_CAP_MAX_VCPU_ID
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 89cc7a18ef45..db55a70e173c 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -441,6 +441,7 @@ struct kvm_sync_regs {
>   #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS	(1 << 6)
>   #define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
>   #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS	(1 << 8)
> +#define KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT	(1 << 9)
>   
>   #define KVM_STATE_NESTED_FORMAT_VMX	0
>   #define KVM_STATE_NESTED_FORMAT_SVM	1
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 47e64a3c4ce3..f999c15d8d3e 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>   	return -(u32)fault & errcode;
>   }
>   
> -bool kvm_mmu_may_ignore_guest_pat(void);
> +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);
>   
>   int kvm_mmu_post_init_vm(struct kvm *kvm);
>   void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e6eb3a262f8d..bcf395d7ec53 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4663,17 +4663,20 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>   }
>   #endif
>   
> -bool kvm_mmu_may_ignore_guest_pat(void)
> +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm)
>   {
>   	/*
>   	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
>   	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
>   	 * honor the memtype from the guest's PAT so that guest accesses to
>   	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
> -	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
> -	 * KVM _always_ ignores guest PAT (when EPT is enabled).
> +	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA.
> +	 * KVM _always_ ignores guest PAT, when EPT is enabled and when quirk
> +	 * KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is enabled or the CPU lacks the
> +	 * ability to safely honor guest PAT.
>   	 */
> -	return shadow_memtype_mask;
> +	return shadow_memtype_mask &&
> +	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT);
>   }
>   
>   int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ebaa5a41db07..2254dbebddac 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5426,6 +5426,7 @@ static __init int svm_hardware_setup(void)
>   	 */
>   	allow_smaller_maxphyaddr = !npt_enabled;
>   
> +	kvm_caps.inapplicable_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
>   	return 0;
>   
>   err:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 75df4caea2f7..5365efb22e96 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7599,6 +7599,33 @@ int vmx_vm_init(struct kvm *kvm)
>   	return 0;
>   }
>   
> +/*
> + * Ignore guest PAT when the CPU doesn't support self-snoop to safely honor
> + * guest PAT, or quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is turned on.  Always
> + * honor guest PAT when there's non-coherent DMA device attached.
> + *
> + * Honoring guest PAT means letting the guest control memory types.
> + * - On Intel CPUs that lack self-snoop feature, honoring guest PAT may result
> + *   in unexpected behavior. So always ignore guest PAT on those CPUs.
> + *
> + * - KVM's ABI is to trust the guest for attached non-coherent DMA devices to
> + *   function correctly (non-coherent DMA devices need the guest to flush CPU
> + *   caches properly). So honoring guest PAT to avoid breaking existing ABI.
> + *
> + * - On certain Intel CPUs (e.g. SPR, ICX), though self-snoop feature is
> + *   supported, UC is slow enough to cause issues with some older guests (e.g.
> + *   an old version of bochs driver uses ioremap() instead of ioremap_wc() to
> + *   map the video RAM, causing wayland desktop to fail to get started
> + *   correctly). To avoid breaking those old guests that rely on KVM to force
> + *   memory type to WB, only honoring guest PAT when quirk
> + *   KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is disabled.
> + */
> +static inline bool vmx_ignore_guest_pat(struct kvm *kvm)
> +{
> +	return !kvm_arch_has_noncoherent_dma(kvm) &&
> +	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT);
> +}
> +
>   u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>   {
>   	/*
> @@ -7608,13 +7635,8 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>   	if (is_mmio)
>   		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
>   
> -	/*
> -	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
> -	 * device attached.  Letting the guest control memory types on Intel
> -	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
> -	 * the guest to behave only as a last resort.
> -	 */
> -	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +	/* Force WB if ignoring guest PAT */
> +	if (vmx_ignore_guest_pat(vcpu->kvm))
>   		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>   
>   	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
> @@ -8506,6 +8528,9 @@ __init int vmx_hardware_setup(void)
>   
>   	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
>   
> +	/* Must use WB if the CPU does not have self-snoop.  */
> +	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
> +		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;

It seems missing the code to add KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT into 
KVM_X86_VALID_QUIRKS?

>   	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;
>   	return r;
>   }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a97e58916b6a..b221f273ec77 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13544,7 +13544,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
>   	 * (or last) non-coherent device is (un)registered to so that new SPTEs
>   	 * with the correct "ignore guest PAT" setting are created.
>   	 */
> -	if (kvm_mmu_may_ignore_guest_pat())
> +	if (kvm_mmu_may_ignore_guest_pat(kvm))
>   		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
>   }
>   


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks
  2025-03-01  7:34 ` [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks Paolo Bonzini
  2025-03-02 17:11   ` Xiaoyao Li
@ 2025-03-03  1:15   ` Yan Zhao
  2025-03-03 16:04     ` Paolo Bonzini
  1 sibling, 1 reply; 20+ messages in thread
From: Yan Zhao @ 2025-03-03  1:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc

On Sat, Mar 01, 2025 at 02:34:25AM -0500, Paolo Bonzini wrote:
> In some cases, the handling of quirks is split between platform-specific
> code and generic code, or it is done entirely in generic code, but the
> relevant bug does not trigger on some platforms; for example,
> KVM_X86_QUIRK_CD_NW_CLEARED is only applicable to AMD systems.  In that
> case, allow unaffected vendor modules to disable handling of the quirk.
> 
> The quirk remains available in KVM_CAP_DISABLE_QUIRKS2, because that API
> tells userspace that KVM *knows* that some of its past behavior was bogus
> or just undesirable.  In other words, it's plausible for userspace to
> refuse to run if a quirk is not listed by KVM_CAP_DISABLE_QUIRKS2.
> 
> In kvm_check_has_quirk(), in addition to checking if a quirk is not
> explicitly disabled by the user, also verify if the quirk applies to
> the hardware.
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx/vmx.c |  1 +
>  arch/x86/kvm/x86.c     |  1 +
>  arch/x86/kvm/x86.h     | 12 +++++++-----
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 486fbdb4365c..75df4caea2f7 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8506,6 +8506,7 @@ __init int vmx_hardware_setup(void)
>  
>  	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
>  
> +	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;
As you mentioned, KVM_X86_QUIRK_CD_NW_CLEARED has no effect on Intel's
platforms, no matter kvm_check_has_quirk() returns true or false.

So, what's the purpose to introduce kvm_caps.inapplicable_quirks?

One concern is that since KVM_X86_QUIRK_CD_NW_CLEARED is not for Intel
platforms, it's unnatural for Intel's code to add it into the
kvm_caps.inapplicable_quirks.
If AMD introduces new quirks that apply only to its own platform in future,
they may have no idea whether it's applicable to Intel as well.

>  	return r;
>  }
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 856ceeb4fb35..fd0a44e59314 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
>  		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
>  	}
> +	kvm_caps.inapplicable_quirks = 0;
>  
>  	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
>  
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 8ce6da98b5a2..9af199c8e5c8 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -34,6 +34,7 @@ struct kvm_caps {
>  	u64 supported_xcr0;
>  	u64 supported_xss;
>  	u64 supported_perf_cap;
> +	u64 inapplicable_quirks;
>  };
>  
>  struct kvm_host_values {
> @@ -354,11 +355,6 @@ static inline void kvm_register_write(struct kvm_vcpu *vcpu,
>  	return kvm_register_write_raw(vcpu, reg, val);
>  }
>  
> -static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
> -{
> -	return !(kvm->arch.disabled_quirks & quirk);
> -}
> -
>  void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
>  
>  u64 get_kvmclock_ns(struct kvm *kvm);
> @@ -394,6 +390,12 @@ extern struct kvm_host_values kvm_host;
>  
>  extern bool enable_pmu;
>  
> +static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
> +{
> +	u64 disabled_quirks = kvm_caps.inapplicable_quirks | kvm->arch.disabled_quirks;
> +	return !(disabled_quirks & quirk);
> +}
> +
>  /*
>   * Get a filtered version of KVM's supported XCR0 that strips out dynamic
>   * features for which the current process doesn't (yet) have permission to use.
> -- 
> 2.43.5
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks
  2025-03-01  7:34 ` [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks Paolo Bonzini
  2025-03-02 17:13   ` Xiaoyao Li
@ 2025-03-03  1:23   ` Yan Zhao
  2025-03-03 16:11     ` Paolo Bonzini
  1 sibling, 1 reply; 20+ messages in thread
From: Yan Zhao @ 2025-03-03  1:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc

On Sat, Mar 01, 2025 at 02:34:26AM -0500, Paolo Bonzini wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Introduce supported_quirks in kvm_caps to store platform-specific force-enabled
> quirks.  Any quirk removed from kvm_caps.supported_quirks will never be
> included in kvm->arch.disabled_quirks, and will cause the ioctl to fail if
> passed to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 7 ++++---
>  arch/x86/kvm/x86.h | 2 ++
>  2 files changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd0a44e59314..a97e58916b6a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
>  		break;
>  	case KVM_CAP_DISABLE_QUIRKS2:
> -		r = KVM_X86_VALID_QUIRKS;
> +		r = kvm_caps.supported_quirks;
As the concern raised in [1], it's confusing for
KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to be present on AMD's platforms while not
present on Intel's non-self-snoop platforms.

What about still returning KVM_X86_VALID_QUIRKS here and only having
kvm_caps.supported_quirks to filter disabled_quirks?

So, for KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT, it's still present when userspace
queries KVM_CAP_DISABLE_QUIRKS2, but it fails when userspace tries to disable
the quirk on Intel platforms without self-snoop.

Not sure if it will cause confusion to the userspace though.

Or what about introduce kvm_caps.force_enabled_quirk?
if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
        kvm_caps.force_enabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;


static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
{
        return !(kvm->arch.disabled_quirks & quirk) |
               (kvm_caps.force_enabled_quirks & quirk);
}

[1] https://lore.kernel.org/all/Z8UBpC76CyxCIRiU@yzhao56-desk.sh.intel.com/
>  		break;
>  	case KVM_CAP_X86_NOTIFY_VMEXIT:
>  		r = kvm_caps.has_notify_vmexit;
> @@ -6521,11 +6521,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  	switch (cap->cap) {
>  	case KVM_CAP_DISABLE_QUIRKS2:
>  		r = -EINVAL;
> -		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
> +		if (cap->args[0] & ~kvm_caps.supported_quirks)
>  			break;
>  		fallthrough;
>  	case KVM_CAP_DISABLE_QUIRKS:
> -		kvm->arch.disabled_quirks = cap->args[0];
> +		kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;
Will this break the uapi of KVM_CAP_DISABLE_QUIRKS?
My understanding is that only KVM_CAP_DISABLE_QUIRKS2 filters out invalid
quirks.

>  		r = 0;
>  		break;
>  	case KVM_CAP_SPLIT_IRQCHIP: {
> @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
>  		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
>  	}
> +	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
>  	kvm_caps.inapplicable_quirks = 0;
>  
>  	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 9af199c8e5c8..f2672b14388c 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -34,6 +34,8 @@ struct kvm_caps {
>  	u64 supported_xcr0;
>  	u64 supported_xss;
>  	u64 supported_perf_cap;
> +
> +	u64 supported_quirks;
>  	u64 inapplicable_quirks;
>  };
>  
> -- 
> 2.43.5
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
  2025-03-01  7:34 ` [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
  2025-03-02 17:30   ` Xiaoyao Li
@ 2025-03-03  1:25   ` Yan Zhao
  1 sibling, 0 replies; 20+ messages in thread
From: Yan Zhao @ 2025-03-03  1:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc, Kevin Tian

On Sat, Mar 01, 2025 at 02:34:27AM -0500, Paolo Bonzini wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to have
> KVM ignore guest PAT when this quirk is enabled.
> 
> KVM is able to safely honor guest PAT on Intel platforms when CPU feature
> self-snoop is supported. However, KVM honoring guest PAT was reverted after
> commit 9d70f3fec144 ("Revert "KVM: VMX: Always honor guest PAT on CPUs that
> support self-snoop""), due to UC access on certain Intel platforms being
> very slow [1]. Honoring guest PAT on those platforms may break some old
> guests that accidentally specify PAT as UC. Those old guests may never
> expect the slowness since KVM always forces WB previously. See [2].
> 
> So, introduce an Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT.
> KVM enables the quirk on all Intel platforms by default to avoid breaking
> old unmodifiable guests. Newer userspace can disable this quirk to turn on
> honoring guest PAT.
> 
> The quirk is only valid on Intel's platforms and is absent on AMD's
> platforms as KVM always honors guest PAT when running on AMD.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/all/Ztl9NWCOupNfVaCA@yzhao56-desk.sh.intel.com # [1]
> Link: https://lore.kernel.org/all/87jzfutmfc.fsf@redhat.com # [2]
> Message-ID: <20250224070946.31482-1-yan.y.zhao@intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst  | 22 +++++++++++++++++++
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/mmu.h              |  2 +-
>  arch/x86/kvm/mmu/mmu.c          | 11 ++++++----
>  arch/x86/kvm/svm/svm.c          |  1 +
>  arch/x86/kvm/vmx/vmx.c          | 39 +++++++++++++++++++++++++++------
>  arch/x86/kvm/x86.c              |  2 +-
>  7 files changed, 65 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 2d75edc9db4f..1f13e47a65fa 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8157,6 +8157,28 @@ KVM_X86_QUIRK_STUFF_FEATURE_MSRS    By default, at vCPU creation, KVM sets the
>                                      and 0x489), as KVM does now allow them to
>                                      be set by userspace (KVM sets them based on
>                                      guest CPUID, for safety purposes).
> +
> +KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT  By default, on Intel platforms, KVM ignores
> +                                    guest PAT and forces the effective memory
> +                                    type to WB in EPT.  The quirk is not available
> +                                    on Intel platforms which are incapable of
> +                                    safely honoring guest PAT (i.e., without CPU
> +                                    self-snoop, KVM always ignores guest PAT and
> +                                    forces effective memory type to WB).  It is
> +                                    also ignored on AMD platforms or, on Intel,
> +                                    when a VM has non-coherent DMA devices
> +                                    assigned; KVM always honors guest PAT in
> +                                    such case. The quirk is needed to avoid
> +                                    slowdowns on certain Intel Xeon platforms
> +                                    (e.g. ICX, SPR) where self-snoop feature is
> +                                    supported but UC is slow enough to cause
> +                                    issues with some older guests that use
> +                                    UC instead of WC to map the video RAM.
> +                                    Userspace can disable the quirk to honor
> +                                    guest PAT if it knows that there is no such
> +                                    guest software, for example if it does not
> +                                    expose a bochs graphics device (which is
> +                                    known to have had a buggy driver).
>  =================================== ============================================
>  
>  7.32 KVM_CAP_MAX_VCPU_ID
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 89cc7a18ef45..db55a70e173c 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -441,6 +441,7 @@ struct kvm_sync_regs {
>  #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS	(1 << 6)
>  #define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
>  #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS	(1 << 8)
> +#define KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT	(1 << 9)
>  
>  #define KVM_STATE_NESTED_FORMAT_VMX	0
>  #define KVM_STATE_NESTED_FORMAT_SVM	1
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 47e64a3c4ce3..f999c15d8d3e 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -232,7 +232,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  	return -(u32)fault & errcode;
>  }
>  
> -bool kvm_mmu_may_ignore_guest_pat(void);
> +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm);
>  
>  int kvm_mmu_post_init_vm(struct kvm *kvm);
>  void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e6eb3a262f8d..bcf395d7ec53 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4663,17 +4663,20 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
>  }
>  #endif
>  
> -bool kvm_mmu_may_ignore_guest_pat(void)
> +bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm)
>  {
>  	/*
>  	 * When EPT is enabled (shadow_memtype_mask is non-zero), and the VM
>  	 * has non-coherent DMA (DMA doesn't snoop CPU caches), KVM's ABI is to
>  	 * honor the memtype from the guest's PAT so that guest accesses to
>  	 * memory that is DMA'd aren't cached against the guest's wishes.  As a
> -	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA,
> -	 * KVM _always_ ignores guest PAT (when EPT is enabled).
> +	 * result, KVM _may_ ignore guest PAT, whereas without non-coherent DMA.
> +	 * KVM _always_ ignores guest PAT, when EPT is enabled and when quirk
> +	 * KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is enabled or the CPU lacks the
> +	 * ability to safely honor guest PAT.
>  	 */
> -	return shadow_memtype_mask;
> +	return shadow_memtype_mask &&
> +	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT);
>  }
>
>  int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ebaa5a41db07..2254dbebddac 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5426,6 +5426,7 @@ static __init int svm_hardware_setup(void)
>  	 */
>  	allow_smaller_maxphyaddr = !npt_enabled;
>  
> +	kvm_caps.inapplicable_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
Similar to KVM_X86_QUIRK_CD_NW_CLEARED for Intel,
KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT has actually no effect on AMD's platforms,
even if kvm_check_has_quirk() returns true.

I'm wondering if it's really necessary for us to introuce
kvm_caps.inapplicable_quirks.

>  	return 0;
>  
>  err:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 75df4caea2f7..5365efb22e96 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7599,6 +7599,33 @@ int vmx_vm_init(struct kvm *kvm)
>  	return 0;
>  }
>  
> +/*
> + * Ignore guest PAT when the CPU doesn't support self-snoop to safely honor
> + * guest PAT, or quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is turned on.  Always
> + * honor guest PAT when there's non-coherent DMA device attached.
> + *
> + * Honoring guest PAT means letting the guest control memory types.
> + * - On Intel CPUs that lack self-snoop feature, honoring guest PAT may result
> + *   in unexpected behavior. So always ignore guest PAT on those CPUs.
> + *
> + * - KVM's ABI is to trust the guest for attached non-coherent DMA devices to
> + *   function correctly (non-coherent DMA devices need the guest to flush CPU
> + *   caches properly). So honoring guest PAT to avoid breaking existing ABI.
> + *
> + * - On certain Intel CPUs (e.g. SPR, ICX), though self-snoop feature is
> + *   supported, UC is slow enough to cause issues with some older guests (e.g.
> + *   an old version of bochs driver uses ioremap() instead of ioremap_wc() to
> + *   map the video RAM, causing wayland desktop to fail to get started
> + *   correctly). To avoid breaking those old guests that rely on KVM to force
> + *   memory type to WB, only honoring guest PAT when quirk
> + *   KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is disabled.
> + */
> +static inline bool vmx_ignore_guest_pat(struct kvm *kvm)
> +{
> +	return !kvm_arch_has_noncoherent_dma(kvm) &&
> +	       kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT);
> +}
> +
>  u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  {
>  	/*
> @@ -7608,13 +7635,8 @@ u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
>  	if (is_mmio)
>  		return MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
>  
> -	/*
> -	 * Force WB and ignore guest PAT if the VM does NOT have a non-coherent
> -	 * device attached.  Letting the guest control memory types on Intel
> -	 * CPUs may result in unexpected behavior, and so KVM's ABI is to trust
> -	 * the guest to behave only as a last resort.
> -	 */
> -	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm))
> +	/* Force WB if ignoring guest PAT */
> +	if (vmx_ignore_guest_pat(vcpu->kvm))
>  		return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
>  
>  	return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT);
> @@ -8506,6 +8528,9 @@ __init int vmx_hardware_setup(void)
>  
>  	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
>  
> +	/* Must use WB if the CPU does not have self-snoop.  */
> +	if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
> +		kvm_caps.supported_quirks &= ~KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
>  	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;
>  	return r;
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a97e58916b6a..b221f273ec77 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13544,7 +13544,7 @@ static void kvm_noncoherent_dma_assignment_start_or_stop(struct kvm *kvm)
>  	 * (or last) non-coherent device is (un)registered to so that new SPTEs
>  	 * with the correct "ignore guest PAT" setting are created.
>  	 */
> -	if (kvm_mmu_may_ignore_guest_pat())
> +	if (kvm_mmu_may_ignore_guest_pat(kvm))
>  		kvm_zap_gfn_range(kvm, gpa_to_gfn(0), gpa_to_gfn(~0ULL));
>  }
>  
> -- 
> 2.43.5
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms
  2025-03-01  7:34 ` [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms Paolo Bonzini
  2025-03-02 17:03   ` Xiaoyao Li
@ 2025-03-03  1:30   ` Yan Zhao
  2025-03-03 16:14     ` Paolo Bonzini
  1 sibling, 1 reply; 20+ messages in thread
From: Yan Zhao @ 2025-03-03  1:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc

On Sat, Mar 01, 2025 at 02:34:28AM -0500, Paolo Bonzini wrote:
> From: Yan Zhao <yan.y.zhao@intel.com>
> 
> Always honor guest PAT in KVM-managed EPTs on TDX enabled platforms by
> making self-snoop feature a hard dependency for TDX and making quirk
> KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT not a valid quirk once TDX is enabled.
> 
> The quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT only affects memory type of
> KVM-managed EPTs. For the TDX-module-managed private EPT, memory type is
> always forced to WB now.
> 
> Honoring guest PAT in KVM-managed EPTs ensures KVM does not invoke
> kvm_zap_gfn_range() when attaching/detaching non-coherent DMA devices;
> this would cause mirrored EPTs for TDs to be zapped, as well as incorrect
> zapping of the private EPT that is managed by the TDX module.
> 
> As a new platform, TDX always comes with self-snoop feature supported and has
> no worry to break old not-well-written yet unmodifiable guests. So, simply
> force-disable the KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT quirk for TDX VMs.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> Message-ID: <20250224071039.31511-1-yan.y.zhao@intel.com>
> [Use disabled_quirks instead of supported_quirks. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  arch/x86/kvm/vmx/tdx.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index b6f6f6e2f02e..4450fd99cb4c 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -624,6 +624,7 @@ int tdx_vm_init(struct kvm *kvm)
>  
>  	kvm->arch.has_protected_state = true;
>  	kvm->arch.has_private_mem = true;
> +	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
Though the quirk is disabled by default in KVM in tdx_vm_init() for TDs, the
kvm->arch.disabled_quirks can be overwritten by a userspace specified value in
kvm_vm_ioctl_enable_cap().
"kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;"

So, when an old userspace tries to disable other quirks on this new KVM, it may
accidentally turn KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT into enabled for TDs, which
would cause SEPT being zapped when (de)attaching non-coherent devices.

Could we force KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to be disabled for TDs?

e.g.

tdx_vm_init
   kvm->arch.always_disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;

static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
{
        WARN_ON_ONCE(kvm->arch.always_disabled_quirk & kvm_caps.force_enabled_quirks);

        u64 disabled_quirks = kvm->arch.always_disabled_quirk | kvm->arch.disabled_quirks;
        return !(disabled_quirks & quirk) |
               (kvm_caps.force_enabled_quirks & quirk);
}

>  
>  	/*
>  	 * Because guest TD is protected, VMM can't parse the instruction in TD.
> @@ -3470,6 +3471,11 @@ int __init tdx_bringup(void)
>  		goto success_disable_tdx;
>  	}
>  
> +	if (!cpu_feature_enabled(X86_FEATURE_SELFSNOOP)) {
> +		pr_err("Self-snoop is required for TDX\n");
> +		goto success_disable_tdx;
> +	}
> +
>  	if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) {
>  		pr_err("tdx: no TDX private KeyIDs available\n");
>  		goto success_disable_tdx;
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
  2025-03-01  7:34 [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
                   ` (3 preceding siblings ...)
  2025-03-01  7:34 ` [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms Paolo Bonzini
@ 2025-03-03  1:47 ` Yan Zhao
  4 siblings, 0 replies; 20+ messages in thread
From: Yan Zhao @ 2025-03-03  1:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc

On Sat, Mar 01, 2025 at 02:34:24AM -0500, Paolo Bonzini wrote:
> This series is my evolution of Yan's patches at
> https://patchew.org/linux/20250224070716.31360-1-yan.y.zhao@intel.com/.
Hi Paolo,
Thanks for helping refining the patches!

Here's a summary of my comments:
1. This is confusing for KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to be present
   on AMD's platforms while not present on Intel's non-self-snoop platforms.
   (patch 2)

2. Could we make KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT and
   KVM_X86_QUIRK_SLOT_ZAP_ALL always-disabled for TDs? (patch 4)

3. kvm_caps.inapplicable_quirks may not be necessary. (patches 1/3)

Thanks
Yan
> 
> The implementation of the quirk is unchanged, but the concepts in kvm_caps
> are a bit different.  In particular:
> 
> - if a quirk is not applicable to some hardware, it is still included
>   in KVM_CAP_DISABLE_QUIRKS2.  This way userspace knows that KVM is
>   *aware* of a particular issue - even if disabling it has no effect
>   because the quirk is not a problem on a specific hardware, userspace
>   may want to know that it can rely on the problematic behavior not
>   being present.  Therefore, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT is
>   simply auto-disabled on TDX machines.
>
> - if instead a quirk cannot be disabled due to limitations, for example
>   KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT if self-snoop is not present on
>   the CPU, the quirk is removed completely from kvm_caps.supported_quirks
>   and therefore from KVM_CAP_DISABLE_QUIRKS2.
> 
> This series does not introduce a way to query always-disabled quirks,
> which could be for example KVM_CAP_DISABLED_QUIRKS.  This could be
> added if we wanted for example to get rid of hypercall patching; it's
> a trivial addition.

> 
> Paolo Bonzini (1):
>   KVM: x86: Allow vendor code to disable quirks
> 
> Yan Zhao (3):
>   KVM: x86: Introduce supported_quirks to block disabling quirks
>   KVM: x86: Introduce Intel specific quirk
>     KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT
>   KVM: TDX: Always honor guest PAT on TDX enabled platforms
> 
>  Documentation/virt/kvm/api.rst  | 22 ++++++++++++++++++
>  arch/x86/include/uapi/asm/kvm.h |  1 +
>  arch/x86/kvm/mmu.h              |  2 +-
>  arch/x86/kvm/mmu/mmu.c          | 11 +++++----
>  arch/x86/kvm/svm/svm.c          |  1 +
>  arch/x86/kvm/vmx/tdx.c          |  6 +++++
>  arch/x86/kvm/vmx/vmx.c          | 40 +++++++++++++++++++++++++++------
>  arch/x86/kvm/x86.c              | 10 +++++----
>  arch/x86/kvm/x86.h              | 14 +++++++-----
>  9 files changed, 86 insertions(+), 21 deletions(-)
> 
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks
  2025-03-03  1:15   ` Yan Zhao
@ 2025-03-03 16:04     ` Paolo Bonzini
  2025-03-04  6:46       ` Yan Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-03 16:04 UTC (permalink / raw)
  To: Yan Zhao; +Cc: linux-kernel, kvm, seanjc

On 3/3/25 02:15, Yan Zhao wrote:
> On Sat, Mar 01, 2025 at 02:34:25AM -0500, Paolo Bonzini wrote:
>> In some cases, the handling of quirks is split between platform-specific
>> code and generic code, or it is done entirely in generic code, but the
>> relevant bug does not trigger on some platforms; for example,
>> KVM_X86_QUIRK_CD_NW_CLEARED is only applicable to AMD systems.  In that
>> case, allow unaffected vendor modules to disable handling of the quirk.
>>
>> The quirk remains available in KVM_CAP_DISABLE_QUIRKS2, because that API
>> tells userspace that KVM *knows* that some of its past behavior was bogus
>> or just undesirable.  In other words, it's plausible for userspace to
>> refuse to run if a quirk is not listed by KVM_CAP_DISABLE_QUIRKS2.
>>
>> In kvm_check_has_quirk(), in addition to checking if a quirk is not
>> explicitly disabled by the user, also verify if the quirk applies to
>> the hardware.
>>
>> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>> Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/vmx/vmx.c |  1 +
>>   arch/x86/kvm/x86.c     |  1 +
>>   arch/x86/kvm/x86.h     | 12 +++++++-----
>>   3 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 486fbdb4365c..75df4caea2f7 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -8506,6 +8506,7 @@ __init int vmx_hardware_setup(void)
>>   
>>   	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
>>   
>> +	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;
> 
> As you mentioned, KVM_X86_QUIRK_CD_NW_CLEARED has no effect on Intel's
> platforms, no matter kvm_check_has_quirk() returns true or false.
> So, what's the purpose to introduce kvm_caps.inapplicable_quirks?

The purpose is to later mark IGNORE_GUEST_PAT as inapplicable, so that 
the relevant code does not run on AMD.  However you have a point here:

> One concern is that since KVM_X86_QUIRK_CD_NW_CLEARED is not for Intel
> platforms, it's unnatural for Intel's code to add it into the
> kvm_caps.inapplicable_quirks.

So let's instead have kvm-amd.ko clear it from inapplicable_quirks.  And 
likewise kvm-intel.ko can clear IGNORE_GUEST_PAT.

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks
  2025-03-03  1:23   ` Yan Zhao
@ 2025-03-03 16:11     ` Paolo Bonzini
  2025-03-04  4:21       ` Yan Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-03 16:11 UTC (permalink / raw)
  To: Yan Zhao; +Cc: linux-kernel, kvm, seanjc

On 3/3/25 02:23, Yan Zhao wrote:
> On Sat, Mar 01, 2025 at 02:34:26AM -0500, Paolo Bonzini wrote:
>> From: Yan Zhao <yan.y.zhao@intel.com>
>>
>> Introduce supported_quirks in kvm_caps to store platform-specific force-enabled
>> quirks.  Any quirk removed from kvm_caps.supported_quirks will never be
>> included in kvm->arch.disabled_quirks, and will cause the ioctl to fail if
>> passed to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).
>>
>> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
>> Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c | 7 ++++---
>>   arch/x86/kvm/x86.h | 2 ++
>>   2 files changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index fd0a44e59314..a97e58916b6a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>   		r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
>>   		break;
>>   	case KVM_CAP_DISABLE_QUIRKS2:
>> -		r = KVM_X86_VALID_QUIRKS;
>> +		r = kvm_caps.supported_quirks;
>
> As the concern raised in [1], it's confusing for
> KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to be present on AMD's platforms while not
> present on Intel's non-self-snoop platforms.

To make it less confusing, let's rename it to 
KVM_X86_QUIRK_IGNORE_GUEST_PAT.  So if userspace wants to say "I don't 
want KVM to ignore guest's PAT!", it can do so easily:

- it can check unconditionally that the quirk is included in 
KVM_CAP_DISABLE_QUIRKS2, and it will pass on both Intel with self-snoop 
with AMD;

- it can pass it unconditionally to KVM_X86_QUIRK_IGNORE_GUEST_PAT, 
knowing that PAT will be honored.

But KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2) will *not* include the 
quirk on Intel without self-snoop, which lets userspace detect the 
condition and raise an error.  This is better than introducing a new 
case in the API "the bit is returned by KVM_CHECK_EXTENSION, but 
rejected by KVM_ENABLE_CAP".  Such a new case would inevitably lead to 
KVM_CAP_DISABLE_QUIRKS3. :)

> Or what about introduce kvm_caps.force_enabled_quirk?
>
> if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
>          kvm_caps.force_enabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;

That would also make it harder for userspace to understand what's going on.

> [1] https://lore.kernel.org/all/Z8UBpC76CyxCIRiU@yzhao56-desk.sh.intel.com/
>>   		break;
>>   	case KVM_CAP_X86_NOTIFY_VMEXIT:
>>   		r = kvm_caps.has_notify_vmexit;
>> @@ -6521,11 +6521,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>>   	switch (cap->cap) {
>>   	case KVM_CAP_DISABLE_QUIRKS2:
>>   		r = -EINVAL;
>> -		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
>> +		if (cap->args[0] & ~kvm_caps.supported_quirks)
>>   			break;
>>   		fallthrough;
>>   	case KVM_CAP_DISABLE_QUIRKS:
>> -		kvm->arch.disabled_quirks = cap->args[0];
>> +		kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;
>
> Will this break the uapi of KVM_CAP_DISABLE_QUIRKS?
> My understanding is that only KVM_CAP_DISABLE_QUIRKS2 filters out invalid
> quirks.

The difference between KVM_CAP_DISABLE_QUIRKS and 
KVM_CAP_DISABLE_QUIRKS2 is only that invalid values do not cause an 
error; but userspace cannot know what is in kvm->arch.disabled_quirks, 
so KVM can change the value that is stored there.

Paolo


>>   		r = 0;
>>   		break;
>>   	case KVM_CAP_SPLIT_IRQCHIP: {
>> @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>>   		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
>>   		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
>>   	}
>> +	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
>>   	kvm_caps.inapplicable_quirks = 0;
>>   
>>   	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 9af199c8e5c8..f2672b14388c 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -34,6 +34,8 @@ struct kvm_caps {
>>   	u64 supported_xcr0;
>>   	u64 supported_xss;
>>   	u64 supported_perf_cap;
>> +
>> +	u64 supported_quirks;
>>   	u64 inapplicable_quirks;
>>   };
>>   
>> -- 
>> 2.43.5
>>
>>
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms
  2025-03-03  1:30   ` Yan Zhao
@ 2025-03-03 16:14     ` Paolo Bonzini
  2025-03-04  6:20       ` Yan Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-03-03 16:14 UTC (permalink / raw)
  To: Yan Zhao; +Cc: linux-kernel, kvm, seanjc

On 3/3/25 02:30, Yan Zhao wrote:
>>   	kvm->arch.has_protected_state = true;
>>   	kvm->arch.has_private_mem = true;
>> +	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
> Though the quirk is disabled by default in KVM in tdx_vm_init() for TDs, the
> kvm->arch.disabled_quirks can be overwritten by a userspace specified value in
> kvm_vm_ioctl_enable_cap().
> "kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;"
> 
> So, when an old userspace tries to disable other quirks on this new KVM, it may
> accidentally turn KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT into enabled for TDs, which
> would cause SEPT being zapped when (de)attaching non-coherent devices.

Yeah, sorry about that - Xiaoyao also pointed it out and I should have 
noticed it---or marked the patches as RFC which they were.

> Could we force KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to be disabled for TDs?
> 
> e.g.
> 
> tdx_vm_init
>     kvm->arch.always_disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
> 
> static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
> {
>          WARN_ON_ONCE(kvm->arch.always_disabled_quirk & kvm_caps.force_enabled_quirks);
> 
>          u64 disabled_quirks = kvm->arch.always_disabled_quirk | kvm->arch.disabled_quirks;
>          return !(disabled_quirks & quirk) |
>                 (kvm_caps.force_enabled_quirks & quirk);
> }

We can change KVM_ENABLE_CAP(KVM_X86_DISABLE_QUIRKS), as well as 
QUIRKS2, to use "|=" instead of "=".

While this is technically a change in the API, the current 
implementation is just awful and I hope that no one is relying on it! 
This way, the "always_disabled_quirks" are not needed.

If the "|=" idea doesn't work out I agree that 
kvm->arch.always_disabled_quirk is needed.

Sending v3 shortly...

Paolo


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks
  2025-03-03 16:11     ` Paolo Bonzini
@ 2025-03-04  4:21       ` Yan Zhao
  0 siblings, 0 replies; 20+ messages in thread
From: Yan Zhao @ 2025-03-04  4:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc

On Mon, Mar 03, 2025 at 05:11:31PM +0100, Paolo Bonzini wrote:
> On 3/3/25 02:23, Yan Zhao wrote:
> > On Sat, Mar 01, 2025 at 02:34:26AM -0500, Paolo Bonzini wrote:
> > > From: Yan Zhao <yan.y.zhao@intel.com>
> > > 
> > > Introduce supported_quirks in kvm_caps to store platform-specific force-enabled
> > > quirks.  Any quirk removed from kvm_caps.supported_quirks will never be
> > > included in kvm->arch.disabled_quirks, and will cause the ioctl to fail if
> > > passed to KVM_ENABLE_CAP(KVM_CAP_DISABLE_QUIRKS2).
> > > 
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > >   arch/x86/kvm/x86.c | 7 ++++---
> > >   arch/x86/kvm/x86.h | 2 ++
> > >   2 files changed, 6 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index fd0a44e59314..a97e58916b6a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -4782,7 +4782,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > >   		r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
> > >   		break;
> > >   	case KVM_CAP_DISABLE_QUIRKS2:
> > > -		r = KVM_X86_VALID_QUIRKS;
> > > +		r = kvm_caps.supported_quirks;
> > 
> > As the concern raised in [1], it's confusing for
> > KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to be present on AMD's platforms while not
> > present on Intel's non-self-snoop platforms.
> 
> To make it less confusing, let's rename it to
> KVM_X86_QUIRK_IGNORE_GUEST_PAT.  So if userspace wants to say "I don't want
Hmm, it looks that quirk IGNORE_GUEST_PAT is effectively always enabled on Intel
platforms without enabling EPT.

And kvm_mmu_may_ignore_guest_pat() does not care quirk IGNORE_GUEST_PAT on AMD
or on Intel when enable_ept is 0.

bool kvm_mmu_may_ignore_guest_pat(struct kvm *kvm)                               
{                                                                                
        return shadow_memtype_mask &&                                            
               kvm_check_has_quirk(kvm, KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT);     
}                                                                                

After renaming the quirk, should we also hide quirk IGNORE_GUEST_PAT from being
exposed in KVM_CAP_DISABLE_QUIRKS2 when enable_ept is 0?
However, doing so will complicate kvm_noncoherent_dma_assignment_start_or_stop().


> KVM to ignore guest's PAT!", it can do so easily:
> 
> - it can check unconditionally that the quirk is included in
> KVM_CAP_DISABLE_QUIRKS2, and it will pass on both Intel with self-snoop with
> AMD;
So, when userspace finds the quirk IGNORE_GUEST_PAT is exposed in
KVM_CAP_DISABLE_QUIRKS2, it cannot treat this as an implication that KVM is
currently ignoring guest PAT, but it only means that this is a new KVM with
quirk IGNORE_GUEST_PAT taken into account.

> - it can pass it unconditionally to KVM_X86_QUIRK_IGNORE_GUEST_PAT, knowing
> that PAT will be honored.
For AMD, since userspace does not explicitly disable quirk IGNORE_GUEST_PAT, you
introduced kvm_caps.inapplicable_quirks to allow IGNORE_GUEST_PAT to be listed
in KVM_CAP_DISABLED_QUIRKS.

From KVM_CAP_DISABLED_QUIRKS, the userspace knows that honing guest PAT is
performed on AMD.

Is this understanding correct? 

> But KVM_CHECK_EXTENSION(KVM_CAP_DISABLE_QUIRKS2) will *not* include the
> quirk on Intel without self-snoop, which lets userspace detect the condition
> and raise an error.  This is better than introducing a new case in the API
> "the bit is returned by KVM_CHECK_EXTENSION, but rejected by
> KVM_ENABLE_CAP".  Such a new case would inevitably lead to
> KVM_CAP_DISABLE_QUIRKS3. :)
Agreed. Thanks for the explanation:)

> 
> > Or what about introduce kvm_caps.force_enabled_quirk?
> > 
> > if (!static_cpu_has(X86_FEATURE_SELFSNOOP))
> >          kvm_caps.force_enabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
> 
> That would also make it harder for userspace to understand what's going on.
Right.

> > [1] https://lore.kernel.org/all/Z8UBpC76CyxCIRiU@yzhao56-desk.sh.intel.com/
> > >   		break;
> > >   	case KVM_CAP_X86_NOTIFY_VMEXIT:
> > >   		r = kvm_caps.has_notify_vmexit;
> > > @@ -6521,11 +6521,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> > >   	switch (cap->cap) {
> > >   	case KVM_CAP_DISABLE_QUIRKS2:
> > >   		r = -EINVAL;
> > > -		if (cap->args[0] & ~KVM_X86_VALID_QUIRKS)
> > > +		if (cap->args[0] & ~kvm_caps.supported_quirks)
> > >   			break;
> > >   		fallthrough;
> > >   	case KVM_CAP_DISABLE_QUIRKS:
> > > -		kvm->arch.disabled_quirks = cap->args[0];
> > > +		kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;
> > 
> > Will this break the uapi of KVM_CAP_DISABLE_QUIRKS?
> > My understanding is that only KVM_CAP_DISABLE_QUIRKS2 filters out invalid
> > quirks.
> 
> The difference between KVM_CAP_DISABLE_QUIRKS and KVM_CAP_DISABLE_QUIRKS2 is
> only that invalid values do not cause an error; but userspace cannot know
> what is in kvm->arch.disabled_quirks, so KVM can change the value that is
> stored there.
Ah, it makes sense.

Thanks
Yan

> 
> > >   		r = 0;
> > >   		break;
> > >   	case KVM_CAP_SPLIT_IRQCHIP: {
> > > @@ -9775,6 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> > >   		kvm_host.xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
> > >   		kvm_caps.supported_xcr0 = kvm_host.xcr0 & KVM_SUPPORTED_XCR0;
> > >   	}
> > > +	kvm_caps.supported_quirks = KVM_X86_VALID_QUIRKS;
> > >   	kvm_caps.inapplicable_quirks = 0;
> > >   	rdmsrl_safe(MSR_EFER, &kvm_host.efer);
> > > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> > > index 9af199c8e5c8..f2672b14388c 100644
> > > --- a/arch/x86/kvm/x86.h
> > > +++ b/arch/x86/kvm/x86.h
> > > @@ -34,6 +34,8 @@ struct kvm_caps {
> > >   	u64 supported_xcr0;
> > >   	u64 supported_xss;
> > >   	u64 supported_perf_cap;
> > > +
> > > +	u64 supported_quirks;
> > >   	u64 inapplicable_quirks;
> > >   };
> > > -- 
> > > 2.43.5
> > > 
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms
  2025-03-03 16:14     ` Paolo Bonzini
@ 2025-03-04  6:20       ` Yan Zhao
  0 siblings, 0 replies; 20+ messages in thread
From: Yan Zhao @ 2025-03-04  6:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc

On Mon, Mar 03, 2025 at 05:14:55PM +0100, Paolo Bonzini wrote:
> On 3/3/25 02:30, Yan Zhao wrote:
> > >   	kvm->arch.has_protected_state = true;
> > >   	kvm->arch.has_private_mem = true;
> > > +	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
> > Though the quirk is disabled by default in KVM in tdx_vm_init() for TDs, the
> > kvm->arch.disabled_quirks can be overwritten by a userspace specified value in
> > kvm_vm_ioctl_enable_cap().
> > "kvm->arch.disabled_quirks = cap->args[0] & kvm_caps.supported_quirks;"
> > 
> > So, when an old userspace tries to disable other quirks on this new KVM, it may
> > accidentally turn KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT into enabled for TDs, which
> > would cause SEPT being zapped when (de)attaching non-coherent devices.
> 
> Yeah, sorry about that - Xiaoyao also pointed it out and I should have
> noticed it---or marked the patches as RFC which they were.
> 
> > Could we force KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT to be disabled for TDs?
> > 
> > e.g.
> > 
> > tdx_vm_init
> >     kvm->arch.always_disabled_quirks |= KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT;
> > 
> > static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
> > {
> >          WARN_ON_ONCE(kvm->arch.always_disabled_quirk & kvm_caps.force_enabled_quirks);
> > 
> >          u64 disabled_quirks = kvm->arch.always_disabled_quirk | kvm->arch.disabled_quirks;
> >          return !(disabled_quirks & quirk) |
> >                 (kvm_caps.force_enabled_quirks & quirk);
> > }
> 
> We can change KVM_ENABLE_CAP(KVM_X86_DISABLE_QUIRKS), as well as QUIRKS2, to
> use "|=" instead of "=".
> 
> While this is technically a change in the API, the current implementation is
> just awful and I hope that no one is relying on it! This way, the
I think QEMU is not relying on it.

I considered making this change while writing the quirk SLOT_ZAP_ALL but gave it
up, thinking it might allow users to re-enable a quirk later on.

I'm glad you also see it as a bug:)

> "always_disabled_quirks" are not needed.
> 
> If the "|=" idea doesn't work out I agree that
> kvm->arch.always_disabled_quirk is needed.
> 
> Sending v3 shortly...
Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks
  2025-03-03 16:04     ` Paolo Bonzini
@ 2025-03-04  6:46       ` Yan Zhao
  0 siblings, 0 replies; 20+ messages in thread
From: Yan Zhao @ 2025-03-04  6:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, seanjc

On Mon, Mar 03, 2025 at 05:04:40PM +0100, Paolo Bonzini wrote:
> On 3/3/25 02:15, Yan Zhao wrote:
> > On Sat, Mar 01, 2025 at 02:34:25AM -0500, Paolo Bonzini wrote:
> > > In some cases, the handling of quirks is split between platform-specific
> > > code and generic code, or it is done entirely in generic code, but the
> > > relevant bug does not trigger on some platforms; for example,
> > > KVM_X86_QUIRK_CD_NW_CLEARED is only applicable to AMD systems.  In that
> > > case, allow unaffected vendor modules to disable handling of the quirk.
> > > 
> > > The quirk remains available in KVM_CAP_DISABLE_QUIRKS2, because that API
> > > tells userspace that KVM *knows* that some of its past behavior was bogus
> > > or just undesirable.  In other words, it's plausible for userspace to
> > > refuse to run if a quirk is not listed by KVM_CAP_DISABLE_QUIRKS2.
> > > 
> > > In kvm_check_has_quirk(), in addition to checking if a quirk is not
> > > explicitly disabled by the user, also verify if the quirk applies to
> > > the hardware.
> > > 
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > Message-ID: <20250224070832.31394-1-yan.y.zhao@intel.com>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > >   arch/x86/kvm/vmx/vmx.c |  1 +
> > >   arch/x86/kvm/x86.c     |  1 +
> > >   arch/x86/kvm/x86.h     | 12 +++++++-----
> > >   3 files changed, 9 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 486fbdb4365c..75df4caea2f7 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -8506,6 +8506,7 @@ __init int vmx_hardware_setup(void)
> > >   	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
> > > +	kvm_caps.inapplicable_quirks = KVM_X86_QUIRK_CD_NW_CLEARED;
> > 
> > As you mentioned, KVM_X86_QUIRK_CD_NW_CLEARED has no effect on Intel's
> > platforms, no matter kvm_check_has_quirk() returns true or false.
> > So, what's the purpose to introduce kvm_caps.inapplicable_quirks?
> 
> The purpose is to later mark IGNORE_GUEST_PAT as inapplicable, so that the
> relevant code does not run on AMD.  However you have a point here:

Or naming it kvm_caps.platform_disabled_quirks?
> 
> > One concern is that since KVM_X86_QUIRK_CD_NW_CLEARED is not for Intel
> > platforms, it's unnatural for Intel's code to add it into the
> > kvm_caps.inapplicable_quirks.
> 
> So let's instead have kvm-amd.ko clear it from inapplicable_quirks.  And
> likewise kvm-intel.ko can clear IGNORE_GUEST_PAT.
Sounds good.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-03-04  6:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-01  7:34 [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
2025-03-01  7:34 ` [PATCH 1/4] KVM: x86: Allow vendor code to disable quirks Paolo Bonzini
2025-03-02 17:11   ` Xiaoyao Li
2025-03-03  1:15   ` Yan Zhao
2025-03-03 16:04     ` Paolo Bonzini
2025-03-04  6:46       ` Yan Zhao
2025-03-01  7:34 ` [PATCH 2/4] KVM: x86: Introduce supported_quirks to block disabling quirks Paolo Bonzini
2025-03-02 17:13   ` Xiaoyao Li
2025-03-03  1:23   ` Yan Zhao
2025-03-03 16:11     ` Paolo Bonzini
2025-03-04  4:21       ` Yan Zhao
2025-03-01  7:34 ` [PATCH 3/4] KVM: x86: Introduce Intel specific quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Paolo Bonzini
2025-03-02 17:30   ` Xiaoyao Li
2025-03-03  1:25   ` Yan Zhao
2025-03-01  7:34 ` [PATCH 4/4] KVM: TDX: Always honor guest PAT on TDX enabled platforms Paolo Bonzini
2025-03-02 17:03   ` Xiaoyao Li
2025-03-03  1:30   ` Yan Zhao
2025-03-03 16:14     ` Paolo Bonzini
2025-03-04  6:20       ` Yan Zhao
2025-03-03  1:47 ` [PATCH v2 0/4] KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT Yan Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox