* [PATCH v3 0/4] KVM: VMX: Unify L1D flush for L1TF
@ 2025-10-16 20:04 Sean Christopherson
2025-10-16 20:04 ` [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped Sean Christopherson
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-10-16 20:04 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Pawan Gupta, Brendan Jackman
Unify the L1D cache flushing done to mitigate L1TF under the per-CPU
variable, as the per-vCPU variable has been superfluous for quite some
time.
Patch 1 fixes a bug (I think it's a bug?) I found when poking around the code.
If L1D flushes are conditional and KVM skips an L1D flush on VM-Enter, then
arguably KVM should flush CPU buffers based on other mitigations.
Patches 2-3 bury the L1TF L1D flushing under CONFIG_CPU_MITIGATIONS, partly
because it's absurd that KVM doesn't honor CONFIG_CPU_MITIGATIONS for that
case, partly because it simplifies unifying the tracking code (helps obviate
the need for a stub).
Patch 4 is Brendan's patch and the main goal of the mini-series.
v3:
- Put the "raw" variant in KVM, dress it up with KVM's "request" terminology,
and add a comment explaining why _KVM_ knows its usage doesn't need to
disable virtualization.
- Add the prep patches.
v2:
- https://lore.kernel.org/all/20251015-b4-l1tf-percpu-v2-1-6d7a8d3d40e9@google.com
- Moved the bit back to irq_stat
- Fixed DEBUG_PREEMPT issues by adding a _raw variant
v1: https://lore.kernel.org/r/20251013-b4-l1tf-percpu-v1-1-d65c5366ea1a@google.com
Brendan Jackman (1):
KVM: x86: Unify L1TF flushing under per-CPU variable
Sean Christopherson (3):
KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
KVM: VMX: Bundle all L1 data cache flush mitigation code together
KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n
arch/x86/include/asm/hardirq.h | 4 +-
arch/x86/include/asm/kvm_host.h | 3 -
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 222 ++++++++++++++++++--------------
arch/x86/kvm/x86.c | 6 +-
arch/x86/kvm/x86.h | 14 ++
7 files changed, 144 insertions(+), 109 deletions(-)
base-commit: f222788458c8a7753d43befef2769cd282dc008e
--
2.51.0.858.gf9c4a03a3a-goog
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-16 20:04 [PATCH v3 0/4] KVM: VMX: Unify L1D flush for L1TF Sean Christopherson
@ 2025-10-16 20:04 ` Sean Christopherson
2025-10-21 13:34 ` Brendan Jackman
2025-10-21 23:18 ` Pawan Gupta
2025-10-16 20:04 ` [PATCH v3 2/4] KVM: VMX: Bundle all L1 data cache flush mitigation code together Sean Christopherson
` (2 subsequent siblings)
3 siblings, 2 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-10-16 20:04 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Pawan Gupta, Brendan Jackman
If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
because none of the "heavy" paths that trigger an L1D flush were tripped
since the last VM-Enter.
Note, the flaw goes back to the introduction of the MDS mitigation. The
MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
Move VERW closer to VMentry for MDS mitigation"), but previous kernels
that flush CPU buffers in vmx_vcpu_enter_exit() are affected.
Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f87c216d976d..ce556d5dc39b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6663,7 +6663,7 @@ int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
* information but as all relevant affected CPUs have 32KiB L1D cache size
* there is no point in doing so.
*/
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
int size = PAGE_SIZE << L1D_CACHE_ORDER;
@@ -6691,14 +6691,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
kvm_clear_cpu_l1tf_flush_l1d();
if (!flush_l1d)
- return;
+ return false;
}
vcpu->stat.l1d_flush++;
if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
- return;
+ return true;
}
asm volatile(
@@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
:: [flush_pages] "r" (vmx_l1d_flush_pages),
[size] "r" (size)
: "eax", "ebx", "ecx", "edx");
+ return true;
}
void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7330,8 +7331,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
* and is affected by MMIO Stale Data. In such cases mitigation in only
* needed against an MMIO capable guest.
*/
- if (static_branch_unlikely(&vmx_l1d_should_flush))
- vmx_l1d_flush(vcpu);
+ if (static_branch_unlikely(&vmx_l1d_should_flush) &&
+ vmx_l1d_flush(vcpu))
+ ;
else if (static_branch_unlikely(&cpu_buf_vm_clear) &&
(flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO))
x86_clear_cpu_buffers();
--
2.51.0.858.gf9c4a03a3a-goog
* [PATCH v3 2/4] KVM: VMX: Bundle all L1 data cache flush mitigation code together
2025-10-16 20:04 [PATCH v3 0/4] KVM: VMX: Unify L1D flush for L1TF Sean Christopherson
2025-10-16 20:04 ` [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped Sean Christopherson
@ 2025-10-16 20:04 ` Sean Christopherson
2025-10-21 13:38 ` Brendan Jackman
2025-10-16 20:04 ` [PATCH v3 3/4] KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n Sean Christopherson
2025-10-16 20:04 ` [PATCH v3 4/4] KVM: x86: Unify L1TF flushing under per-CPU variable Sean Christopherson
3 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-10-16 20:04 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Pawan Gupta, Brendan Jackman
Move vmx_l1d_flush(), vmx_cleanup_l1d_flush(), and the vmentry_l1d_flush
param code up in vmx.c so that all of the L1 data cache flushing code is
bundled together. This will allow conditioning the mitigation code on
CONFIG_CPU_MITIGATIONS=y with minimal #ifdefs.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 176 ++++++++++++++++++++---------------------
1 file changed, 88 insertions(+), 88 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ce556d5dc39b..cd8ae1b2ae55 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -302,6 +302,16 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
return 0;
}
+static void vmx_cleanup_l1d_flush(void)
+{
+ if (vmx_l1d_flush_pages) {
+ free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
+ vmx_l1d_flush_pages = NULL;
+ }
+ /* Restore state so sysfs ignores VMX */
+ l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
+}
+
static int vmentry_l1d_flush_parse(const char *s)
{
unsigned int i;
@@ -352,6 +362,84 @@ static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
return sysfs_emit(s, "%s\n", vmentry_l1d_param[l1tf_vmx_mitigation].option);
}
+/*
+ * Software based L1D cache flush which is used when microcode providing
+ * the cache control MSR is not loaded.
+ *
+ * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to
+ * flush it is required to read in 64 KiB because the replacement algorithm
+ * is not exactly LRU. This could be sized at runtime via topology
+ * information but as all relevant affected CPUs have 32KiB L1D cache size
+ * there is no point in doing so.
+ */
+static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
+{
+ int size = PAGE_SIZE << L1D_CACHE_ORDER;
+
+ /*
+ * This code is only executed when the flush mode is 'cond' or
+ * 'always'
+ */
+ if (static_branch_likely(&vmx_l1d_flush_cond)) {
+ bool flush_l1d;
+
+ /*
+ * Clear the per-vcpu flush bit, it gets set again if the vCPU
+ * is reloaded, i.e. if the vCPU is scheduled out or if KVM
+ * exits to userspace, or if KVM reaches one of the unsafe
+ * VMEXIT handlers, e.g. if KVM calls into the emulator.
+ */
+ flush_l1d = vcpu->arch.l1tf_flush_l1d;
+ vcpu->arch.l1tf_flush_l1d = false;
+
+ /*
+ * Clear the per-cpu flush bit, it gets set again from
+ * the interrupt handlers.
+ */
+ flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
+ kvm_clear_cpu_l1tf_flush_l1d();
+
+ if (!flush_l1d)
+ return false;
+ }
+
+ vcpu->stat.l1d_flush++;
+
+ if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+ native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ return true;
+ }
+
+ asm volatile(
+ /* First ensure the pages are in the TLB */
+ "xorl %%eax, %%eax\n"
+ ".Lpopulate_tlb:\n\t"
+ "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+ "addl $4096, %%eax\n\t"
+ "cmpl %%eax, %[size]\n\t"
+ "jne .Lpopulate_tlb\n\t"
+ "xorl %%eax, %%eax\n\t"
+ "cpuid\n\t"
+ /* Now fill the cache */
+ "xorl %%eax, %%eax\n"
+ ".Lfill_cache:\n"
+ "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
+ "addl $64, %%eax\n\t"
+ "cmpl %%eax, %[size]\n\t"
+ "jne .Lfill_cache\n\t"
+ "lfence\n"
+ :: [flush_pages] "r" (vmx_l1d_flush_pages),
+ [size] "r" (size)
+ : "eax", "ebx", "ecx", "edx");
+ return true;
+}
+
+static const struct kernel_param_ops vmentry_l1d_flush_ops = {
+ .set = vmentry_l1d_flush_set,
+ .get = vmentry_l1d_flush_get,
+};
+module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644);
+
static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
{
u64 msr;
@@ -404,12 +492,6 @@ static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
vmx->disable_fb_clear = false;
}
-static const struct kernel_param_ops vmentry_l1d_flush_ops = {
- .set = vmentry_l1d_flush_set,
- .get = vmentry_l1d_flush_get,
-};
-module_param_cb(vmentry_l1d_flush, &vmentry_l1d_flush_ops, NULL, 0644);
-
static u32 vmx_segment_access_rights(struct kvm_segment *var);
void vmx_vmexit(void);
@@ -6653,78 +6735,6 @@ int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return ret;
}
-/*
- * Software based L1D cache flush which is used when microcode providing
- * the cache control MSR is not loaded.
- *
- * The L1D cache is 32 KiB on Nehalem and later microarchitectures, but to
- * flush it is required to read in 64 KiB because the replacement algorithm
- * is not exactly LRU. This could be sized at runtime via topology
- * information but as all relevant affected CPUs have 32KiB L1D cache size
- * there is no point in doing so.
- */
-static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
-{
- int size = PAGE_SIZE << L1D_CACHE_ORDER;
-
- /*
- * This code is only executed when the flush mode is 'cond' or
- * 'always'
- */
- if (static_branch_likely(&vmx_l1d_flush_cond)) {
- bool flush_l1d;
-
- /*
- * Clear the per-vcpu flush bit, it gets set again if the vCPU
- * is reloaded, i.e. if the vCPU is scheduled out or if KVM
- * exits to userspace, or if KVM reaches one of the unsafe
- * VMEXIT handlers, e.g. if KVM calls into the emulator.
- */
- flush_l1d = vcpu->arch.l1tf_flush_l1d;
- vcpu->arch.l1tf_flush_l1d = false;
-
- /*
- * Clear the per-cpu flush bit, it gets set again from
- * the interrupt handlers.
- */
- flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
- kvm_clear_cpu_l1tf_flush_l1d();
-
- if (!flush_l1d)
- return false;
- }
-
- vcpu->stat.l1d_flush++;
-
- if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
- native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
- return true;
- }
-
- asm volatile(
- /* First ensure the pages are in the TLB */
- "xorl %%eax, %%eax\n"
- ".Lpopulate_tlb:\n\t"
- "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
- "addl $4096, %%eax\n\t"
- "cmpl %%eax, %[size]\n\t"
- "jne .Lpopulate_tlb\n\t"
- "xorl %%eax, %%eax\n\t"
- "cpuid\n\t"
- /* Now fill the cache */
- "xorl %%eax, %%eax\n"
- ".Lfill_cache:\n"
- "movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
- "addl $64, %%eax\n\t"
- "cmpl %%eax, %[size]\n\t"
- "jne .Lfill_cache\n\t"
- "lfence\n"
- :: [flush_pages] "r" (vmx_l1d_flush_pages),
- [size] "r" (size)
- : "eax", "ebx", "ecx", "edx");
- return true;
-}
-
void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -8673,16 +8683,6 @@ __init int vmx_hardware_setup(void)
return r;
}
-static void vmx_cleanup_l1d_flush(void)
-{
- if (vmx_l1d_flush_pages) {
- free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
- vmx_l1d_flush_pages = NULL;
- }
- /* Restore state so sysfs ignores VMX */
- l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
-}
-
void vmx_exit(void)
{
allow_smaller_maxphyaddr = false;
--
2.51.0.858.gf9c4a03a3a-goog
* [PATCH v3 3/4] KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n
2025-10-16 20:04 [PATCH v3 0/4] KVM: VMX: Unify L1D flush for L1TF Sean Christopherson
2025-10-16 20:04 ` [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped Sean Christopherson
2025-10-16 20:04 ` [PATCH v3 2/4] KVM: VMX: Bundle all L1 data cache flush mitigation code together Sean Christopherson
@ 2025-10-16 20:04 ` Sean Christopherson
2025-10-22 1:36 ` Pawan Gupta
2025-10-16 20:04 ` [PATCH v3 4/4] KVM: x86: Unify L1TF flushing under per-CPU variable Sean Christopherson
3 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-10-16 20:04 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Pawan Gupta, Brendan Jackman
Disable support for flushing the L1 data cache to mitigate L1TF if CPU
mitigations are disabled for the entire kernel. KVM's mitigation of L1TF
is in no way special enough to justify ignoring CONFIG_CPU_MITIGATIONS=n.
Deliberately use CPU_MITIGATIONS instead of the more precise
MITIGATION_L1TF, as MITIGATION_L1TF only controls the default behavior,
i.e. CONFIG_MITIGATION_L1TF=n doesn't completely disable L1TF mitigations
in the kernel.
Keep the vmentry_l1d_flush module param to avoid breaking existing setups,
and leverage the .set path to alert the user to the fact that
vmentry_l1d_flush will be ignored. Don't bother validating the incoming
value; if an admin misconfigures vmentry_l1d_flush, the fact that the bad
configuration won't be detected when running with CONFIG_CPU_MITIGATIONS=n
is likely the least of their worries.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/hardirq.h | 4 +--
arch/x86/kvm/vmx/vmx.c | 56 ++++++++++++++++++++++++++--------
2 files changed, 46 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index f00c09ffe6a9..6b6d472baa0b 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -5,7 +5,7 @@
#include <linux/threads.h>
typedef struct {
-#if IS_ENABLED(CONFIG_KVM_INTEL)
+#if IS_ENABLED(CONFIG_CPU_MITIGATIONS) && IS_ENABLED(CONFIG_KVM_INTEL)
u8 kvm_cpu_l1tf_flush_l1d;
#endif
unsigned int __nmi_count; /* arch dependent */
@@ -68,7 +68,7 @@ extern u64 arch_irq_stat(void);
DECLARE_PER_CPU_CACHE_HOT(u16, __softirq_pending);
#define local_softirq_pending_ref __softirq_pending
-#if IS_ENABLED(CONFIG_KVM_INTEL)
+#if IS_ENABLED(CONFIG_CPU_MITIGATIONS) && IS_ENABLED(CONFIG_KVM_INTEL)
/*
* This function is called from noinstr interrupt contexts
* and must be inlined to not get instrumentation.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index cd8ae1b2ae55..e91d99211efe 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -203,6 +203,7 @@ module_param(pt_mode, int, S_IRUGO);
struct x86_pmu_lbr __ro_after_init vmx_lbr_caps;
+#ifdef CONFIG_CPU_MITIGATIONS
static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush);
static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_cond);
static DEFINE_MUTEX(vmx_l1d_flush_mutex);
@@ -225,7 +226,7 @@ static const struct {
#define L1D_CACHE_ORDER 4
static void *vmx_l1d_flush_pages;
-static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
+static int __vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
{
struct page *page;
unsigned int i;
@@ -302,6 +303,16 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
return 0;
}
+static int vmx_setup_l1d_flush(void)
+{
+ /*
+ * Hand the parameter mitigation value in which was stored in the pre
+ * module init parser. If no parameter was given, it will contain
+ * 'auto' which will be turned into the default 'cond' mitigation mode.
+ */
+	return __vmx_setup_l1d_flush(vmentry_l1d_flush_param);
+}
+
static void vmx_cleanup_l1d_flush(void)
{
if (vmx_l1d_flush_pages) {
@@ -349,7 +360,7 @@ static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
}
mutex_lock(&vmx_l1d_flush_mutex);
- ret = vmx_setup_l1d_flush(l1tf);
+ ret = __vmx_setup_l1d_flush(l1tf);
mutex_unlock(&vmx_l1d_flush_mutex);
return ret;
}
@@ -376,6 +387,9 @@ static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
int size = PAGE_SIZE << L1D_CACHE_ORDER;
+ if (!static_branch_unlikely(&vmx_l1d_should_flush))
+ return false;
+
/*
* This code is only executed when the flush mode is 'cond' or
* 'always'
@@ -434,6 +448,31 @@ static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
return true;
}
+#else /* CONFIG_CPU_MITIGATIONS */
+static int vmx_setup_l1d_flush(void)
+{
+ l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NEVER;
+ return 0;
+}
+static void vmx_cleanup_l1d_flush(void)
+{
+ l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
+}
+static __always_inline bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+static int vmentry_l1d_flush_set(const char *s, const struct kernel_param *kp)
+{
+ pr_warn_once("Kernel compiled without mitigations, ignoring vmentry_l1d_flush\n");
+ return 0;
+}
+static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
+{
+ return sysfs_emit(s, "never\n");
+}
+#endif
+
static const struct kernel_param_ops vmentry_l1d_flush_ops = {
.set = vmentry_l1d_flush_set,
.get = vmentry_l1d_flush_get,
@@ -7341,8 +7380,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
* and is affected by MMIO Stale Data. In such cases mitigation in only
* needed against an MMIO capable guest.
*/
- if (static_branch_unlikely(&vmx_l1d_should_flush) &&
- vmx_l1d_flush(vcpu))
+ if (vmx_l1d_flush(vcpu))
;
else if (static_branch_unlikely(&cpu_buf_vm_clear) &&
(flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO))
@@ -8718,14 +8756,8 @@ int __init vmx_init(void)
if (r)
return r;
- /*
- * Must be called after common x86 init so enable_ept is properly set
- * up. Hand the parameter mitigation value in which was stored in
- * the pre module init parser. If no parameter was given, it will
- * contain 'auto' which will be turned into the default 'cond'
- * mitigation mode.
- */
- r = vmx_setup_l1d_flush(vmentry_l1d_flush_param);
+ /* Must be called after common x86 init so enable_ept is setup. */
+ r = vmx_setup_l1d_flush();
if (r)
goto err_l1d_flush;
--
2.51.0.858.gf9c4a03a3a-goog
* [PATCH v3 4/4] KVM: x86: Unify L1TF flushing under per-CPU variable
2025-10-16 20:04 [PATCH v3 0/4] KVM: VMX: Unify L1D flush for L1TF Sean Christopherson
` (2 preceding siblings ...)
2025-10-16 20:04 ` [PATCH v3 3/4] KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n Sean Christopherson
@ 2025-10-16 20:04 ` Sean Christopherson
2025-10-22 1:59 ` Pawan Gupta
3 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-10-16 20:04 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Pawan Gupta, Brendan Jackman
From: Brendan Jackman <jackmanb@google.com>
Currently the need to flush L1D for L1TF is tracked by
two bits: one per-CPU and one per-vCPU.
The per-vCPU bit is always set when the vCPU shows up on a core, so
there is no interesting state that's truly per-vCPU. Indeed, this is a
requirement, since L1D is a part of the physical CPU.
So simplify this by combining the two bits.
The vCPU bit was being written from preemption-enabled regions. To play
nice with those cases, wrap all calls from KVM and use a raw write so that
requesting a flush with preemption enabled doesn't trigger what would
effectively be DEBUG_PREEMPT false positives. Preemption doesn't need to
be disabled, as kvm_arch_vcpu_load() will mark the new CPU as needing a
flush if the vCPU task is migrated, or if userspace runs the vCPU on a
different task.
Signed-off-by: Brendan Jackman <jackmanb@google.com>
[sean: put raw write in KVM instead of in a hardirq.h variant]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 3 ---
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 20 +++++---------------
arch/x86/kvm/x86.c | 6 +++---
arch/x86/kvm/x86.h | 14 ++++++++++++++
6 files changed, 24 insertions(+), 23 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..fcdc65ab13d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1055,9 +1055,6 @@ struct kvm_vcpu_arch {
/* be preempted when it's in kernel-mode(cpl=0) */
bool preempted_in_kernel;
- /* Flush the L1 Data cache for L1TF mitigation on VMENTER */
- bool l1tf_flush_l1d;
-
/* Host CPU on which VM-entry was most recently attempted */
int last_vmentry_cpu;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 18d69d48bc55..4e016582adc7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4859,7 +4859,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
*/
BUILD_BUG_ON(lower_32_bits(PFERR_SYNTHETIC_MASK));
- vcpu->arch.l1tf_flush_l1d = true;
+ kvm_request_l1tf_flush_l1d();
if (!flags) {
trace_kvm_page_fault(vcpu, fault_address, error_code);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3fca63a261f5..468a013d9ef3 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3880,7 +3880,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
goto vmentry_failed;
/* Hide L1D cache contents from the nested guest. */
- vcpu->arch.l1tf_flush_l1d = true;
+ kvm_request_l1tf_flush_l1d();
/*
* Must happen outside of nested_vmx_enter_non_root_mode() as it will
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e91d99211efe..0347d321a86e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -395,26 +395,16 @@ static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
* 'always'
*/
if (static_branch_likely(&vmx_l1d_flush_cond)) {
- bool flush_l1d;
-
/*
- * Clear the per-vcpu flush bit, it gets set again if the vCPU
+ * Clear the per-cpu flush bit, it gets set again if the vCPU
* is reloaded, i.e. if the vCPU is scheduled out or if KVM
* exits to userspace, or if KVM reaches one of the unsafe
- * VMEXIT handlers, e.g. if KVM calls into the emulator.
+ * VMEXIT handlers, e.g. if KVM calls into the emulator,
+ * or from the interrupt handlers.
*/
- flush_l1d = vcpu->arch.l1tf_flush_l1d;
- vcpu->arch.l1tf_flush_l1d = false;
-
- /*
- * Clear the per-cpu flush bit, it gets set again from
- * the interrupt handlers.
- */
- flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
+ if (!kvm_get_cpu_l1tf_flush_l1d())
+			return false;
kvm_clear_cpu_l1tf_flush_l1d();
-
- if (!flush_l1d)
- return false;
}
vcpu->stat.l1d_flush++;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4b5d2d09634..851f078cd5ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5189,7 +5189,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
- vcpu->arch.l1tf_flush_l1d = true;
+ kvm_request_l1tf_flush_l1d();
if (vcpu->scheduled_out && pmu->version && pmu->event_count) {
pmu->need_cleanup = true;
@@ -7999,7 +7999,7 @@ int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu, gva_t addr, void *val,
unsigned int bytes, struct x86_exception *exception)
{
/* kvm_write_guest_virt_system can pull in tons of pages. */
- vcpu->arch.l1tf_flush_l1d = true;
+ kvm_request_l1tf_flush_l1d();
return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
PFERR_WRITE_MASK, exception);
@@ -9395,7 +9395,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
return handle_emulation_failure(vcpu, emulation_type);
}
- vcpu->arch.l1tf_flush_l1d = true;
+ kvm_request_l1tf_flush_l1d();
if (!(emulation_type & EMULTYPE_NO_DECODE)) {
kvm_clear_exception_queue(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index f3dc77f006f9..cd67ccbb747f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -420,6 +420,20 @@ static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk)
return !(kvm->arch.disabled_quirks & quirk);
}
+static __always_inline void kvm_request_l1tf_flush_l1d(void)
+{
+#if IS_ENABLED(CONFIG_CPU_MITIGATIONS) && IS_ENABLED(CONFIG_KVM_INTEL)
+ /*
+ * Use a raw write to set the per-CPU flag, as KVM will ensure a flush
+	 * even if preemption is currently enabled. If the current vCPU task
+ * is migrated to a different CPU (or userspace runs the vCPU on a
+ * different task) before the next VM-Entry, then kvm_arch_vcpu_load()
+ * will request a flush on the new CPU.
+ */
+ raw_cpu_write(irq_stat.kvm_cpu_l1tf_flush_l1d, 1);
+#endif
+}
+
void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
u64 get_kvmclock_ns(struct kvm *kvm);
--
2.51.0.858.gf9c4a03a3a-goog
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-16 20:04 ` [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped Sean Christopherson
@ 2025-10-21 13:34 ` Brendan Jackman
2025-10-21 16:48 ` Sean Christopherson
2025-10-21 23:18 ` Pawan Gupta
1 sibling, 1 reply; 22+ messages in thread
From: Brendan Jackman @ 2025-10-21 13:34 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Pawan Gupta, Brendan Jackman
On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> because none of the "heavy" paths that trigger an L1D flush were tripped
> since the last VM-Enter.
Presumably the assumption here was that the L1TF conditionality is good
enough for the MMIO stale data vuln too? I'm not qualified to assess if
that assumption is true, but also even if it's a good one it's
definitely not obvious to users that the mitigation you pick for L1TF
has this side-effect. So I think I'm on board with calling this a bug.
If anyone turns out to be depending on the current behaviour for
performance I think they should probably add it back as a separate flag.
> MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
> Move VERW closer to VMentry for MDS mitigation"), but previous kernels
> that flush CPU buffers in vmx_vcpu_enter_exit() are affected.
>
> Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f87c216d976d..ce556d5dc39b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6663,7 +6663,7 @@ int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
> * information but as all relevant affected CPUs have 32KiB L1D cache size
> * there is no point in doing so.
> */
> -static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> +static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
> {
> int size = PAGE_SIZE << L1D_CACHE_ORDER;
>
> @@ -6691,14 +6691,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> kvm_clear_cpu_l1tf_flush_l1d();
>
> if (!flush_l1d)
> - return;
> + return false;
> }
>
> vcpu->stat.l1d_flush++;
>
> if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
> native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
> - return;
> + return true;
> }
>
> asm volatile(
> @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> :: [flush_pages] "r" (vmx_l1d_flush_pages),
> [size] "r" (size)
> : "eax", "ebx", "ecx", "edx");
> + return true;
The comment in the caller says the L1D flush "includes CPU buffer clear
to mitigate MDS" - do we actually know that this software sequence
mitigates the MMIO stale data vuln like the verw does? (Do we even know if
it mitigates MDS?)
Anyway, if this is an issue, it's orthogonal to this patch.
Reviewed-by: Brendan Jackman <jackmanb@google.com>
> }
>
> void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> @@ -7330,8 +7331,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> * and is affected by MMIO Stale Data. In such cases mitigation in only
> * needed against an MMIO capable guest.
> */
> - if (static_branch_unlikely(&vmx_l1d_should_flush))
> - vmx_l1d_flush(vcpu);
> + if (static_branch_unlikely(&vmx_l1d_should_flush) &&
> + vmx_l1d_flush(vcpu))
> + ;
> else if (static_branch_unlikely(&cpu_buf_vm_clear) &&
> (flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO))
> x86_clear_cpu_buffers();
* Re: [PATCH v3 2/4] KVM: VMX: Bundle all L1 data cache flush mitigation code together
2025-10-16 20:04 ` [PATCH v3 2/4] KVM: VMX: Bundle all L1 data cache flush mitigation code together Sean Christopherson
@ 2025-10-21 13:38 ` Brendan Jackman
0 siblings, 0 replies; 22+ messages in thread
From: Brendan Jackman @ 2025-10-21 13:38 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini
Cc: kvm, linux-kernel, Pawan Gupta, Brendan Jackman
On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> Move vmx_l1d_flush(), vmx_cleanup_l1d_flush(), and the vmentry_l1d_flush
> param code up in vmx.c so that all of the L1 data cache flushing code is
> bundled together. This will allow conditioning the mitigation code on
> CONFIG_CPU_MITIGATIONS=y with minimal #ifdefs.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
(Git says no changed lines)
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-21 13:34 ` Brendan Jackman
@ 2025-10-21 16:48 ` Sean Christopherson
2025-10-21 23:30 ` Pawan Gupta
0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2025-10-21 16:48 UTC (permalink / raw)
To: Brendan Jackman; +Cc: Paolo Bonzini, kvm, linux-kernel, Pawan Gupta
On Tue, Oct 21, 2025, Brendan Jackman wrote:
> On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > because none of the "heavy" paths that trigger an L1D flush were tripped
> > since the last VM-Enter.
>
> Presumably the assumption here was that the L1TF conditionality is good
> enough for the MMIO stale data vuln too? I'm not qualified to assess if
> that assumption is true, but also even if it's a good one it's
> definitely not obvious to users that the mitigation you pick for L1TF
> has this side-effect. So I think I'm on board with calling this a bug.
Yeah, that's where I'm at as well.
> If anyone turns out to be depending on the current behaviour for
> performance I think they should probably add it back as a separate flag.
...
> > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > [size] "r" (size)
> > : "eax", "ebx", "ecx", "edx");
> > + return true;
>
> The comment in the caller says the L1D flush "includes CPU buffer clear
> to mitigate MDS" - do we actually know that this software sequence
> mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> it mitigates MDS?)
>
> Anyway, if this is an issue, it's orthogonal to this patch.
Pawan, any idea?
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-16 20:04 ` [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped Sean Christopherson
2025-10-21 13:34 ` Brendan Jackman
@ 2025-10-21 23:18 ` Pawan Gupta
2025-10-22 1:59 ` Brendan Jackman
1 sibling, 1 reply; 22+ messages in thread
From: Pawan Gupta @ 2025-10-21 23:18 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Brendan Jackman
On Thu, Oct 16, 2025 at 01:04:14PM -0700, Sean Christopherson wrote:
> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> because none of the "heavy" paths that trigger an L1D flush were tripped
> since the last VM-Enter.
>
> Note, the flaw goes back to the introduction of the MDS mitigation.
I don't think it is a flaw. If L1D flush was skipped because VMexit did not
touch any interested data, then there shouldn't be any need to flush CPU
buffers.
Secondly, when L1D flush is skipped, flushing MDS affected buffers is of no
use, because the data could still be extracted from L1D cache using L1TF.
Isn't it?
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-21 16:48 ` Sean Christopherson
@ 2025-10-21 23:30 ` Pawan Gupta
2025-10-22 1:20 ` Pawan Gupta
2025-10-27 21:09 ` Pawan Gupta
0 siblings, 2 replies; 22+ messages in thread
From: Pawan Gupta @ 2025-10-21 23:30 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Brendan Jackman, Paolo Bonzini, kvm, linux-kernel
On Tue, Oct 21, 2025 at 09:48:30AM -0700, Sean Christopherson wrote:
> On Tue, Oct 21, 2025, Brendan Jackman wrote:
> > On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > > because none of the "heavy" paths that trigger an L1D flush were tripped
> > > since the last VM-Enter.
> >
> > Presumably the assumption here was that the L1TF conditionality is good
> > enough for the MMIO stale data vuln too? I'm not qualified to assess if
> > that assumption is true, but also even if it's a good one it's
> > definitely not obvious to users that the mitigation you pick for L1TF
> > has this side-effect. So I think I'm on board with calling this a bug.
>
> Yeah, that's where I'm at as well.
>
> > If anyone turns out to be depending on the current behaviour for
> > performance I think they should probably add it back as a separate flag.
>
> ...
>
> > > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > > [size] "r" (size)
> > > : "eax", "ebx", "ecx", "edx");
> > > + return true;
> >
> > The comment in the caller says the L1D flush "includes CPU buffer clear
> > to mitigate MDS" - do we actually know that this software sequence
> > mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> > it mitigates MDS?)
> >
> > Anyway, if this is an issue, it's orthogonal to this patch.
>
> Pawan, any idea?
I want to say yes, but let me first confirm this internally and get back to
you.
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-21 23:30 ` Pawan Gupta
@ 2025-10-22 1:20 ` Pawan Gupta
2025-10-27 22:03 ` Jim Mattson
2025-10-27 21:09 ` Pawan Gupta
1 sibling, 1 reply; 22+ messages in thread
From: Pawan Gupta @ 2025-10-22 1:20 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Brendan Jackman, Paolo Bonzini, kvm, linux-kernel
On Tue, Oct 21, 2025 at 04:30:19PM -0700, Pawan Gupta wrote:
> On Tue, Oct 21, 2025 at 09:48:30AM -0700, Sean Christopherson wrote:
> > On Tue, Oct 21, 2025, Brendan Jackman wrote:
> > > On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > > > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > > > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > > > because none of the "heavy" paths that trigger an L1D flush were tripped
> > > > since the last VM-Enter.
> > >
> > > Presumably the assumption here was that the L1TF conditionality is good
> > > enough for the MMIO stale data vuln too? I'm not qualified to assess if
> > > that assumption is true, but also even if it's a good one it's
> > > definitely not obvious to users that the mitigation you pick for L1TF
> > > has this side-effect. So I think I'm on board with calling this a bug.
> >
> > Yeah, that's where I'm at as well.
> >
> > > If anyone turns out to be depending on the current behaviour for
> > > performance I think they should probably add it back as a separate flag.
> >
> > ...
> >
> > > > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > > > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > > > [size] "r" (size)
> > > > : "eax", "ebx", "ecx", "edx");
> > > > + return true;
> > >
> > > The comment in the caller says the L1D flush "includes CPU buffer clear
> > > to mitigate MDS" - do we actually know that this software sequence
> > > mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> > > it mitigates MDS?)
Thinking more on this, the software sequence is only invoked when the
system doesn't have the L1D flushing feature added by a microcode update.
In such a case system is not expected to have a flushing VERW either, which
was introduced after L1TF. Also, the admin needs to have a very good reason
for not updating the microcode for 5+ years :-)
Anyways, I have asked for a confirmation if the sequence works for MMIO
stale data also. I will update once I get a response.
> > > Anyway, if this is an issue, it's orthogonal to this patch.
> >
> > Pawan, any idea?
>
> I want to say yes, but let me first confirm this internally and get back to
> you.
* Re: [PATCH v3 3/4] KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n
2025-10-16 20:04 ` [PATCH v3 3/4] KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n Sean Christopherson
@ 2025-10-22 1:36 ` Pawan Gupta
2025-10-22 15:06 ` Sean Christopherson
0 siblings, 1 reply; 22+ messages in thread
From: Pawan Gupta @ 2025-10-22 1:36 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Brendan Jackman
On Thu, Oct 16, 2025 at 01:04:16PM -0700, Sean Christopherson wrote:
> @@ -302,6 +303,16 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
> return 0;
> }
>
> +static int vmx_setup_l1d_flush(void)
> +{
> + /*
> + * Hand the parameter mitigation value in which was stored in the pre
> + * module init parser. If no parameter was given, it will contain
> + * 'auto' which will be turned into the default 'cond' mitigation mode.
> + */
> + return vmx_setup_l1d_flush(vmentry_l1d_flush_param);
A likely typo here, it should be:
return __vmx_setup_l1d_flush(vmentry_l1d_flush_param);
* Re: [PATCH v3 4/4] KVM: x86: Unify L1TF flushing under per-CPU variable
2025-10-16 20:04 ` [PATCH v3 4/4] KVM: x86: Unify L1TF flushing under per-CPU variable Sean Christopherson
@ 2025-10-22 1:59 ` Pawan Gupta
0 siblings, 0 replies; 22+ messages in thread
From: Pawan Gupta @ 2025-10-22 1:59 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Brendan Jackman
On Thu, Oct 16, 2025 at 01:04:17PM -0700, Sean Christopherson wrote:
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -395,26 +395,16 @@ static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
> * 'always'
> */
> if (static_branch_likely(&vmx_l1d_flush_cond)) {
> - bool flush_l1d;
> -
> /*
> - * Clear the per-vcpu flush bit, it gets set again if the vCPU
> + * Clear the per-cpu flush bit, it gets set again if the vCPU
> * is reloaded, i.e. if the vCPU is scheduled out or if KVM
> * exits to userspace, or if KVM reaches one of the unsafe
> - * VMEXIT handlers, e.g. if KVM calls into the emulator.
> + * VMEXIT handlers, e.g. if KVM calls into the emulator,
> + * or from the interrupt handlers.
> */
> - flush_l1d = vcpu->arch.l1tf_flush_l1d;
> - vcpu->arch.l1tf_flush_l1d = false;
> -
> - /*
> - * Clear the per-cpu flush bit, it gets set again from
> - * the interrupt handlers.
> - */
> - flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
> + if (!kvm_get_cpu_l1tf_flush_l1d())
> + return;
This should be returning false here.
> kvm_clear_cpu_l1tf_flush_l1d();
> -
> - if (!flush_l1d)
> - return false;
> }
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-21 23:18 ` Pawan Gupta
@ 2025-10-22 1:59 ` Brendan Jackman
2025-10-22 15:04 ` Sean Christopherson
0 siblings, 1 reply; 22+ messages in thread
From: Brendan Jackman @ 2025-10-22 1:59 UTC (permalink / raw)
To: Pawan Gupta, Sean Christopherson
Cc: Paolo Bonzini, kvm, linux-kernel, Brendan Jackman
On Tue Oct 21, 2025 at 11:18 PM UTC, Pawan Gupta wrote:
> On Thu, Oct 16, 2025 at 01:04:14PM -0700, Sean Christopherson wrote:
>> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
>> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
>> because none of the "heavy" paths that trigger an L1D flush were tripped
>> since the last VM-Enter.
>>
>> Note, the flaw goes back to the introduction of the MDS mitigation.
>
> I don't think it is a flaw. If L1D flush was skipped because VMexit did not
> touch any interested data, then there shouldn't be any need to flush CPU
> buffers.
>
> Secondly, when L1D flush is skipped, flushing MDS affected buffers is of no
> use, because the data could still be extracted from L1D cache using L1TF.
> Isn't it?
This is assuming an equivalence between what L1TF and MMIO Stale Data
exploits can do, that isn't really captured in the code/documentation
IMO. This probably felt much more obvious when the vulns were new...
I dunno, in the end this definitely doesn't seem like a terrifying big
deal, I'm not saying the current behaviour is crazy or anything, it's
just slightly surprising and people with sophisticated opinions about
this might not be getting what they think they are out of the default
setup.
But I have no evidence that these sophisticated dissidents actually
exist, maybe just adding commentary about this rationale is more than
good enough here.
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-22 1:59 ` Brendan Jackman
@ 2025-10-22 15:04 ` Sean Christopherson
0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-10-22 15:04 UTC (permalink / raw)
To: Brendan Jackman; +Cc: Pawan Gupta, Paolo Bonzini, kvm, linux-kernel
On Wed, Oct 22, 2025, Brendan Jackman wrote:
> On Tue Oct 21, 2025 at 11:18 PM UTC, Pawan Gupta wrote:
> > On Thu, Oct 16, 2025 at 01:04:14PM -0700, Sean Christopherson wrote:
> >> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> >> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> >> because none of the "heavy" paths that trigger an L1D flush were tripped
> >> since the last VM-Enter.
> >>
> >> Note, the flaw goes back to the introduction of the MDS mitigation.
> >
> > I don't think it is a flaw. If L1D flush was skipped because VMexit did not
> > touch any interested data, then there shouldn't be any need to flush CPU
> > buffers.
But as Brendan alludes to below, that assumes certain aspects of L1TF and MDS are
equal. Obliterating the L1D is far more costly than flushing CPU buffers, as
evidenced by the much more conditional flushing for L1TF. My read of the L1TF
mitigation is that the conditional flushing is a compromise between
performance and security. Skipping the flush doesn't necessarily mean nothing
interesting was accessed, it just means that KVM didn't hit any of the flows
where a large amount of interesting data was guaranteed to have been accessed.
> > Secondly, when L1D flush is skipped, flushing MDS affected buffers is of no
> > use, because the data could still be extracted from L1D cache using L1TF.
> > Isn't it?
>
> This is assuming an equivalence between what L1TF and MMIO Stale Data
> exploits can do, that isn't really captured in the code/documentation
> IMO.
And again, the cost. To fully mitigate L1TF, KVM would need to flush on every
entry, but that completely tanks performance. But that doesn't
> This probably felt much more obvious when the vulns were new...
>
> I dunno, in the end this definitely doesn't seem like a terrifying big
> deal, I'm not saying the current behaviour is crazy or anything, it's
> just slightly surprising and people with sophisticated opinions about
> this might not be getting what they think they are out of the default
> setup.
Ya. I highly doubt this particular combination matters in practice, but I don't
like surprises. And I find it surprising that the behavior of KVM's mitigation
for MMIO Stale Data changes based on whether or not the L1TF mitigation is enabled.
> But I have no evidence that these sophisticated dissidents actually
> exist, maybe just adding commentary about this rationale is more than
> good enough here.
* Re: [PATCH v3 3/4] KVM: VMX: Disable L1TF L1 data cache flush if CONFIG_CPU_MITIGATIONS=n
2025-10-22 1:36 ` Pawan Gupta
@ 2025-10-22 15:06 ` Sean Christopherson
0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2025-10-22 15:06 UTC (permalink / raw)
To: Pawan Gupta; +Cc: Paolo Bonzini, kvm, linux-kernel, Brendan Jackman
On Tue, Oct 21, 2025, Pawan Gupta wrote:
> On Thu, Oct 16, 2025 at 01:04:16PM -0700, Sean Christopherson wrote:
> > @@ -302,6 +303,16 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
> > return 0;
> > }
> >
> > +static int vmx_setup_l1d_flush(void)
> > +{
> > + /*
> > + * Hand the parameter mitigation value in which was stored in the pre
> > + * module init parser. If no parameter was given, it will contain
> > + * 'auto' which will be turned into the default 'cond' mitigation mode.
> > + */
> > + return vmx_setup_l1d_flush(vmentry_l1d_flush_param);
>
> A likely typo here, it should be:
>
> return __vmx_setup_l1d_flush(vmentry_l1d_flush_param);
Argh, I have a feeling I clobbered my branch with a --force push, as I remember
fixing this exact problem. Or maybe I saw Brendan's struggles and thought, "hold
my beer!" :-D
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-21 23:30 ` Pawan Gupta
2025-10-22 1:20 ` Pawan Gupta
@ 2025-10-27 21:09 ` Pawan Gupta
1 sibling, 0 replies; 22+ messages in thread
From: Pawan Gupta @ 2025-10-27 21:09 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Brendan Jackman, Paolo Bonzini, kvm, linux-kernel
On Tue, Oct 21, 2025 at 04:30:12PM -0700, Pawan Gupta wrote:
> On Tue, Oct 21, 2025 at 09:48:30AM -0700, Sean Christopherson wrote:
> > On Tue, Oct 21, 2025, Brendan Jackman wrote:
> > > On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > > > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > > > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > > > because none of the "heavy" paths that trigger an L1D flush were tripped
> > > > since the last VM-Enter.
> > >
> > > Presumably the assumption here was that the L1TF conditionality is good
> > > enough for the MMIO stale data vuln too? I'm not qualified to assess if
> > > that assumption is true, but also even if it's a good one it's
> > > definitely not obvious to users that the mitigation you pick for L1TF
> > > has this side-effect. So I think I'm on board with calling this a bug.
> >
> > Yeah, that's where I'm at as well.
> >
> > > If anyone turns out to be depending on the current behaviour for
> > > performance I think they should probably add it back as a separate flag.
> >
> > ...
> >
> > > > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > > > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > > > [size] "r" (size)
> > > > : "eax", "ebx", "ecx", "edx");
> > > > + return true;
> > >
> > > The comment in the caller says the L1D flush "includes CPU buffer clear
> > > to mitigate MDS" - do we actually know that this software sequence
> > > mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> > > it mitigates MDS?)
> > >
> > > Anyway, if this is an issue, it's orthogonal to this patch.
> >
> > Pawan, any idea?
>
> I want to say yes, but let me first confirm this internally and get back to
> you.
The software sequence for L1D flush was not validated to mitigate MMIO
Stale Data. To be on safer side, it is better to not rely on the sequence.
OTOH, if a user has not updated the microcode to mitigate L1TF, the system
will not have the microcode to mitigate MMIO Stale Data either, because the
microcode for MMIO Stale Data was released after L1TF. Also I am not aware
of any CPUs that are vulnerable to L1TF and vulnerable to MMIO Stale Data
only (not MDS).
So decoupling the L1D flush and MMIO Stale Data won't have any
practical impact on functionality, and makes the MMIO Stale Data mitigation
consistent with the MDS mitigation. I hope that makes things clear.
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-22 1:20 ` Pawan Gupta
@ 2025-10-27 22:03 ` Jim Mattson
2025-10-27 23:17 ` Pawan Gupta
0 siblings, 1 reply; 22+ messages in thread
From: Jim Mattson @ 2025-10-27 22:03 UTC (permalink / raw)
To: Pawan Gupta
Cc: Sean Christopherson, Brendan Jackman, Paolo Bonzini, kvm,
linux-kernel
On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> ...
> Thinking more on this, the software sequence is only invoked when the
> system doesn't have the L1D flushing feature added by a microcode update.
> In such a case system is not expected to have a flushing VERW either, which
> was introduced after L1TF. Also, the admin needs to have a very good reason
> for not updating the microcode for 5+ years :-)
KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
plenty of virtual CPUs with a flushing VERW that don't have the L1D
flushing feature.
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-27 22:03 ` Jim Mattson
@ 2025-10-27 23:17 ` Pawan Gupta
2025-10-27 23:58 ` Jim Mattson
0 siblings, 1 reply; 22+ messages in thread
From: Pawan Gupta @ 2025-10-27 23:17 UTC (permalink / raw)
To: Jim Mattson
Cc: Sean Christopherson, Brendan Jackman, Paolo Bonzini, kvm,
linux-kernel
On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > ...
> > Thinking more on this, the software sequence is only invoked when the
> > system doesn't have the L1D flushing feature added by a microcode update.
> > In such a case system is not expected to have a flushing VERW either, which
> > was introduced after L1TF. Also, the admin needs to have a very good reason
> > for not updating the microcode for 5+ years :-)
>
> KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> plenty of virtual CPUs with a flushing VERW that don't have the L1D
> flushing feature.
Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
kvm_get_arch_capabilities()
{
...
/*
* If we're doing cache flushes (either "always" or "cond")
* we will do one whenever the guest does a vmlaunch/vmresume.
* If an outer hypervisor is doing the cache flush for us
* (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
* capability to the guest too, and if EPT is disabled we're not
* vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
* require a nested hypervisor to do a flush of its own.
*/
if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-27 23:17 ` Pawan Gupta
@ 2025-10-27 23:58 ` Jim Mattson
2025-10-28 0:19 ` Pawan Gupta
0 siblings, 1 reply; 22+ messages in thread
From: Jim Mattson @ 2025-10-27 23:58 UTC (permalink / raw)
To: Pawan Gupta
Cc: Sean Christopherson, Brendan Jackman, Paolo Bonzini, kvm,
linux-kernel
On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > ...
> > > Thinking more on this, the software sequence is only invoked when the
> > > system doesn't have the L1D flushing feature added by a microcode update.
> > > In such a case system is not expected to have a flushing VERW either, which
> > > was introduced after L1TF. Also, the admin needs to have a very good reason
> > > for not updating the microcode for 5+ years :-)
> >
> > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > flushing feature.
>
> Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
>
> kvm_get_arch_capabilities()
> {
> ...
> /*
> * If we're doing cache flushes (either "always" or "cond")
> * we will do one whenever the guest does a vmlaunch/vmresume.
> * If an outer hypervisor is doing the cache flush for us
> * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
> * capability to the guest too, and if EPT is disabled we're not
> * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
> * require a nested hypervisor to do a flush of its own.
> */
> if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
>
Unless L0 has chosen L1D_FLUSH_NEVER. :)
On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
entry rather than VM-entry. ASI entries are two orders of magnitude
less frequent than VM-entries, so we get comparable protection to
L1D_FLUSH_ALWAYS at a fraction of the cost.
At the moment, we still do an L1D flush on emulated VM-entry, but
that's just because we have historically advertised
IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-27 23:58 ` Jim Mattson
@ 2025-10-28 0:19 ` Pawan Gupta
2025-10-28 0:49 ` Pawan Gupta
0 siblings, 1 reply; 22+ messages in thread
From: Pawan Gupta @ 2025-10-28 0:19 UTC (permalink / raw)
To: Jim Mattson
Cc: Sean Christopherson, Brendan Jackman, Paolo Bonzini, kvm,
linux-kernel
On Mon, Oct 27, 2025 at 04:58:10PM -0700, Jim Mattson wrote:
> On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > >
> > > > ...
> > > > Thinking more on this, the software sequence is only invoked when the
> > > > system doesn't have the L1D flushing feature added by a microcode update.
> > > > In such a case system is not expected to have a flushing VERW either, which
> > > > was introduced after L1TF. Also, the admin needs to have a very good reason
> > > > for not updating the microcode for 5+ years :-)
> > >
> > > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > > flushing feature.
> >
> > Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
> >
> > kvm_get_arch_capabilities()
> > {
> > ...
> > /*
> > * If we're doing cache flushes (either "always" or "cond")
> > * we will do one whenever the guest does a vmlaunch/vmresume.
> > * If an outer hypervisor is doing the cache flush for us
> > * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
> > * capability to the guest too, and if EPT is disabled we're not
> > * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
> > * require a nested hypervisor to do a flush of its own.
> > */
> > if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> > data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
> >
>
> Unless L0 has chosen L1D_FLUSH_NEVER. :)
>
> On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
> entry rather than VM-entry. ASI entries are two orders of magnitude
> less frequent than VM-entries, so we get comparable protection to
> L1D_FLUSH_ALWAYS at a fraction of the cost.
>
> At the moment, we still do an L1D flush on emulated VM-entry, but
> that's just because we have historically advertised
> IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.
Thanks for the background.
I still don't see the problem: CPUs that are vulnerable to L1TF are also
vulnerable to MDS. So, they don't set mmio_stale_data_clear, instead they
set X86_FEATURE_CLEAR_CPU_BUF and execute VERW in __vmx_vcpu_run()
regardless of whether L1D_FLUSH was done.
But, I agree it is best to decouple L1D flush and MMIO Stale Data to
avoid any confusion.
* Re: [PATCH v3 1/4] KVM: VMX: Flush CPU buffers as needed if L1D cache flush is skipped
2025-10-28 0:19 ` Pawan Gupta
@ 2025-10-28 0:49 ` Pawan Gupta
0 siblings, 0 replies; 22+ messages in thread
From: Pawan Gupta @ 2025-10-28 0:49 UTC (permalink / raw)
To: Jim Mattson
Cc: Sean Christopherson, Brendan Jackman, Paolo Bonzini, kvm,
linux-kernel
On Mon, Oct 27, 2025 at 05:19:57PM -0700, Pawan Gupta wrote:
> On Mon, Oct 27, 2025 at 04:58:10PM -0700, Jim Mattson wrote:
> > On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > > > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > >
> > > > > ...
> > > > > Thinking more on this, the software sequence is only invoked when the
> > > > > system doesn't have the L1D flushing feature added by a microcode update.
> > > > > In such a case system is not expected to have a flushing VERW either, which
> > > > > was introduced after L1TF. Also, the admin needs to have a very good reason
> > > > > for not updating the microcode for 5+ years :-)
> > > >
> > > > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > > > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > > > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > > > flushing feature.
> > >
> > > Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
> > >
> > > kvm_get_arch_capabilities()
> > > {
> > > ...
> > > /*
> > > * If we're doing cache flushes (either "always" or "cond")
> > > * we will do one whenever the guest does a vmlaunch/vmresume.
> > > * If an outer hypervisor is doing the cache flush for us
> > > * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
> > > * capability to the guest too, and if EPT is disabled we're not
> > > * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
> > > * require a nested hypervisor to do a flush of its own.
> > > */
> > > if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> > > data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
> > >
> >
> > Unless L0 has chosen L1D_FLUSH_NEVER. :)
> >
> > On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
> > entry rather than VM-entry. ASI entries are two orders of magnitude
> > less frequent than VM-entries, so we get comparable protection to
> > L1D_FLUSH_ALWAYS at a fraction of the cost.
> >
> > At the moment, we still do an L1D flush on emulated VM-entry, but
> > that's just because we have historically advertised
> > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.
>
> Thanks for the background.
>
> I still don't see the problem: CPUs that are vulnerable to L1TF are also
> vulnerable to MDS. So, they don't set mmio_stale_data_clear, instead they
Sorry I meant cpu_buf_vm_clear instead of mmio_stale_data_clear (I was
looking at a slightly older kernel).
> set X86_FEATURE_CLEAR_CPU_BUF and execute VERW in __vmx_vcpu_run()
> regardless of whether L1D_FLUSH was done.
>
> But, I agree it is best to decouple L1D flush and MMIO Stale Data to
> avoid any confusion.