* [RFC PATCH v2 00/25] Optimize nSVM TLB flushes
@ 2026-06-16  0:41 Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 01/25] KVM: nSVM: Flush the TLB after forcefully leaving nested Yosry Ahmed
                   ` (24 more replies)
  0 siblings, 25 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

A second RFC series (or third) for optimizing TLB flushes for nested
SVM, over a year later. This series optimizes TLB flushes by using and
maintaining a separate ASID for L2 for each vCPU (similar to VMX's
handling of VPIDs), rather than sharing the same ASID and flushing
everything on every nested transition.

The series also drops the dynamic ASID allocation scheme and uses a
static ASID per vCPU (or two if using nested), bringing it closer to
how VMX handles VPIDs. Modern CPUs (checked Milan, Genoa, and Turin)
advertise 32K ASIDs, so there is no point in allocating them dynamically.
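To put the 32K figure in perspective, the hypothetical helper below (not part of the series) computes how many vCPUs could each hold a fixed number of static ASIDs. It assumes ASIDs run from 0 to nasid - 1, that ASID 0 belongs to the host, and that a host may reserve a low range (e.g. for SEV), modeled by min_asid.

```c
#include <assert.h>

/*
 * Hypothetical helper: how many vCPUs can each own 'per_vcpu' static
 * ASIDs, given that the CPU advertises 'nasid' ASIDs (0..nasid-1) and
 * that ASIDs below 'min_asid' are unavailable (ASID 0 belongs to the
 * host; a low range may also be reserved for SEV guests).
 */
static unsigned int max_static_vcpus(unsigned int nasid, unsigned int min_asid,
				     unsigned int per_vcpu)
{
	assert(min_asid >= 1 && per_vcpu >= 1);

	if (nasid <= min_asid)
		return 0;

	/* Usable ASIDs are [min_asid, nasid); split them evenly per vCPU. */
	return (nasid - min_asid) / per_vcpu;
}
```

With 32768 ASIDs and two per vCPU (L1 plus L2), max_static_vcpus(32768, 1, 2) is 16383, comfortably above typical per-host vCPU counts.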

The series is mostly in good shape, but it is an RFC because I only
did basic testing (booted an L2 and an L3, ran netperf in L2). I am
mainly sending this out to get some feedback on the code while I do more
extensive testing. Ideally I would test this with a memory overcommitted
VM in L0 (to trigger KVM unmapping and flushing guest memory) and a CPU
overcommitted VM in L1 (to trigger L1 KVM multiplexing multiple L2 ASIDs
on the same L1 vCPU) -- let's see if AI can help with this.

Admittedly, the SEV and HV bits only got build tested, and I doubt that
I would be able to do more than this going forward.

Since over a year has passed and the code changed non-trivially, I dropped
all Reviewed-by tags (sorry Maxim!), even on patches that are largely
unchanged. Given how much time has passed, I won't go into detail on
everything that changed, but I will describe the main changes.

Breakdown:
- Patch 1 is a bug fix that can be taken separately.
- Patches 2-9 implement the change to use static ASIDs per vCPU for
  SVM, and can also be taken separately if needed.
- The rest of the patches (up to patch 24) add an L2 ASID and proper
  handling for it, leading to ultimately dropping the unconditional
  flushes on nested transitions.
- Patch 25 is purely for testing convenience. It adds a module param
  that adds back the unconditional flushes. That said, if we are really
  paranoid, we could keep it for a while? I don't know if we usually do that.

RFC v1 -> RFC v2:
- Changed from a static ASID per VM to a static ASID per vCPU, dropping
  the needed per-CPU ASID -> vCPU tracking and dropping all SEV changes.
- Fixed a missing flush in the current code when forcefully leaving
  nested (patch 1).
- Expose more ASIDs to L1 (patch 2), otherwise the usefulness of this
  series is limited when L1 runs a bunch of L2s.
- Drop passing is_guest_mode around (past me, EW!).
- Reuse more of kvm_mmu_invlpg() for INVLPGA emulation.
- Fix INVLPGA emulation in case L1 runs on a different CPU than L2 (but
  L2 never switches its CPU).
- Resync the nested NPT when L1 changes L2's ASID (missed in RFC v1).
- Shuffled some patches around, combined/split patches differently, and
  rewrote some changelogs.

RFC v1: https://lore.kernel.org/lkml/20250326193619.3714986-1-yosry.ahmed@linux.dev/


Yosry Ahmed (25):
  KVM: nSVM: Flush the TLB after forcefully leaving nested
  KVM: SVM: Passthrough the number of supported ASIDs
  KVM: VMX: Generalize VPID allocation to be vendor-neutral
  KVM: x86/mmu: Support specifying a minimum TLB tag
  KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  KVM: SVM: Fallback to flush everything if FLUSHBYASID is not available
  KVM: SVM: Duplicate pre-run ASID check for SEV and non-SEV guests
  KVM: SEV: Stop using per-vCPU ASID for SEV VMs
  KVM: SVM: Use a static ASID per vCPU
  KVM: nSVM: Add a placeholder ASID for L2
  KVM: x86: hyper-v: Rename kvm_hv_vcpu_purge_flush_tlb()
  KVM: x86: hyper-v: Allow purging all TLB flush FIFOs
  KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  KVM: nSVM: Move svm_switch_vmcb() to nested.c
  KVM: nSVM: Call nested_svm_transition_tlb_flush() on every VMCB switch
  KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns
  KVM: nSVM: Service local TLB flushes before nested transitions
  KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL
  KVM: nSVM: Flush the TLB if L1 changes L2's ASID in vmcb12
  KVM: nSVM: Do not reset TLB_CONTROL in vmcb02 on nested VM-Enter
  KVM: x86/mmu: rename __kvm_mmu_invalidate_addr()
  KVM: x86/mmu: Refactor kvm_mmu_invlpg() to allow skipping the gva
    flush
  KVM: nSVM: Flush L2's ASID when emulating INVLPGA
  KVM: nSVM: Use different ASIDs for L1 and L2
  DO NOT MERGE: Add nested_tlb_force_flush

 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/cpuid.c            |   2 -
 arch/x86/kvm/hyperv.h           |  33 ++++++--
 arch/x86/kvm/mmu.h              |   8 ++
 arch/x86/kvm/mmu/mmu.c          | 106 +++++++++++++++++++++---
 arch/x86/kvm/svm/nested.c       | 106 ++++++++++++++++++------
 arch/x86/kvm/svm/sev.c          |  12 ++-
 arch/x86/kvm/svm/svm.c          | 137 ++++++++++++++++++++------------
 arch/x86/kvm/svm/svm.h          |  41 +++++++---
 arch/x86/kvm/vmx/vmx.c          |  40 +++-------
 arch/x86/kvm/vmx/vmx.h          |  29 ++++++-
 arch/x86/kvm/x86.c              |   2 +-
 12 files changed, 377 insertions(+), 142 deletions(-)


base-commit: c1f7303302927f9cbf4efedf70f0512cde168c65
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 01/25] KVM: nSVM: Flush the TLB after forcefully leaving nested
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 02/25] KVM: SVM: Passthrough the number of supported ASIDs Yosry Ahmed
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed, stable

KVM flushes the TLB on nested VM-Enter and nested VM-Exit, but not when
forcefully leaving nested. In this case, L2 TLB entries can leak into
L1. Flush the TLB after forcefully exiting L2, similar to nested
VM-Exits.
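To make the leak concrete: while L1 and L2 share an ASID, hardware cannot tell their TLB entries apart. The toy model below (all names hypothetical, not KVM code) tags entries only by ASID, so a translation filled while running L2 is still hit after switching back to L1 unless that ASID is flushed, which is the effect the added nested_svm_transition_tlb_flush() call is meant to guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy TLB: entries are tagged only by ASID, not by "L1 vs. L2". */
#define TOY_TLB_SIZE 16

struct toy_tlb_entry {
	bool valid;
	unsigned int asid;
	unsigned long gva;
	unsigned long hpa;
};

struct toy_tlb {
	struct toy_tlb_entry e[TOY_TLB_SIZE];
};

/* Caches a translation in the first free slot (silently drops if full). */
static void toy_tlb_fill(struct toy_tlb *tlb, unsigned int asid,
			 unsigned long gva, unsigned long hpa)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		if (!tlb->e[i].valid) {
			tlb->e[i] = (struct toy_tlb_entry){ true, asid, gva, hpa };
			return;
		}
	}
}

/* Returns true and sets *hpa on a hit for (asid, gva). */
static bool toy_tlb_lookup(const struct toy_tlb *tlb, unsigned int asid,
			   unsigned long gva, unsigned long *hpa)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		if (tlb->e[i].valid && tlb->e[i].asid == asid &&
		    tlb->e[i].gva == gva) {
			*hpa = tlb->e[i].hpa;
			return true;
		}
	}
	return false;
}

/* Models a per-ASID flush: drop every entry tagged with 'asid'. */
static void toy_tlb_flush_asid(struct toy_tlb *tlb, unsigned int asid)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++)
		if (tlb->e[i].asid == asid)
			tlb->e[i].valid = false;
}
```

Filling an entry for ASID 1 at GVA 0x1000 while "in L2", then looking up the same GVA with the same ASID "in L1", returns the L2 translation until toy_tlb_flush_asid() runs.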

Note that vmx_leave_nested() handles this correctly, as it reuses
nested_vmx_vmexit(), which handles the necessary TLB flushes on an L2 ->
L1 transition.

Cc: stable@vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1ab8b95975a4b..c85aa5cf670f4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1550,6 +1550,8 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
 
 		svm_switch_vmcb(svm, &svm->vmcb01);
 
+		nested_svm_transition_tlb_flush(vcpu);
+
 		nested_svm_uninit_mmu_context(vcpu);
 		vmcb_mark_all_dirty(svm->vmcb);
 
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 02/25] KVM: SVM: Passthrough the number of supported ASIDs
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 01/25] KVM: nSVM: Flush the TLB after forcefully leaving nested Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 03/25] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
                   ` (22 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

KVM currently hardcodes the number of supported ASIDs in CPUID to 8. A
KVM guest (L1) would then do a full TLB flush (i.e.
TLB_CONTROL_FLUSH_ALL_ASID) every time it runs out of ASIDs on a vCPU
and updates the generation (see new_asid()).
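The cost of a small ASID space can be illustrated with a toy model loosely following the shape of new_asid(): each call models a vCPU needing a fresh ASID on this CPU, and running past the maximum starts a new generation, which requires a full flush (TLB_CONTROL_FLUSH_ALL_ASID). The struct and function names are hypothetical, and the model ignores when exactly KVM decides a new ASID is needed.

```c
#include <assert.h>

/*
 * Toy model of per-CPU, generation-based ASID allocation, loosely
 * following the shape of KVM's new_asid(). Names are hypothetical.
 */
struct toy_asid_cpu {
	unsigned int min_asid;
	unsigned int max_asid;
	unsigned int next_asid;
	unsigned long generation;
	unsigned long full_flushes;	/* times a FLUSH_ALL_ASID was needed */
};

static void toy_asid_cpu_init(struct toy_asid_cpu *cpu, unsigned int nasid)
{
	assert(nasid >= 2);
	cpu->min_asid = 1;			/* ASID 0 is the host's */
	cpu->max_asid = nasid - 1;
	cpu->next_asid = cpu->max_asid + 1;	/* force a rollover on first use */
	cpu->generation = 0;
	cpu->full_flushes = 0;
}

static unsigned int toy_new_asid(struct toy_asid_cpu *cpu)
{
	if (cpu->next_asid > cpu->max_asid) {
		/*
		 * Out of ASIDs: start a new generation and reuse ASIDs from
		 * the bottom, which requires flushing every ASID's entries.
		 */
		cpu->generation++;
		cpu->next_asid = cpu->min_asid;
		cpu->full_flushes++;
	}
	return cpu->next_asid++;
}
```

With 8 advertised ASIDs (usable ASIDs 1 to 7), 14 calls to toy_new_asid() trigger two full flushes; with 32768 ASIDs the same 14 calls trigger only the initial one.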

This is currently harmless, as KVM (L0) uses the same ASID for both L1
and L2, and flushes that ASID on nested transitions. However, following
changes will add proper ASID emulation and a separate ASID for L2,
minimizing the TLB flushes on nested transitions. At that point, a full
TLB flush from a KVM guest (L1) would flush both the L1 and L2 ASIDs, so it
should be avoided as much as possible.

Pass through the number of ASIDs supported by hardware instead of
hardcoding 8, to reduce the chances of an L1 guest flushing its own TLB
entries unnecessarily on a nested VMRUN.

In practice, there is no harm in exposing a large number of ASIDs to the
guest, even larger than what hardware supports, as KVM never actually
uses the ASID value from vmcb12. Even with a separate L2 ASID, KVM
would allocate a (supported) ASID for L2, and just flush that same ASID
every time L1 changes the ASID in vmcb12.
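A hypothetical sketch of the scheme described above: L0 allocates one hardware ASID for L2, treats the vmcb12 ASID purely as an opaque value, and flushes L2's hardware ASID whenever that value changes. Because the vmcb12 value is never loaded into hardware, it does not need to fit the hardware ASID range. Names are illustrative and not the ones used later in the series.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch: L0 gives L2 a fixed hardware ASID and never loads
 * the vmcb12 ASID into hardware. Since every vmcb12 ASID maps onto the
 * same hardware ASID, translations cached under the previous vmcb12 ASID
 * must be flushed when L1 switches to a different one.
 */
struct toy_nested_asid {
	unsigned int l2_hw_asid;	/* allocated by L0, always supported */
	unsigned int last_vmcb12_asid;	/* 0 = none seen yet */
};

/* Returns true if L2's hardware ASID must be flushed before VMRUN. */
static bool toy_nested_vmrun_needs_flush(struct toy_nested_asid *n,
					 unsigned int vmcb12_asid)
{
	bool changed = n->last_vmcb12_asid != vmcb12_asid;

	n->last_vmcb12_asid = vmcb12_asid;
	return changed;
}
```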

That being said, avoid the temptation of just advertising the maximum
possible number of ASIDs (i.e. 0xFFFFFFFF), in case any peculiar guest
OS does not handle that properly.

Note: QEMU currently hardcodes the number of ASIDs to 16, so this change
doesn't help QEMU VMs (without making a similar change in QEMU).

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/cpuid.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 591d2294acd75..4486fc8d22b04 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1824,8 +1824,6 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			break;
 		}
 		entry->eax = 1; /* SVM revision 1 */
-		entry->ebx = 8; /* Lets support 8 ASIDs in case we add proper
-				   ASID emulation to nested SVM */
 		entry->ecx = 0; /* Reserved */
 		cpuid_entry_override(entry, CPUID_8000_000A_EDX);
 		break;
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 03/25] KVM: VMX: Generalize VPID allocation to be vendor-neutral
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 01/25] KVM: nSVM: Flush the TLB after forcefully leaving nested Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 02/25] KVM: SVM: Passthrough the number of supported ASIDs Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 04/25] KVM: x86/mmu: Support specifying a minimum TLB tag Yosry Ahmed
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

In preparation for sharing with SVM, generalize the VMX VPID allocation
code and move it to common code as a TLB tags allocator. Parameterize
the TLB tags allocator by the number of tags, and allocate the bitmap
dynamically. Opportunistically use guards to acquire the lock instead of
spin_{lock/unlock}().

The number of tags includes tag=0, which is not usable. The interface is
a little confusing in that regard, but this will be changed with the
introduction of a minimum tag later.

Initialize the TLB tags allocator during hardware setup/unsetup, and
reserve tag=0 during initialization, similar to how VPID=0 is currently
reserved in the VMX-specific bitmap during hardware setup.

Keep allocate_vpid() and free_vpid() as wrappers that check enable_vpid
to avoid checking at all callsites, and add init_vpids() and
destroy_vpids() to wrap init/destroy calls as well.
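For experimenting outside the kernel, the following is a minimal single-threaded userspace model of the same allocator idea. It is only a sketch: the spinlock is omitted, the bitmap helpers are open-coded instead of using the kernel's find_first_zero_bit()/__set_bit(), and all names are hypothetical. As in this patch, nr counts tag 0, which is reserved and also signals failure.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Userspace model of a bitmap-backed TLB tag allocator (no locking).
 * Tag 0 is reserved for the host and doubles as "allocation failed".
 */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct toy_tags {
	unsigned long *bitmap;
	unsigned int nr;	/* includes the reserved tag 0 */
};

static int toy_tags_init(struct toy_tags *t, unsigned int nr)
{
	if (!nr)
		return -1;
	t->bitmap = calloc((nr + BITS_PER_WORD - 1) / BITS_PER_WORD,
			   sizeof(unsigned long));
	if (!t->bitmap)
		return -1;
	t->bitmap[0] |= 1UL;	/* reserve tag 0 for the host */
	t->nr = nr;
	return 0;
}

static bool toy_test_bit(const struct toy_tags *t, unsigned int bit)
{
	return t->bitmap[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD));
}

/* Returns the lowest free tag, or 0 if all tags are in use. */
static unsigned int toy_tag_alloc(struct toy_tags *t)
{
	for (unsigned int bit = 0; bit < t->nr; bit++) {
		if (!toy_test_bit(t, bit)) {
			t->bitmap[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
			return bit;
		}
	}
	return 0;
}

/* Freeing tag 0 (a failed allocation) or an out-of-range tag is a no-op. */
static void toy_tag_free(struct toy_tags *t, unsigned int tag)
{
	if (!tag || tag >= t->nr)
		return;
	t->bitmap[tag / BITS_PER_WORD] &= ~(1UL << (tag % BITS_PER_WORD));
}

static void toy_tags_destroy(struct toy_tags *t)
{
	free(t->bitmap);
	t->bitmap = NULL;
}
```

With nr = 4, three allocations return tags 1, 2 and 3, a fourth returns 0, and a freed tag is handed out again by the next allocation.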

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/mmu.h     |  8 ++++++
 arch/x86/kvm/mmu/mmu.c | 64 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c | 40 ++++++--------------------
 arch/x86/kvm/vmx/vmx.h | 28 +++++++++++++++---
 4 files changed, 105 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e1bb663ebbd58..9a2916012cbff 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -334,4 +334,12 @@ static inline bool kvm_is_gfn_alias(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn & kvm_gfn_direct_bits(kvm);
 }
+
+typedef unsigned int kvm_tlb_tag_t;
+
+int kvm_init_tlb_tags(unsigned int nr);
+void kvm_destroy_tlb_tags(void);
+kvm_tlb_tag_t kvm_alloc_tlb_tag(void);
+void kvm_free_tlb_tag(kvm_tlb_tag_t tag);
+
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9368a71336fe4..e021ed562502f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -8192,4 +8192,68 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
 		}
 	}
 }
+
+static struct {
+	spinlock_t	lock;
+	unsigned long	*bitmap;
+	unsigned int	nr;
+} tlb_tags;
+
+int kvm_init_tlb_tags(unsigned int nr)
+{
+	if (WARN_ON_ONCE(!nr))
+		return -EINVAL;
+
+	tlb_tags.bitmap = bitmap_zalloc(nr, GFP_KERNEL);
+	if (!tlb_tags.bitmap)
+		return -ENOMEM;
+
+	/*
+	 * 0 is the host's TLB tag for both VMX's VPID and SVM's ASID, and is
+	 * returned on failed allocations (e.g. no more tags left).
+	 */
+	__set_bit(0, tlb_tags.bitmap);
+
+	tlb_tags.nr = nr;
+	spin_lock_init(&tlb_tags.lock);
+	return 0;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_init_tlb_tags);
+
+void kvm_destroy_tlb_tags(void)
+{
+	bitmap_free(tlb_tags.bitmap);
+	tlb_tags.bitmap = NULL;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_destroy_tlb_tags);
+
+kvm_tlb_tag_t kvm_alloc_tlb_tag(void)
+{
+	kvm_tlb_tag_t tag;
+
+	if (WARN_ON_ONCE(!tlb_tags.bitmap))
+		return 0;
+
+	guard(spinlock)(&tlb_tags.lock);
+
+	tag = find_first_zero_bit(tlb_tags.bitmap, tlb_tags.nr);
+	if (tag >= tlb_tags.nr)
+		return 0;
+
+	__set_bit(tag, tlb_tags.bitmap);
+	return tag;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_alloc_tlb_tag);
+
+void kvm_free_tlb_tag(kvm_tlb_tag_t tag)
+{
+	if (!tag || WARN_ON_ONCE(tag >= tlb_tags.nr))
+		return;
+
+	guard(spinlock)(&tlb_tags.lock);
+
+	__clear_bit(tag, tlb_tags.bitmap);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_free_tlb_tag);
+
 #endif
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c548f22375ad6..e1fd1c95ee8cc 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -594,9 +594,6 @@ DEFINE_PER_CPU(struct vmcs *, current_vmcs);
  */
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 
-static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
-static DEFINE_SPINLOCK(vmx_vpid_lock);
-
 struct vmcs_config vmcs_config __ro_after_init;
 struct vmx_capability vmx_capability __ro_after_init;
 
@@ -4067,31 +4064,6 @@ static void seg_setup(int seg)
 	vmcs_write32(sf->ar_bytes, ar);
 }
 
-int allocate_vpid(void)
-{
-	int vpid;
-
-	if (!enable_vpid)
-		return 0;
-	spin_lock(&vmx_vpid_lock);
-	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-	if (vpid < VMX_NR_VPIDS)
-		__set_bit(vpid, vmx_vpid_bitmap);
-	else
-		vpid = 0;
-	spin_unlock(&vmx_vpid_lock);
-	return vpid;
-}
-
-void free_vpid(int vpid)
-{
-	if (!enable_vpid || vpid == 0)
-		return;
-	spin_lock(&vmx_vpid_lock);
-	__clear_bit(vpid, vmx_vpid_bitmap);
-	spin_unlock(&vmx_vpid_lock);
-}
-
 static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
 {
 	/*
@@ -8474,6 +8446,8 @@ void vmx_hardware_unsetup(void)
 
 	if (nested)
 		nested_vmx_hardware_unsetup();
+
+	destroy_vpids();
 }
 
 void vmx_vm_destroy(struct kvm *kvm)
@@ -8699,8 +8673,6 @@ __init int vmx_hardware_setup(void)
 	kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
 	kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
 
-	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
-
 	if (enable_ept)
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits);
 	else
@@ -8765,6 +8737,10 @@ __init int vmx_hardware_setup(void)
 
 	vmx_set_cpu_caps();
 
+	r = init_vpids();
+	if (r)
+		return r;
+
 	/*
 	 * Configure nested capabilities after core CPU capabilities so that
 	 * nested support can be conditional on base support, e.g. so that KVM
@@ -8772,8 +8748,10 @@ __init int vmx_hardware_setup(void)
 	 */
 	if (nested) {
 		r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
-		if (r)
+		if (r) {
+			destroy_vpids();
 			return r;
+		}
 	}
 
 	kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index de9de0d2016ca..d6d35637d94f8 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -175,7 +175,7 @@ struct nested_vmx {
 	u64 pre_vmenter_ssp;
 	u64 pre_vmenter_ssp_tbl;
 
-	u16 vpid02;
+	kvm_tlb_tag_t vpid02;
 	u16 last_vpid;
 
 	int tsc_autostore_slot;
@@ -249,7 +249,7 @@ struct vcpu_vmx {
 			u32 ar;
 		} seg[8];
 	} segment_cache;
-	int vpid;
+	kvm_tlb_tag_t vpid;
 
 	/* Support for a guest hypervisor (nested VMX) */
 	struct nested_vmx nested;
@@ -334,9 +334,29 @@ static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
 	return vt->exit_intr_info;
 }
 
+static __always_inline int init_vpids(void)
+{
+	return enable_vpid ? kvm_init_tlb_tags(VMX_NR_VPIDS) : 0;
+}
+
+static __always_inline void destroy_vpids(void)
+{
+	if (enable_vpid)
+		kvm_destroy_tlb_tags();
+}
+
+static __always_inline kvm_tlb_tag_t allocate_vpid(void)
+{
+	return enable_vpid ? kvm_alloc_tlb_tag() : 0;
+}
+
+static __always_inline void free_vpid(kvm_tlb_tag_t vpid)
+{
+	if (enable_vpid)
+		kvm_free_tlb_tag(vpid);
+}
+
 void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu);
-int allocate_vpid(void);
-void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 04/25] KVM: x86/mmu: Support specifying a minimum TLB tag
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (2 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 03/25] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 05/25] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

In preparation for using the TLB tags allocator for SVM, which has a
range of ASIDs allocated for SEV/SNP, pass in a minimum TLB tag when
initializing the TLB tags allocator. The bitmap is conceptually shifted
such that bit=0 corresponds to tag=min.

Specifying the minimum value during initialization also makes the API
clearer, as the passed number of tags becomes the actual number of
*usable* tags, and tag=0 is explicitly excluded by the caller.
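The shifted bitmap boils down to a simple translation: bit b represents tag min + b, so only the nr usable tags in [min, min + nr) occupy bitmap space. The hypothetical helpers below sketch that mapping with the same bounds checks kvm_free_tlb_tag() performs; the SVM-style range in the usage note (starting at 100) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the usable tag range [min, min + nr). */
struct toy_tag_range {
	unsigned int min;	/* first usable tag, must be >= 1 */
	unsigned int nr;	/* number of usable tags */
};

/* Bit b of the bitmap represents tag min + b. */
static unsigned int toy_bit_to_tag(const struct toy_tag_range *r, unsigned int bit)
{
	assert(bit < r->nr);
	return r->min + bit;
}

/* Returns false for tag 0, tags below min, and tags past the range. */
static bool toy_tag_to_bit(const struct toy_tag_range *r, unsigned int tag,
			   unsigned int *bit)
{
	if (!tag || tag < r->min)
		return false;
	if (tag - r->min >= r->nr)
		return false;
	*bit = tag - r->min;
	return true;
}
```

For VPIDs (min = 1, as init_vpids() now passes), bit 0 maps to VPID 1; for a hypothetical range with min = 100 and nr = 50, tags 99 and 150 are rejected while tag 149 maps to bit 49.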

No functional change intended for VMX as VPID=0 is not used anyway.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/mmu.h     |  2 +-
 arch/x86/kvm/mmu/mmu.c | 38 +++++++++++++++++++++++---------------
 arch/x86/kvm/vmx/vmx.h |  3 ++-
 3 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 9a2916012cbff..cfffee92b8b71 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -337,7 +337,7 @@ static inline bool kvm_is_gfn_alias(struct kvm *kvm, gfn_t gfn)
 
 typedef unsigned int kvm_tlb_tag_t;
 
-int kvm_init_tlb_tags(unsigned int nr);
+int kvm_init_tlb_tags(kvm_tlb_tag_t min, unsigned int nr);
 void kvm_destroy_tlb_tags(void);
 kvm_tlb_tag_t kvm_alloc_tlb_tag(void);
 void kvm_free_tlb_tag(kvm_tlb_tag_t tag);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e021ed562502f..bf2e0c2205631 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -8197,24 +8197,26 @@ static struct {
 	spinlock_t	lock;
 	unsigned long	*bitmap;
 	unsigned int	nr;
+	kvm_tlb_tag_t	min;
 } tlb_tags;
 
-int kvm_init_tlb_tags(unsigned int nr)
+int kvm_init_tlb_tags(kvm_tlb_tag_t min, unsigned int nr)
 {
-	if (WARN_ON_ONCE(!nr))
-		return -EINVAL;
-
-	tlb_tags.bitmap = bitmap_zalloc(nr, GFP_KERNEL);
-	if (!tlb_tags.bitmap)
-		return -ENOMEM;
+	unsigned int end;
 
 	/*
 	 * 0 is the host's TLB tag for both VMX's VPID and SVM's ASID, and is
 	 * returned on failed allocations (e.g. no more tags left).
 	 */
-	__set_bit(0, tlb_tags.bitmap);
+	if (WARN_ON_ONCE(!min || !nr || check_add_overflow(min, nr, &end)))
+		return -EINVAL;
+
+	tlb_tags.bitmap = bitmap_zalloc(nr, GFP_KERNEL);
+	if (!tlb_tags.bitmap)
+		return -ENOMEM;
 
 	tlb_tags.nr = nr;
+	tlb_tags.min = min;
 	spin_lock_init(&tlb_tags.lock);
 	return 0;
 }
@@ -8229,30 +8231,36 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_destroy_tlb_tags);
 
 kvm_tlb_tag_t kvm_alloc_tlb_tag(void)
 {
-	kvm_tlb_tag_t tag;
+	unsigned int bit;
 
 	if (WARN_ON_ONCE(!tlb_tags.bitmap))
 		return 0;
 
 	guard(spinlock)(&tlb_tags.lock);
 
-	tag = find_first_zero_bit(tlb_tags.bitmap, tlb_tags.nr);
-	if (tag >= tlb_tags.nr)
+	bit = find_first_zero_bit(tlb_tags.bitmap, tlb_tags.nr);
+	if (bit >= tlb_tags.nr)
 		return 0;
 
-	__set_bit(tag, tlb_tags.bitmap);
-	return tag;
+	__set_bit(bit, tlb_tags.bitmap);
+	return tlb_tags.min + bit;
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_alloc_tlb_tag);
 
 void kvm_free_tlb_tag(kvm_tlb_tag_t tag)
 {
-	if (!tag || WARN_ON_ONCE(tag >= tlb_tags.nr))
+	unsigned int bit;
+
+	if (!tag || WARN_ON_ONCE(tag < tlb_tags.min))
+		return;
+
+	bit = tag - tlb_tags.min;
+	if (WARN_ON_ONCE(bit >= tlb_tags.nr))
 		return;
 
 	guard(spinlock)(&tlb_tags.lock);
 
-	__clear_bit(tag, tlb_tags.bitmap);
+	__clear_bit(bit, tlb_tags.bitmap);
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_free_tlb_tag);
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index d6d35637d94f8..0ddfe9626c126 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -336,7 +336,8 @@ static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
 
 static __always_inline int init_vpids(void)
 {
-	return enable_vpid ? kvm_init_tlb_tags(VMX_NR_VPIDS) : 0;
+	/* Exclude VPID=0 as it is used for the host */
+	return enable_vpid ? kvm_init_tlb_tags(1, VMX_NR_VPIDS - 1) : 0;
 }
 
 static __always_inline void destroy_vpids(void)
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 05/25] KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (3 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 04/25] KVM: x86/mmu: Support specifying a minimum TLB tag Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 06/25] KVM: SVM: Fallback to flush everything if FLUSHBYASID is not available Yosry Ahmed
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Introduce set/clear helpers to set tlb_ctl to
TLB_CONTROL_FLUSH_ASID or TLB_CONTROL_DO_NOTHING. Incoming changes will
eliminate the use of TLB_CONTROL_FLUSH_ALL_ASID except as a fallback
when FLUSHBYASID is not available, so the helpers will create a common
path for this.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c |  2 +-
 arch/x86/kvm/svm/sev.c    |  2 +-
 arch/x86/kvm/svm/svm.c    |  4 ++--
 arch/x86/kvm/svm/svm.h    | 10 ++++++++++
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c85aa5cf670f4..f891299d278a0 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -928,7 +928,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	/* Done at vmrun: asid.  */
 
 	/* Also overwritten later if necessary.  */
-	vmcb02->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+	vmcb_clr_flush_asid(vmcb02);
 
 	/* Use vmcb01 MMU and format if guest does not use nNPT */
 	if (nested_npt_enabled(svm)) {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 74fb15551e83f..026b070522dac 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3569,7 +3569,7 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 		return 0;
 
 	sd->sev_vmcbs[asid] = svm->vmcb;
-	svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+	vmcb_set_flush_asid(svm->vmcb);
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 	return 0;
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 526e0fdcd16be..a7cc3cb101e57 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4186,7 +4186,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 	 * VM-Exit (via kvm_mmu_reset_context()).
 	 */
 	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
-		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+		vmcb_set_flush_asid(svm->vmcb);
 	else
 		svm->current_vmcb->asid_generation--;
 }
@@ -4599,7 +4599,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 		vcpu->arch.nested_run_pending = 0;
 	}
 
-	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+	vmcb_clr_flush_asid(svm->vmcb);
 
 	/*
 	 * Unconditionally mask off the CLEAR_RAP bit, the AND is just as cheap
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 716be21fba335..5b4613b4d7bc2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -476,6 +476,16 @@ static inline void vmcb_set_gpat(struct vmcb *vmcb, u64 data)
 	vmcb_mark_dirty(vmcb, VMCB_NPT);
 }
 
+static inline void vmcb_set_flush_asid(struct vmcb *vmcb)
+{
+	vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+}
+
+static inline void vmcb_clr_flush_asid(struct vmcb *vmcb)
+{
+	vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+}
+
 static __always_inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
 	return container_of(vcpu, struct vcpu_svm, vcpu);
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 06/25] KVM: SVM: Fallback to flush everything if FLUSHBYASID is not available
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (4 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 05/25] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 07/25] KVM: SVM: Duplicate pre-run ASID check for SEV and non-SEV guests Yosry Ahmed
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Currently, if FLUSHBYASID is not available when performing a TLB flush,
the fallback is decrementing the ASID generation to trigger allocating a
new ASID. In preparation for using a static ASID per vCPU (like VMX),
just fall back to flushing everything if FLUSHBYASID is not available.
This is probably worse from a performance perspective, but FLUSHBYASID
has been around for ~15 years and it's not worth carrying the
complexity.

The fallback logic is moved into vmcb_set_flush_asid(), as more
callers will be added and will need the fallback as well. The only other
current caller is from SEV code, and no CPUs support SEV but not
FLUSHBYASID (at least according to AI).
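A userspace sketch of the resulting set/clear helper logic, for illustration only: the TLB_CONTROL values mirror the definitions in arch/x86/include/asm/svm.h, while the struct, the function names, and the has_flushbyasid parameter (standing in for static_cpu_has(X86_FEATURE_FLUSHBYASID)) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* TLB_CONTROL encodings, mirroring arch/x86/include/asm/svm.h. */
#define TLB_CONTROL_DO_NOTHING		0
#define TLB_CONTROL_FLUSH_ALL_ASID	1
#define TLB_CONTROL_FLUSH_ASID		3

struct toy_vmcb_ctl {
	unsigned char tlb_ctl;
};

/*
 * Stand-in for vmcb_set_flush_asid(): without FLUSHBYASID, fall back to
 * flushing all ASIDs instead of allocating a new ASID.
 */
static void toy_set_flush_asid(struct toy_vmcb_ctl *c, bool has_flushbyasid)
{
	c->tlb_ctl = has_flushbyasid ? TLB_CONTROL_FLUSH_ASID
				     : TLB_CONTROL_FLUSH_ALL_ASID;
}

/* Stand-in for vmcb_clr_flush_asid(). */
static void toy_clr_flush_asid(struct toy_vmcb_ctl *c)
{
	c->tlb_ctl = TLB_CONTROL_DO_NOTHING;
}
```

Taking the feature as a parameter keeps the sketch testable; the helper in the patch checks the CPU feature directly.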

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/svm.c | 5 +----
 arch/x86/kvm/svm/svm.h | 5 ++++-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a7cc3cb101e57..159abf841738f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4185,10 +4185,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 	 * unconditionally does a TLB flush on both nested VM-Enter and nested
 	 * VM-Exit (via kvm_mmu_reset_context()).
 	 */
-	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
-		vmcb_set_flush_asid(svm->vmcb);
-	else
-		svm->current_vmcb->asid_generation--;
+	vmcb_set_flush_asid(svm->vmcb);
 }
 
 static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5b4613b4d7bc2..4bf8afdc77cbd 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -478,7 +478,10 @@ static inline void vmcb_set_gpat(struct vmcb *vmcb, u64 data)
 
 static inline void vmcb_set_flush_asid(struct vmcb *vmcb)
 {
-	vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
+		vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+	else
+		vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
 }
 
 static inline void vmcb_clr_flush_asid(struct vmcb *vmcb)
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 07/25] KVM: SVM: Duplicate pre-run ASID check for SEV and non-SEV guests
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (5 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 06/25] KVM: SVM: Fallback to flush everything if FLUSHBYASID is not available Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 08/25] KVM: SEV: Stop using per-vCPU ASID for SEV VMs Yosry Ahmed
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

In preparation for dropping the check for non-SEV VMs, duplicate the
check between pre_sev_run() and pre_svm_run() (for the non-SEV code
path).

Opportunistically drop the unconditional dirtying of VMCB_ASID, as it's
only needed when the ASID is actually updated in the VMCB, not when
TLB_CONTROL is updated (as the code currently reads).
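The rule above, marking VMCB_ASID dirty only when the ASID value changes and not when a flush is requested, can be modeled in a few lines. This is a hypothetical sketch: asid_dirty stands in for vmcb_mark_dirty(vmcb, VMCB_ASID), and the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy VMCB with just the fields relevant to the pre-run ASID sync. */
struct toy_vmcb {
	unsigned int asid;
	bool asid_dirty;	/* stands in for vmcb_mark_dirty(..., VMCB_ASID) */
	unsigned char tlb_ctl;	/* a flush request; does not dirty the ASID */
};

/* Write the ASID and mark it dirty only if the value actually changes. */
static void toy_sync_asid(struct toy_vmcb *vmcb, unsigned int asid)
{
	if (vmcb->asid != asid) {
		vmcb->asid = asid;
		vmcb->asid_dirty = true;
	}
}
```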

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/sev.c | 6 +++++-
 arch/x86/kvm/svm/svm.c | 9 +++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 026b070522dac..3c873c88894b3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3558,6 +3558,11 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 	/* Assign the asid allocated with this SEV guest */
 	svm->asid = asid;
 
+	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
+		svm->vmcb->control.asid = asid;
+		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
+	}
+
 	/*
 	 * Flush guest TLB:
 	 *
@@ -3570,7 +3575,6 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 
 	sd->sev_vmcbs[asid] = svm->vmcb;
 	vmcb_set_flush_asid(svm->vmcb);
-	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 	return 0;
 }
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 159abf841738f..5d4c45d788b54 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3766,6 +3766,11 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
 	if (svm->current_vmcb->asid_generation != sd->asid_generation)
 		new_asid(svm, sd);
 
+	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
+		svm->vmcb->control.asid = svm->asid;
+		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
+	}
+
 	return 0;
 }
 
@@ -4502,10 +4507,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 
 	sync_lapic_to_cr8(vcpu);
 
-	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
-		svm->vmcb->control.asid = svm->asid;
-		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
-	}
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
 	if (guest_cpu_cap_has(vcpu, X86_FEATURE_ERAPS) &&
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 08/25] KVM: SEV: Stop using per-vCPU ASID for SEV VMs
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (6 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 07/25] KVM: SVM: Duplicate pre-run ASID check for SEV and non-SEV guests Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  1:06   ` sashiko-bot
  2026-06-16  0:41 ` [RFC PATCH v2 09/25] KVM: SVM: Use a static ASID per vCPU Yosry Ahmed
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

svm->asid is only used by SEV in the pre-run check, to update the ASID
in the VMCB. Otherwise, svm->asid is only used for non-SEV VMs to keep
track of the per-vCPU ASID, as SEV VMs use a per-VM fixed ASID instead.

Initialize the ASID in the VMCB one time in sev_init_vmcb() instead of
checking it on every vCPU run, and turn the pre-run check into a safety
net that WARNs (once) and repairs the VMCB if the ASID is out of sync.

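A minimal userspace sketch of the resulting flow, assuming simplified hypothetical types and helpers (sev_sketch_vmcb, sev_sketch_init_vmcb(), sev_sketch_pre_run()) and a function-based stand-in for the kernel's WARN_ON_ONCE() that fires once globally rather than once per call site. The per-VM ASID is written at VMCB init, and the pre-run check only fires if something later clobbered it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON_ONCE(): returns @cond, prints only once. */
static bool sev_sketch_warned;

static bool sev_sketch_warn_on_once(bool cond, const char *what)
{
	if (cond && !sev_sketch_warned) {
		sev_sketch_warned = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical, simplified VMCB: the ASID plus a "reload it" marker. */
struct sev_sketch_vmcb {
	uint32_t asid;
	bool asid_dirty;
};

/* One-time setup: SEV VMs use a single per-VM ASID, so set it at init. */
static void sev_sketch_init_vmcb(struct sev_sketch_vmcb *vmcb, uint32_t vm_asid)
{
	assert(vm_asid != 0);	/* ASID 0 is reserved for the host */
	vmcb->asid = vm_asid;
	vmcb->asid_dirty = true;
}

/* Pre-run: the ASID should already be correct; warn and repair if not. */
static void sev_sketch_pre_run(struct sev_sketch_vmcb *vmcb, uint32_t vm_asid)
{
	if (sev_sketch_warn_on_once(vmcb->asid != vm_asid, "SEV ASID mismatch")) {
		vmcb->asid = vm_asid;
		vmcb->asid_dirty = true;
	}
}
```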
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/sev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3c873c88894b3..f5173cde76173 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3555,10 +3555,7 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 	if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
 		cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
 
-	/* Assign the asid allocated with this SEV guest */
-	svm->asid = asid;
-
-	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
+	if (WARN_ON_ONCE(asid != svm->vmcb->control.asid)) {
 		svm->vmcb->control.asid = asid;
 		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 	}
@@ -4763,6 +4760,9 @@ void sev_init_vmcb(struct vcpu_svm *svm, bool init_event)
 	svm->vmcb->control.misc_ctl |= SVM_MISC_ENABLE_SEV;
 	clr_exception_intercept(svm, UD_VECTOR);
 
+	svm->vmcb->control.asid = sev_get_asid(vcpu->kvm);
+	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
+
 	/*
 	 * Don't intercept #GP for SEV guests, e.g. for the VMware backdoor, as
 	 * KVM can't decrypt guest memory to decode the faulting instruction.
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 09/25] KVM: SVM: Use a static ASID per vCPU
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (7 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 08/25] KVM: SEV: Stop using per-vCPU ASID for SEV VMs Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  1:08   ` sashiko-bot
  2026-06-16  0:41 ` [RFC PATCH v2 10/25] KVM: nSVM: Add a placeholder ASID for L2 Yosry Ahmed
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Switch from dynamic ASID allocation to a static per-vCPU ASID.  In
practice, the dynamic allocation logic only assigns a new ASID when a
CPU runs out of ASIDs (uncommon on modern hardware) or when a vCPU
switches CPUs.

The per-CPU ASID generation is initialized to 1, and the per-VMCB ASID
generation is initialized to 0. The mismatch forces new_asid() on the
first VMRUN, which allocates a new ASID (starting a new generation with
a full TLB flush if the CPU's ASIDs are exhausted) and bumps the
per-VMCB generation to match the per-CPU generation. The ASID remains
static until either:
- The vCPU runs on a new CPU, in which case KVM resets the per-VMCB
  generation to allocate a new ASID on the new CPU.
- KVM hits the maximum ASID on a CPU and increments the per-CPU ASID
  generation, at which point all vCPUs on this CPU will allocate a new
  ASID.
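For reference, the generation scheme described above can be modeled roughly as follows. This is a simplified userspace sketch of the removed new_asid() and pre-run logic, not the actual KVM code: the ASID is stored directly in the hypothetical old_vmcb struct instead of svm->asid, and the flush_all flag stands in for setting TLB_CONTROL_FLUSH_ALL_ASID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-CPU state of the old scheme. */
struct old_cpu_data {
	uint64_t asid_generation;
	uint32_t min_asid, max_asid, next_asid;
};

/* Simplified per-VMCB state of the old scheme. */
struct old_vmcb {
	uint64_t asid_generation;	/* 0 never matches a live generation */
	uint32_t asid;
	bool flush_all;			/* stands in for TLB_CONTROL_FLUSH_ALL_ASID */
	int cpu;			/* last physical CPU this VMCB ran on */
};

static void old_cpu_init(struct old_cpu_data *sd, uint32_t min, uint32_t max)
{
	assert(min >= 1 && min <= max);
	sd->asid_generation = 1;
	sd->min_asid = min;
	sd->max_asid = max;
	sd->next_asid = max + 1;	/* first allocation starts a new generation */
}

/* Model of the removed new_asid(): new generation + full flush on wrap. */
static void old_new_asid(struct old_cpu_data *sd, struct old_vmcb *vmcb)
{
	if (sd->next_asid > sd->max_asid) {
		++sd->asid_generation;
		sd->next_asid = sd->min_asid;
		vmcb->flush_all = true;
	}
	vmcb->asid_generation = sd->asid_generation;
	vmcb->asid = sd->next_asid++;
}

/* Model of the old pre-run check. */
static void old_pre_run(struct old_cpu_data *sd, struct old_vmcb *vmcb, int cpu)
{
	if (vmcb->cpu != cpu) {
		vmcb->asid_generation = 0;	/* force a new ASID on this CPU */
		vmcb->cpu = cpu;
	}
	if (vmcb->asid_generation != sd->asid_generation)
		old_new_asid(sd, vmcb);
}
```

Every CPU switch or generation bump hands the vCPU a different ASID, which is the churn this patch removes.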

Drop the complexity and make the ASID static for each vCPU.  This makes
SVM's handling of ASIDs closer to VMX's handling of VPIDs, and
simplifies the code. It also completely avoids full TLB flushes (i.e.
TLB_CONTROL_FLUSH_ALL_ASID) on systems with FLUSHBYASID.  Full flushes
previously only happened when the per-CPU generation was bumped, so they
were not common, but they should generally be avoided as they cause
VMRUN to invalidate the TLB entries of other VMs and of the host.

When a vCPU is migrated to a new physical CPU, flush the (now static)
ASID instead of allocating a new one. This might cause extra TLB flushes
when switching CPUs (compared to just using a new ASID), but odds are
that the TLB is cold on the new CPU anyway.
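A minimal userspace sketch of the static-ASID pre-run logic, assuming hypothetical simplified types: flush_asid stands in for vmcb_set_flush_asid() (which falls back to a full flush without FLUSHBYASID, not modeled here), and all_dirty stands in for vmcb_mark_all_dirty(). As in the patch, the shared fallback ASID is flushed on every run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified VMCB for the static-ASID scheme. */
struct static_vmcb {
	uint32_t asid;		/* fixed for the vCPU's lifetime */
	int cpu;		/* last physical CPU this VMCB ran on */
	bool flush_asid;	/* stands in for vmcb_set_flush_asid() */
	bool all_dirty;		/* stands in for vmcb_mark_all_dirty() */
};

static void static_pre_run(struct static_vmcb *vmcb, int cpu, uint32_t fallback_asid)
{
	assert(vmcb->asid != 0);	/* ASID 0 is reserved for the host */

	/* Migrated: keep the ASID, but flush it and drop clean-bit caching. */
	if (vmcb->cpu != cpu) {
		vmcb->flush_asid = true;
		vmcb->all_dirty = true;
		vmcb->cpu = cpu;
	}

	/* The shared fallback ASID may hold another vCPU's TLB entries. */
	if (vmcb->asid == fallback_asid)
		vmcb->flush_asid = true;
}
```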

As ASIDs, unlike VPIDs, cannot be disabled, allocate a fallback ASID to
be shared by all vCPUs created after running out of ASIDs (and flush it
on every VMRUN), in the very unlikely scenario that more than 32K vCPUs
are active at the same time (ASIDs are recycled when vCPUs are deleted).
Add allocate_asid() and free_asid() wrappers to handle the fallback
ASID logic, as they will be reused for nested in following changes.

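A minimal userspace sketch of the fallback logic, assuming a toy bitmap allocator in place of kvm_init_tlb_tags()/kvm_alloc_tlb_tag()/kvm_free_tlb_tag(), whose implementation is not shown here. Matching how the patch uses kvm_alloc_tlb_tag(), the toy allocator returns 0 when it runs out. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_TAGS 64

/* Toy allocator: hands out IDs in [tag_min, tag_min + tag_nr), 0 = none. */
static bool tag_used[SKETCH_MAX_TAGS];
static uint32_t tag_min, tag_nr;
static uint32_t sketch_fallback_asid;

static void sketch_init_tags(uint32_t min, uint32_t nr)
{
	assert(min >= 1 && nr <= SKETCH_MAX_TAGS);
	tag_min = min;
	tag_nr = nr;
	for (uint32_t i = 0; i < SKETCH_MAX_TAGS; i++)
		tag_used[i] = false;
}

static uint32_t sketch_alloc_tag(void)
{
	for (uint32_t i = 0; i < tag_nr; i++) {
		if (!tag_used[i]) {
			tag_used[i] = true;
			return tag_min + i;
		}
	}
	return 0;	/* exhausted */
}

static void sketch_free_tag(uint32_t tag)
{
	assert(tag >= tag_min && tag < tag_min + tag_nr);
	tag_used[tag - tag_min] = false;
}

/* Like allocate_asid(): fall back to the shared ASID when exhausted. */
static uint32_t sketch_allocate_asid(void)
{
	uint32_t asid = sketch_alloc_tag();

	return asid ? asid : sketch_fallback_asid;
}

/* Like free_asid(): the shared fallback ASID is never returned. */
static void sketch_free_asid(uint32_t asid)
{
	if (asid != sketch_fallback_asid)
		sketch_free_tag(asid);
}
```

Reserving the fallback ASID from the same pool at setup (as svm_hardware_setup() does) guarantees it can never be handed out as a regular per-vCPU ASID.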
Note #1, svm->asid can technically be dropped in favor of directly using
svm->vmcb->control.asid. However, that would require allocating the ASID
in init_vmcb(), and there is no corresponding cleanup function to free
the ASID. That would also require avoiding reallocation of a new ASID
during a vCPU reset. Opt for simplicity and keep svm->asid, which is
allocated and freed in the vCPU creation/freeing paths, like VMX VPIDs.

Note #2, a nice side-effect is that the min/max ASIDs are read once
during initialization, instead of once per CPU.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c |  4 +--
 arch/x86/kvm/svm/svm.c    | 68 +++++++++++++++++++++------------------
 arch/x86/kvm/svm/svm.h    | 22 +++++++++----
 3 files changed, 53 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f891299d278a0..66c5b5131cbb1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -925,9 +925,9 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	else
 		vmcb02->control.bus_lock_counter = 0;
 
-	/* Done at vmrun: asid.  */
+	vmcb02->control.asid = vmcb01->control.asid;
 
-	/* Also overwritten later if necessary.  */
+	/* Overwritten later if necessary.  */
 	vmcb_clr_flush_asid(vmcb02);
 
 	/* Use vmcb01 MMU and format if guest does not use nNPT */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5d4c45d788b54..fae5cb7102010 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -189,6 +189,8 @@ DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
 
 static DEFINE_MUTEX(vmcb_dump_mutex);
 
+kvm_tlb_tag_t fallback_asid;
+
 /*
  * Only MSR_TSC_AUX is switched via the user return hook.  EFER is switched via
  * the VMCB, and the SYSCALL/SYSENTER MSRs are handled by VMLOAD/VMSAVE.
@@ -571,10 +573,6 @@ static int svm_enable_virtualization_cpu(void)
 		return r;
 
 	sd = per_cpu_ptr(&svm_data, me);
-	sd->asid_generation = 1;
-	sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
-	sd->next_asid = sd->max_asid + 1;
-	sd->min_asid = max_sev_asid + 1;
 
 	wrmsrq(MSR_VM_HSAVE_PA, sd->save_area_pa);
 
@@ -977,6 +975,8 @@ static void svm_hardware_unsetup(void)
 
 	__free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));
 	iopm_base = 0;
+
+	kvm_destroy_tlb_tags();
 }
 
 static void init_seg(struct vmcb_seg *seg)
@@ -1228,8 +1228,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu, bool init_event)
 	if (gmet_enabled)
 		control->misc_ctl |= SVM_MISC_ENABLE_GMET;
 
-	svm->current_vmcb->asid_generation = 0;
-	svm->asid = 0;
+	control->asid = svm->asid;
+	vmcb_set_flush_asid(vmcb);
 
 	svm->nested.vmcb12_gpa = INVALID_GPA;
 	svm->nested.last_vmcb12_gpa = INVALID_GPA;
@@ -1339,6 +1339,15 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 		goto error_free_sev;
 	}
 
+	/*
+	 * svm->asid is unused by SEV, put zero in there to trigger VMRUN
+	 * failure if this ever ends up in the VMCB.
+	 */
+	if (is_sev_guest(vcpu))
+		svm->asid = 0;
+	else
+		svm->asid = allocate_asid();
+
 	svm->x2avic_msrs_intercepted = true;
 	svm->lbr_msrs_intercepted = true;
 
@@ -1371,6 +1380,9 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
 
 	__free_page(__sme_pa_to_page(svm->vmcb01.pa));
 	svm_vcpu_free_msrpm(svm->msrpm);
+
+	if (!is_sev_guest(vcpu))
+		free_asid(svm->asid);
 }
 
 #ifdef CONFIG_CPU_MITIGATIONS
@@ -1896,19 +1908,6 @@ static void svm_update_exception_bitmap(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
-{
-	if (sd->next_asid > sd->max_asid) {
-		++sd->asid_generation;
-		sd->next_asid = sd->min_asid;
-		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
-		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
-	}
-
-	svm->current_vmcb->asid_generation = sd->asid_generation;
-	svm->asid = sd->next_asid++;
-}
-
 static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
 {
 	struct vmcb *vmcb = to_svm(vcpu)->vmcb;
@@ -3745,16 +3744,15 @@ static void svm_set_nested_run_soft_int_state(struct kvm_vcpu *vcpu)
 
 static int pre_svm_run(struct kvm_vcpu *vcpu)
 {
-	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	/*
-	 * If the previous vmrun of the vmcb occurred on a different physical
-	 * cpu, then mark the vmcb dirty and assign a new asid.  Hardware's
-	 * vmcb clean bits are per logical CPU, as are KVM's asid assignments.
+	 * If the previous VMRUN of the VMCB occurred on a different physical
+	 * CPU, then mark the VMCB dirty and flush the ASID.  Hardware's
+	 * VMCB clean bits are per logical CPU, as are KVM's ASID assignments.
 	 */
 	if (unlikely(svm->current_vmcb->cpu != vcpu->cpu)) {
-		svm->current_vmcb->asid_generation = 0;
+		vmcb_set_flush_asid(svm->vmcb);
 		vmcb_mark_all_dirty(svm->vmcb);
 		svm->current_vmcb->cpu = vcpu->cpu;
         }
@@ -3762,14 +3760,8 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
 	if (is_sev_guest(vcpu))
 		return pre_sev_run(svm, vcpu->cpu);
 
-	/* FIXME: handle wraparound of asid_generation */
-	if (svm->current_vmcb->asid_generation != sd->asid_generation)
-		new_asid(svm, sd);
-
-	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
-		svm->vmcb->control.asid = svm->asid;
-		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
-	}
+	if (unlikely(svm->vmcb->control.asid == fallback_asid))
+		vmcb_set_flush_asid(svm->vmcb);
 
 	return 0;
 }
@@ -5598,6 +5590,7 @@ static __init void svm_set_cpu_caps(void)
 
 static __init int svm_hardware_setup(void)
 {
+	unsigned long min_asid, nr_asids;
 	void *iopm_va;
 	int cpu, r;
 
@@ -5751,6 +5744,17 @@ static __init int svm_hardware_setup(void)
 
 	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_CD_NW_CLEARED;
 
+	/* Consumes max_sev_asid initialized by sev_hardware_setup() */
+	min_asid = max_sev_asid + 1;
+	nr_asids = cpuid_ebx(SVM_CPUID_FUNC);
+	r = kvm_init_tlb_tags(min_asid, nr_asids - min_asid);
+	if (r)
+		goto err;
+
+	fallback_asid = kvm_alloc_tlb_tag();
+	if (!fallback_asid)
+		goto err;
+
 	for_each_possible_cpu(cpu) {
 		r = svm_cpu_init(cpu);
 		if (r)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4bf8afdc77cbd..4442e9fd4f5d0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -26,6 +26,7 @@
 #include "regs.h"
 #include "x86.h"
 #include "pmu.h"
+#include "mmu.h"
 
 /*
  * Helpers to convert to/from physical addresses for pages whose address is
@@ -143,7 +144,6 @@ struct kvm_vmcb_info {
 	struct vmcb *ptr;
 	unsigned long pa;
 	int cpu;
-	uint64_t asid_generation;
 };
 
 struct vmcb_save_area_cached {
@@ -282,7 +282,7 @@ struct vcpu_svm {
 	struct vmcb *vmcb;
 	struct kvm_vmcb_info vmcb01;
 	struct kvm_vmcb_info *current_vmcb;
-	u32 asid;
+	kvm_tlb_tag_t asid;
 	u32 sysenter_esp_hi;
 	u32 sysenter_eip_hi;
 	uint64_t tsc_aux;
@@ -369,11 +369,6 @@ struct vcpu_svm {
 };
 
 struct svm_cpu_data {
-	u64 asid_generation;
-	u32 max_asid;
-	u32 next_asid;
-	u32 min_asid;
-
 	bool bp_spec_reduce_set;
 
 	struct vmcb *save_area;
@@ -476,6 +471,19 @@ static inline void vmcb_set_gpat(struct vmcb *vmcb, u64 data)
 	vmcb_mark_dirty(vmcb, VMCB_NPT);
 }
 
+extern kvm_tlb_tag_t fallback_asid;
+
+static inline kvm_tlb_tag_t allocate_asid(void)
+{
+	return kvm_alloc_tlb_tag() ?: fallback_asid;
+}
+
+static inline void free_asid(kvm_tlb_tag_t asid)
+{
+	if (likely(asid != fallback_asid))
+		kvm_free_tlb_tag(asid);
+}
+
 static inline void vmcb_set_flush_asid(struct vmcb *vmcb)
 {
 	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 10/25] KVM: nSVM: Add a placeholder ASID for L2
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (8 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 09/25] KVM: SVM: Use a static ASID per vCPU Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 11/25] KVM: x86: hyper-v: Rename kvm_hv_vcpu_purge_flush_tlb() Yosry Ahmed
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

In preparation for introducing a separate ASID for L2, introduce a
'placeholder' ASID that is still the same as L1's ASID. This allows
subsequent changes to distinguish between L1's and L2's ASIDs before L2
actually gets a different ASID.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 5 +++--
 arch/x86/kvm/svm/svm.h    | 2 ++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 66c5b5131cbb1..bc8c466a1f821 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -698,7 +698,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 	 *  - Honor L1's request to flush an ASID on nested VMRUN
 	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
 	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
-	 *  - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
 	 *
 	 * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
 	 *     NPT guest-physical mappings on VMRUN.
@@ -925,7 +924,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	else
 		vmcb02->control.bus_lock_counter = 0;
 
-	vmcb02->control.asid = vmcb01->control.asid;
+	vmcb02->control.asid = svm->nested.asid02;
 
 	/* Overwritten later if necessary.  */
 	vmcb_clr_flush_asid(vmcb02);
@@ -1495,6 +1494,8 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	if (!svm->nested.msrpm)
 		goto err_free_vmcb02;
 
+	svm->nested.asid02 = svm->asid;
+
 	svm->nested.initialized = true;
 	return 0;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4442e9fd4f5d0..6e98e02f7b8d5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -232,6 +232,8 @@ struct svm_nested_state {
 	 */
 	struct vmcb_save_area_cached save;
 
+	kvm_tlb_tag_t asid02;
+
 	bool initialized;
 
 	/*
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 11/25] KVM: x86: hyper-v: Rename kvm_hv_vcpu_purge_flush_tlb()
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (9 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 10/25] KVM: nSVM: Add a placeholder ASID for L2 Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 12/25] KVM: x86: hyper-v: Allow purging all TLB flush FIFOs Yosry Ahmed
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Rename kvm_hv_vcpu_purge_flush_tlb() to kvm_hv_purge_tlb_flush_fifo() to
clarify that it purges the TLB flush FIFO, rather than purging *and*
flushing the TLB.

No functional change intended.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/hyperv.h  | 4 ++--
 arch/x86/kvm/svm/svm.c | 2 +-
 arch/x86/kvm/x86.c     | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 65e89ed653497..a5c52e4b24b2a 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -204,7 +204,7 @@ static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struc
 	return &hv_vcpu->tlb_flush_fifo[i];
 }
 
-static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu)
+static inline void kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
 
@@ -286,7 +286,7 @@ static inline int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 {
 	return HV_STATUS_ACCESS_DENIED;
 }
-static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu) {}
+static inline void kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu) {}
 static inline bool kvm_hv_synic_has_vector(struct kvm_vcpu *vcpu, int vector)
 {
 	return false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fae5cb7102010..dac3d0098bb77 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4173,7 +4173,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB
 	 * entries, and thus is a superset of Hyper-V's fine grained flushing.
 	 */
-	kvm_hv_vcpu_purge_flush_tlb(vcpu);
+	kvm_hv_purge_tlb_flush_fifo(vcpu);
 
 	/*
 	 * Flush only the current ASID even if the TLB flush was invoked via
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf122b8c32103..e67cdca8cc48f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3690,7 +3690,7 @@ static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
 	 * Flushing all "guest" TLB is always a superset of Hyper-V's fine
 	 * grained flushing.
 	 */
-	kvm_hv_vcpu_purge_flush_tlb(vcpu);
+	kvm_hv_purge_tlb_flush_fifo(vcpu);
 }
 
 
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 12/25] KVM: x86: hyper-v: Allow purging all TLB flush FIFOs
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (10 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 11/25] KVM: x86: hyper-v: Rename kvm_hv_vcpu_purge_flush_tlb() Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 13/25] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Refactor kvm_hv_purge_tlb_flush_fifo() to introduce an inner helper,
__kvm_hv_purge_tlb_flush_fifo(), parameterized by an optional FIFO. If a
FIFO is not passed in, all FIFOs are purged.

Note that __kvm_hv_purge_tlb_flush_fifo() consumes KVM_REQ_HV_TLB_FLUSH,
so a selective purge of one FIFO turns any subsequent non-selective
purge into a no-op until the request is raised again. This is not a
problem because purging TLB flush FIFOs is an optimization (and there
are currently no users of non-selective purges).

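A minimal userspace sketch of the consume-once behavior, assuming hypothetical types: a bool stands in for KVM_REQ_HV_TLB_FLUSH, sketch_check_request() mimics the test-and-clear semantics of kvm_check_request(), and a counter stands in for each kfifo. A selective purge of FIFO 0 consumes the request, so a following purge of all FIFOs leaves FIFO 1 untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_FIFOS 2	/* e.g. one FIFO for L1 and one for L2 */

struct sketch_fifo {
	int nr_entries;		/* pending fine-grained flush requests */
};

struct sketch_vcpu {
	bool hv_tlb_flush_req;	/* stands in for KVM_REQ_HV_TLB_FLUSH */
	struct sketch_fifo fifos[SKETCH_NR_FIFOS];
};

/* Test-and-clear, like kvm_check_request(). */
static bool sketch_check_request(struct sketch_vcpu *vcpu)
{
	bool pending = vcpu->hv_tlb_flush_req;

	vcpu->hv_tlb_flush_req = false;
	return pending;
}

/* Purge @fifo, or every FIFO if @fifo is NULL; consumes the request. */
static void sketch_purge(struct sketch_vcpu *vcpu, struct sketch_fifo *fifo)
{
	assert(vcpu);

	if (!sketch_check_request(vcpu))
		return;

	for (size_t i = 0; i < SKETCH_NR_FIFOS; i++) {
		if (!fifo || fifo == &vcpu->fifos[i])
			vcpu->fifos[i].nr_entries = 0;
	}
}
```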
No functional change intended as a FIFO is always being passed in now.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/hyperv.h | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index a5c52e4b24b2a..0f64038408072 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -204,16 +204,31 @@ static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struc
 	return &hv_vcpu->tlb_flush_fifo[i];
 }
 
-static inline void kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu)
+/* Purge pending TLB flushes in @fifo, or in all FIFOs if @fifo is NULL */
+static inline void __kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu,
+						 struct kvm_vcpu_hv_tlb_flush_fifo *fifo)
 {
-	struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	int i;
 
-	if (!to_hv_vcpu(vcpu) || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
+	if (!hv_vcpu || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
 		return;
 
-	tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode(vcpu));
+	for (i = 0; i < ARRAY_SIZE(hv_vcpu->tlb_flush_fifo); i++) {
+		if (!fifo || fifo == &hv_vcpu->tlb_flush_fifo[i])
+			kfifo_reset_out(&hv_vcpu->tlb_flush_fifo[i].entries);
+	}
+}
+
+static inline void kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv_tlb_flush_fifo *fifo;
+
+	if (!to_hv_vcpu(vcpu))
+		return;
 
-	kfifo_reset_out(&tlb_flush_fifo->entries);
+	fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode(vcpu));
+	__kvm_hv_purge_tlb_flush_fifo(vcpu, fifo);
 }
 
 static inline bool guest_hv_cpuid_has_l2_tlb_flush(struct kvm_vcpu *vcpu)
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 13/25] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (11 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 12/25] KVM: x86: hyper-v: Allow purging all TLB flush FIFOs Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  1:05   ` sashiko-bot
  2026-06-16  0:41 ` [RFC PATCH v2 14/25] KVM: nSVM: Move svm_switch_vmcb() to nested.c Yosry Ahmed
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Flush both L1 and L2 ASIDs in svm_flush_tlb_all() to appropriately
handle KVM_REQ_TLB_FLUSH by flushing all TLB entries in all contexts
(e.g. for kvm_flush_remote_tlbs()). Since both L1 and L2 currently share
an ASID, this is effectively a noop, but it won't be once L2 has a
separate ASID.

Purge all Hyper-V TLB FIFOs (for both L1 and L2), since both ASIDs are
flushed, and an ASID flush is a superset of Hyper-V's fine-grained
flushing (see comment in svm_flush_tlb_asid()).

Note that if one TLB flush FIFO is purged (e.g. as a result of
KVM_REQ_TLB_FLUSH_CURRENT), it will consume KVM_REQ_HV_TLB_FLUSH, and a
subsequent KVM_REQ_TLB_FLUSH will not purge the other FIFO. This is fine
as purging both FIFOs is a (newly introduced) optimization anyway. The
other FIFO will be checked after a nested transition, as
KVM_REQ_HV_TLB_FLUSH is always set on nested transitions.

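A minimal userspace sketch contrasting a flush of the current ASID with the new flush-all behavior, assuming hypothetical types where flush_asid stands in for vmcb_set_flush_asid() and vmcb02 is NULL until nested state is allocated (mirroring the svm->nested.vmcb02.ptr check in the patch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nested_sketch_vmcb {
	bool flush_asid;	/* stands in for vmcb_set_flush_asid() */
};

struct nested_sketch_svm {
	struct nested_sketch_vmcb *vmcb01;	/* L1's VMCB, always present */
	struct nested_sketch_vmcb *vmcb02;	/* NULL until nested is allocated */
	struct nested_sketch_vmcb *current;	/* VMCB used by the next VMRUN */
};

/* Flush only the ASID of the active VMCB (L1 or L2, whichever runs next). */
static void nested_sketch_flush_current(struct nested_sketch_svm *svm)
{
	svm->current->flush_asid = true;
}

/* Flush every ASID the vCPU may use, regardless of which VMCB is active. */
static void nested_sketch_flush_all(struct nested_sketch_svm *svm)
{
	assert(svm->vmcb01 != NULL);
	svm->vmcb01->flush_asid = true;
	if (svm->vmcb02)
		svm->vmcb02->flush_asid = true;
}
```

Once L2 gets its own ASID, only the flush-all variant guarantees that stale L1 and L2 translations are both dropped, e.g. for kvm_flush_remote_tlbs().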
Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/hyperv.h     | 6 ++++++
 arch/x86/kvm/svm/nested.c | 1 -
 arch/x86/kvm/svm/svm.c    | 8 +++++++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 0f64038408072..6f212d5fc8d50 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -231,6 +231,11 @@ static inline void kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu)
 	__kvm_hv_purge_tlb_flush_fifo(vcpu, fifo);
 }
 
+static inline void kvm_hv_purge_all_tlb_flush_fifos(struct kvm_vcpu *vcpu)
+{
+	__kvm_hv_purge_tlb_flush_fifo(vcpu, NULL);
+}
+
 static inline bool guest_hv_cpuid_has_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
@@ -302,6 +307,7 @@ static inline int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 	return HV_STATUS_ACCESS_DENIED;
 }
 static inline void kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu) {}
+static inline void kvm_hv_purge_all_tlb_flush_fifos(struct kvm_vcpu *vcpu) {}
 static inline bool kvm_hv_synic_has_vector(struct kvm_vcpu *vcpu, int vector)
 {
 	return false;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index bc8c466a1f821..eb56c2fbf2832 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -694,7 +694,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
 	 * things to fix before this can be conditional:
 	 *
-	 *  - Flush TLBs for both L1 and L2 remote TLB flush
 	 *  - Honor L1's request to flush an ASID on nested VMRUN
 	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
 	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dac3d0098bb77..4f13c3456a5d7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4203,6 +4203,8 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 
 static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	/*
 	 * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
 	 * flushes should be routed to hv_flush_remote_tlbs() without requesting
@@ -4213,7 +4215,11 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
 	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
 		hv_flush_remote_tlbs(vcpu->kvm);
 
-	svm_flush_tlb_asid(vcpu);
+	kvm_hv_purge_all_tlb_flush_fifos(vcpu);
+
+	vmcb_set_flush_asid(svm->vmcb01.ptr);
+	if (svm->nested.vmcb02.ptr)
+		vmcb_set_flush_asid(svm->nested.vmcb02.ptr);
 }
 
 static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 14/25] KVM: nSVM: Move svm_switch_vmcb() to nested.c
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (12 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 13/25] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 15/25] KVM: nSVM: Call nested_svm_transition_tlb_flush() on every VMCB switch Yosry Ahmed
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

svm_switch_vmcb() is only used outside of nested.c to initialize
svm->current_vmcb and svm->vmcb. Open-code the initialization (which is
arguably not a "switch" to begin with) in the vCPU creation path, and
move svm_switch_vmcb() to nested.c in preparation for adding more nested
logic.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c |  6 ++++++
 arch/x86/kvm/svm/svm.c    | 10 +++-------
 arch/x86/kvm/svm/svm.h    |  2 --
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index eb56c2fbf2832..24a52aefe94ee 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -705,6 +705,12 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
+static void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
+{
+	svm->current_vmcb = target_vmcb;
+	svm->vmcb = target_vmcb->ptr;
+}
+
 /*
  * Load guest's/host's cr3 on nested vmentry or vmexit. @nested_npt is true
  * if we are emulating VM-Entry into a guest with NPT enabled.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4f13c3456a5d7..14733ff8bcd19 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1300,12 +1300,6 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		__svm_vcpu_reset(vcpu);
 }
 
-void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
-{
-	svm->current_vmcb = target_vmcb;
-	svm->vmcb = target_vmcb->ptr;
-}
-
 static int svm_vcpu_precreate(struct kvm *kvm)
 {
 	return avic_alloc_physical_id_table(kvm);
@@ -1353,7 +1347,9 @@ static int svm_vcpu_create(struct kvm_vcpu *vcpu)
 
 	svm->vmcb01.ptr = page_address(vmcb01_page);
 	svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
-	svm_switch_vmcb(svm, &svm->vmcb01);
+
+	svm->current_vmcb = &svm->vmcb01;
+	svm->vmcb = svm->vmcb01.ptr;
 
 	svm->guest_state_loaded = false;
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6e98e02f7b8d5..4dcfd56882da5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -922,8 +922,6 @@ void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
 void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
 				    struct vmcb_save_area *save);
 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
-void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
-
 
 static inline void __svm_pmu_handle_nested_transition(struct vcpu_svm *svm,
 						      bool defer)
-- 
2.54.0.1136.gdb2ca164c4-goog



* [RFC PATCH v2 15/25] KVM: nSVM: Call nested_svm_transition_tlb_flush() on every VMCB switch
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (13 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 14/25] KVM: nSVM: Move svm_switch_vmcb() to nested.c Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 16/25] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns Yosry Ahmed
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Move the calls to nested_svm_transition_tlb_flush() in different
transition code paths to svm_switch_vmcb(). This ensures that TLB
flushes are not missed during switches, and makes it clearer that the
TLB flushes are directly related to the active VMCB.

The ordering currently doesn't matter, but as more TLB handling is added
for nested, requesting the TLB flushes *after* the VMCB switch will
become a requirement.

No functional change intended (for now).

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 24a52aefe94ee..9a917d057aa8e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -707,8 +707,12 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 
 static void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
 {
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+
 	svm->current_vmcb = target_vmcb;
 	svm->vmcb = target_vmcb->ptr;
+
+	nested_svm_transition_tlb_flush(vcpu);
 }
 
 /*
@@ -860,8 +864,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 	struct vmcb *vmcb01 = svm->vmcb01.ptr;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
-	nested_svm_transition_tlb_flush(vcpu);
-
 	/* Enter Guest-Mode */
 	enter_guest_mode(vcpu);
 	svm_pmu_handle_nested_transition(svm);
@@ -1435,8 +1437,6 @@ void nested_svm_vmexit(struct vcpu_svm *svm)
 	svm->vcpu.arch.dr7 = DR7_FIXED_1;
 	kvm_update_dr7(&svm->vcpu);
 
-	nested_svm_transition_tlb_flush(vcpu);
-
 	nested_svm_uninit_mmu_context(vcpu);
 
 	if (nested_svm_load_cr3(vcpu, vmcb01->save.cr3, false, true))
@@ -1556,8 +1556,6 @@ void svm_leave_nested(struct kvm_vcpu *vcpu)
 
 		svm_switch_vmcb(svm, &svm->vmcb01);
 
-		nested_svm_transition_tlb_flush(vcpu);
-
 		nested_svm_uninit_mmu_context(vcpu);
 		vmcb_mark_all_dirty(svm->vmcb);
 
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 16/25] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (14 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 15/25] KVM: nSVM: Call nested_svm_transition_tlb_flush() on every VMCB switch Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 17/25] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

The handling for the entry and exit TLB flushes will diverge
significantly in the following changes. Instead of adding an 'is_vmenter'
argument like nested_vmx_transition_tlb_flush(), just split the function
into two variants for 'entry' and 'exit'.
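
For illustration only, a minimal userspace model of the resulting
dispatch. The types and counters (toy_svm, toy_vmcb_info, entry_flushes,
exit_flushes) are hypothetical stand-ins, not KVM structures; the point
is that the target VMCB alone selects the variant:

```c
#include <assert.h>

/* Hypothetical stand-in for struct kvm_vmcb_info. */
struct toy_vmcb_info {
	int id;
};

/* Hypothetical stand-in for the relevant parts of struct vcpu_svm. */
struct toy_svm {
	struct toy_vmcb_info vmcb01;	/* L1's VMCB */
	struct toy_vmcb_info vmcb02;	/* L2's VMCB */
	struct toy_vmcb_info *current_vmcb;
	int entry_flushes;		/* times the "entry" variant ran */
	int exit_flushes;		/* times the "exit" variant ran */
};

static void toy_entry_tlb_flush(struct toy_svm *svm)
{
	svm->entry_flushes++;
}

static void toy_exit_tlb_flush(struct toy_svm *svm)
{
	svm->exit_flushes++;
}

/*
 * Same shape as svm_switch_vmcb() in this patch: the target VMCB alone
 * selects the variant, so no 'is_vmenter' argument has to be threaded
 * through the callers.
 */
static void toy_switch_vmcb(struct toy_svm *svm, struct toy_vmcb_info *target)
{
	assert(target == &svm->vmcb01 || target == &svm->vmcb02);

	svm->current_vmcb = target;
	if (target == &svm->vmcb02)
		toy_entry_tlb_flush(svm);
	else
		toy_exit_tlb_flush(svm);
}
```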

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9a917d057aa8e..234724d8b4c54 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -685,7 +685,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 	vmcb12->control.exit_int_info = exit_int_info;
 }
 
-static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
+static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	/* Handle pending Hyper-V TLB flush requests */
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
@@ -705,6 +705,14 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
+static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
+{
+	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
+
+	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+}
+
 static void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
@@ -712,7 +720,10 @@ static void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_v
 	svm->current_vmcb = target_vmcb;
 	svm->vmcb = target_vmcb->ptr;
 
-	nested_svm_transition_tlb_flush(vcpu);
+	if (target_vmcb == &svm->nested.vmcb02)
+		nested_svm_entry_tlb_flush(vcpu);
+	else
+		nested_svm_exit_tlb_flush(vcpu);
 }
 
 /*
-- 
2.54.0.1136.gdb2ca164c4-goog


* [RFC PATCH v2 17/25] KVM: nSVM: Service local TLB flushes before nested transitions
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (15 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 16/25] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  1:20   ` sashiko-bot
  2026-06-16  0:41 ` [RFC PATCH v2 18/25] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL Yosry Ahmed
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

KVM does not track TLB flush requests for L1 vs. L2 separately. Hence,
service local flushes that target the current context before switching
to a new one. Since TLB flushes are performed through the VMCB's
TLB_CONTROL field, service the flushes before every VMCB switch.

Note that nested_svm_{entry/exit}_tlb_flush() must be called after
kvm_service_local_tlb_flush_requests(), otherwise the TLB flushes will
be immediately serviced in the "old" VMCB rather than the new one.

This is conceptually similar to how nVMX calls
kvm_service_local_tlb_flush_requests() with a few differences:

1. VMX performs TLB flushes through INVVPID or INVEPT. The VPID is
   determined based on guest mode, and the EPT pointer is determined
   based on the active MMU. Hence, local TLB flushes are serviced before
   switching guest mode and switching the MMU. On the other hand, SVM
   performs TLB flushes by updating the VMCB, hence local TLB flushes
   are serviced before switching the VMCB.

2. VMX has a single code path for entering guest mode (i.e.
   nested_vmx_enter_non_root_mode()) and a single code path for exiting
   guest mode (i.e. nested_vmx_vmexit()). Other code paths like
   vmx_set_nested_state() and vmx_leave_nested() call into these
   functions. On the other hand, SVM open codes the switches in several
   places, so call kvm_service_local_tlb_flush_requests() from
   svm_switch_vmcb() to more-or-less guarantee it is not missed.
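
To make the ordering requirement concrete, here is a toy userspace model
(hypothetical names; a single bool stands in for both
KVM_REQ_TLB_FLUSH_CURRENT and a non-zero TLB_CONTROL). It assumes, as
described above, that servicing a request writes into whichever VMCB is
active at the time:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 'flush_asid' stands in for a non-zero TLB_CONTROL. */
struct toy_vmcb {
	bool flush_asid;
};

struct toy_vcpu {
	struct toy_vmcb vmcb01, vmcb02;
	struct toy_vmcb *vmcb;		/* currently active VMCB */
	bool req_flush_current;		/* models KVM_REQ_TLB_FLUSH_CURRENT */
};

/*
 * Models kvm_service_local_tlb_flush_requests(): a pending request is
 * written into whichever VMCB is active at the time it is serviced.
 */
static void toy_service_requests(struct toy_vcpu *v)
{
	if (v->req_flush_current) {
		v->vmcb->flush_asid = true;
		v->req_flush_current = false;
	}
}

/* Service requests owed by the old VMCB, switch, then queue new ones. */
static void toy_switch_vmcb(struct toy_vcpu *v, struct toy_vmcb *target)
{
	assert(target != v->vmcb);

	toy_service_requests(v);	/* lands in the outgoing VMCB */
	v->vmcb = target;
	v->req_flush_current = true;	/* serviced later, in the new VMCB */
}
```

Reversing the first and last steps of toy_switch_vmcb() would write the
new VMCB's flush into the old one, which is the failure mode this
ordering avoids.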

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 234724d8b4c54..7b19191e0e43f 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -717,9 +717,16 @@ static void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_v
 {
 	struct kvm_vcpu *vcpu = &svm->vcpu;
 
+	/*
+	 * TLB flushes are applied to the VMCB, so apply any pending TLB flushes
+	 * on the current VMCB before switching to a new one..
+	 */
+	kvm_service_local_tlb_flush_requests(vcpu);
+
 	svm->current_vmcb = target_vmcb;
 	svm->vmcb = target_vmcb->ptr;
 
+	/* .. then request TLB flushes needed for the new VMCB */
 	if (target_vmcb == &svm->nested.vmcb02)
 		nested_svm_entry_tlb_flush(vcpu);
 	else
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 18/25] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (16 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 17/25] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 19/25] KVM: nSVM: Flush the TLB if L1 changes L2's ASID in vmcb12 Yosry Ahmed
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Handle L1's requests to flush L2's TLB through the TLB_CONTROL field of
VMCB12.

On nested VM-Enter, flush L2's ASID if any flush is specified in
TLB_CONTROL. This handles TLB_CONTROL_FLUSH_ASID and
TLB_CONTROL_FLUSH_ASID_LOCAL equally for simplicity.

On nested VM-Exit, flush L1's own ASID if L1 requested a *full* TLB
flush (i.e. TLB_CONTROL_FLUSH_ALL_ASID).

Essentially, TLB_CONTROL_FLUSH_ASID[_LOCAL] cause a TLB flush on nested
VM-Enter only, while TLB_CONTROL_FLUSH_ALL_ASID causes a TLB flush on
both nested VM-Enter and nested VM-Exit.

Additionally, sync the nested NPTs on nested VM-Enter on an ASID flush,
as ASID flushes also invalidate guest-physical translations on SVM
(unlike VMX, which has separate VPID-based and EPT-based flushing).

All TLB_CONTROL values can be handled by KVM regardless of FLUSHBYASID
support on the underlying CPU, so keep advertising FLUSHBYASID to the
guest unconditionally.

Note, TLB_CONTROL_FLUSH_ALL_ASID is never propagated from the vmcb12 to
the vmcb02 (unless FLUSHBYASID is not available), as this gives the
guest the power to flush the entire physical TLB (including translations
for the host and other VMs).
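
The resulting policy can be summarized as a few predicates. This is a
sketch, not KVM code: the toy_* helper names are hypothetical, while the
TLB_CONTROL values match the definitions in arch/x86/include/asm/svm.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TLB_CONTROL encodings (same values as arch/x86/include/asm/svm.h). */
#define TLB_CONTROL_DO_NOTHING		0
#define TLB_CONTROL_FLUSH_ALL_ASID	1
#define TLB_CONTROL_FLUSH_ASID		3
#define TLB_CONTROL_FLUSH_ASID_LOCAL	7

/* Any flush request in vmcb12 flushes L2's ASID on nested VM-Enter. */
static bool toy_flush_l2_on_entry(uint8_t tlb_ctl)
{
	return tlb_ctl != TLB_CONTROL_DO_NOTHING;
}

/* ASID flushes also drop nGPA->GPA translations, so resync nested NPT. */
static bool toy_sync_nested_npt_on_entry(uint8_t tlb_ctl, bool nested_npt)
{
	return nested_npt && toy_flush_l2_on_entry(tlb_ctl);
}

/* Only a full flush also flushes L1's own ASID on nested VM-Exit. */
static bool toy_flush_l1_on_exit(uint8_t tlb_ctl)
{
	return tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID;
}
```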

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 30 +++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.c    |  5 ++---
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7b19191e0e43f..2c04b12121fb2 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -687,19 +687,33 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 
 static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	/* Handle pending Hyper-V TLB flush requests */
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
 
+	/*
+	 * If L1 requested a TLB flush for L2, flush L2's TLB on nested entry
+	 * and sync the nested NPT MMU, as TLB_CONTROL also flushes NPT
+	 * guest-physical mappings.
+	 *
+	 * If L1 requested a full TLB flush for all ASIDs (including its own),
+	 * L1's own ASID is also flushed on nested VM-Exit, before running L1.
+	 *
+	 * TLB_CONTROL_FLUSH_ASID and TLB_CONTROL_FLUSH_ASID_LOCAL are handled
+	 * equally for simplicity.
+	 */
+	if (svm->nested.ctl.tlb_ctl != TLB_CONTROL_DO_NOTHING) {
+		if (nested_npt_enabled(svm))
+			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+	}
+
 	/*
 	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
 	 * things to fix before this can be conditional:
 	 *
-	 *  - Honor L1's request to flush an ASID on nested VMRUN
-	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
 	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
-	 *
-	 * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
-	 *     NPT guest-physical mappings on VMRUN.
 	 */
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
@@ -707,8 +721,14 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 
 static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
 
+	/* Flush L1's own ASID if it requested a *full* TLB flush on VMRUN */
+	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 14733ff8bcd19..87f82ae51e8b1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5512,9 +5512,8 @@ static __init void svm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN);
 
 		/*
-		 * KVM currently flushes TLBs on *every* nested SVM transition,
-		 * and so for all intents and purposes KVM supports flushing by
-		 * ASID, i.e. KVM is guaranteed to honor every L1 ASID flush.
+		 * KVM handles all TLB_CONTROL values set by L1, even if the
+		 * underlying CPU does not.
 		 */
 		kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID);
 
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 19/25] KVM: nSVM: Flush the TLB if L1 changes L2's ASID in vmcb12
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (17 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 18/25] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 20/25] KVM: nSVM: Do not reset TLB_CONTROL in vmcb02 on nested VM-Enter Yosry Ahmed
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

KVM uses a single ASID for L2 guests per vCPU. Hence, when L1 changes
L2's ASID in vmcb12 (e.g. due to switching to a different L2 vCPU or
simply to avoid flushing an existing ASID), KVM needs to flush the L2
ASID to correctly emulate different ASIDs as different TLB domains.

Additionally, the nested NPT needs to be resync'd, as the MMU context
is not tagged by ASID, and KVM cannot reuse nested NPT entries across
different L2 ASIDs.

Essentially, L1 switching L2's ASID is treated exactly the same as L1
flushing L2's ASID, which is consistent with the APM:

  Software may effectively flush the guest's TLB entries by allocating a
  new ASID for the guest and not reusing the old ASID until the entire
  TLB has been flushed at least once.

This is similar to nVMX's handling of last_vpid, except that when L1
changes VPID12, KVM does *not* need to resync the nested EPT, because
VMX VPIDs, unlike SVM ASIDs, do not tag guest-physical translations
(i.e. nGPA to GPA translations).

Drop the comment about vmcb12's ASID being copied only for the
consistency checks, as it no longer applies, and opportunistically fix
the whitespace alignment.

This is currently a functional no-op, as a full flush and sync are still
triggered on every nested transition, but it is a step toward
eliminating that.
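
A reduced sketch of the resulting check, assuming a hypothetical
toy_nested struct in place of struct svm_nested_state and a bool in
place of the TLB_CONTROL test (the initial last_asid value is also an
assumption of the sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of the new_asid logic on nested VM-Enter. */
struct toy_nested {
	uint32_t last_asid;	/* ASID12 seen on the previous VMRUN */
};

/*
 * Returns true if L2's single hardware ASID must be flushed (and the
 * nested NPT resynced, if nested NPT is in use) for this VMRUN.
 */
static bool toy_entry_needs_flush(struct toy_nested *n, uint32_t asid12,
				  bool tlb_ctl_nonzero)
{
	bool new_asid = false;

	if (asid12 != n->last_asid) {
		n->last_asid = asid12;
		new_asid = true;
	}
	return new_asid || tlb_ctl_nonzero;
}
```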

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 22 ++++++++++++++++------
 arch/x86/kvm/svm/svm.h    |  2 ++
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 2c04b12121fb2..f91c22e72151e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -553,10 +553,8 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 	to->misc_ctl2		= from->misc_ctl2;
 	to->pause_filter_count  = from->pause_filter_count;
 	to->pause_filter_thresh = from->pause_filter_thresh;
-
-	/* Copy asid here because nested_vmcb_check_controls() will check it */
-	to->asid           = from->asid;
-	to->clean = from->clean;
+	to->asid		= from->asid;
+	to->clean		= from->clean;
 
 #ifdef CONFIG_KVM_HYPERV
 	/* Hyper-V extensions (Enlightened VMCB) */
@@ -688,22 +686,34 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	bool new_asid = false;
 
 	/* Handle pending Hyper-V TLB flush requests */
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
 
+	if (svm->nested.ctl.asid != svm->nested.last_asid) {
+		svm->nested.last_asid = svm->nested.ctl.asid;
+		new_asid = true;
+	}
+
 	/*
 	 * If L1 requested a TLB flush for L2, flush L2's TLB on nested entry
 	 * and sync the nested NPT MMU, as TLB_CONTROL also flushes NPT
 	 * guest-physical mappings.
 	 *
+	 * Handle L1 changing L2's ASID12 similarly, as KVM only uses one ASID
+	 * for L2 in hardware (per vCPU), so it must start fresh when L1 changes
+	 * ASID12 to emulate different ASIDs correctly.  Additionally, the MMU
+	 * context is not tagged by the ASID, so the shadow NPTs cannot be
+	 * reused across different L2 ASIDs.
+	 *
 	 * If L1 requested a full TLB flush for all ASIDs (including its own),
 	 * L1's own ASID is also flushed on nested VM-Exit, before running L1.
 	 *
 	 * TLB_CONTROL_FLUSH_ASID and TLB_CONTROL_FLUSH_ASID_LOCAL are handled
 	 * equally for simplicity.
 	 */
-	if (svm->nested.ctl.tlb_ctl != TLB_CONTROL_DO_NOTHING) {
+	if (new_asid || (svm->nested.ctl.tlb_ctl != TLB_CONTROL_DO_NOTHING)) {
 		if (nested_npt_enabled(svm))
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
@@ -1882,7 +1892,7 @@ void nested_svm_update_tsc_ratio_msr(struct kvm_vcpu *vcpu)
 	svm_write_tsc_multiplier(vcpu);
 }
 
-/* Inverse operation of nested_copy_vmcb_control_to_cache(). asid is copied too. */
+/* Inverse operation of nested_copy_vmcb_control_to_cache() */
 static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
 					      struct vmcb_ctrl_area_cached *from)
 {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4dcfd56882da5..bfbe774829b97 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -244,6 +244,8 @@ struct svm_nested_state {
 	 * on its side.
 	 */
 	bool force_msr_bitmap_recalc;
+
+	u32 last_asid;
 };
 
 struct vcpu_sev_es_state {
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 20/25] KVM: nSVM: Do not reset TLB_CONTROL in vmcb02 on nested VM-Enter
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (18 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 19/25] KVM: nSVM: Flush the TLB if L1 changes L2's ASID in vmcb12 Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 21/25] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr() Yosry Ahmed
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Stop clearing TLB_CONTROL when preparing the control area of vmcb02, as
this can undo pending TLB flushes for L2 (e.g. one requested through
KVM_REQ_TLB_FLUSH while L1 is running), and remove the associated TODO
comment.

This is currently harmless, because nested_svm_entry_tlb_flush() always
requests KVM_REQ_TLB_FLUSH_CURRENT on nested VM-Enter, which sets
TLB_CONTROL again before L2 is actually run. However, always flushing
will soon go away with proper TLB handling for L2, at which point always
clearing TLB_CONTROL would be a bug.

Clearing TLB_CONTROL on nested VM-Enter was probably done because
TLB_CONTROL is not cleared by the CPU on VM-Exit. However, KVM always
clears TLB_CONTROL in the active VMCB after VMRUN. Hence, at nested
VM-Enter, TLB_CONTROL in vmcb02 can only be non-zero if a TLB flush is
queued for L2 while L1 is running (i.e. KVM_REQ_TLB_FLUSH), and that
should never be ignored.
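
A toy model of the problem, under the simplifying assumptions above (KVM
clears TLB_CONTROL after every VMRUN, and a flush queued for L2 while L1
runs sits in vmcb02). All names are hypothetical; the 'clear' flag
selects the old behavior of calling vmcb_clr_flush_asid():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_CONTROL_DO_NOTHING	0
#define TLB_CONTROL_FLUSH_ASID	3

struct toy_vmcb {
	uint8_t tlb_ctl;
};

/* Models a flush queued for L2 while L1 runs (serviced into vmcb02). */
static void toy_queue_l2_flush(struct toy_vmcb *vmcb02)
{
	vmcb02->tlb_ctl = TLB_CONTROL_FLUSH_ASID;
}

/* Models preparing vmcb02 on nested VM-Enter. */
static void toy_prepare_vmcb02(struct toy_vmcb *vmcb02, bool clear)
{
	if (clear)
		vmcb02->tlb_ctl = TLB_CONTROL_DO_NOTHING;
}

/* Models VMRUN followed by KVM clearing TLB_CONTROL; returns the value used. */
static uint8_t toy_vmrun(struct toy_vmcb *vmcb)
{
	uint8_t used = vmcb->tlb_ctl;

	vmcb->tlb_ctl = TLB_CONTROL_DO_NOTHING;
	return used;
}
```

With clear == true the queued flush never reaches the CPU; with
clear == false it is consumed by the next VMRUN and then reset.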

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index f91c22e72151e..a226aca8f9108 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -719,12 +719,7 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
 
-	/*
-	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
-	 * things to fix before this can be conditional:
-	 *
-	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
-	 */
+	/* TODO: optimize unconditional TLB flush/MMU sync */
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
@@ -981,9 +976,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
 
 	vmcb02->control.asid = svm->nested.asid02;
 
-	/* Overwritten later if necessary.  */
-	vmcb_clr_flush_asid(vmcb02);
-
 	/* Use vmcb01 MMU and format if guest does not use nNPT */
 	if (nested_npt_enabled(svm)) {
 		vmcb02->control.misc_ctl &= ~SVM_MISC_ENABLE_GMET;
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 21/25] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr()
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (19 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 20/25] KVM: nSVM: Do not reset TLB_CONTROL in vmcb02 on nested VM-Enter Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 22/25] KVM: x86/mmu: Refactor kvm_mmu_invlpg() to allow skipping the gva flush Yosry Ahmed
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

In preparation for creating another helper for
kvm_mmu_invalidate_addr(), rename __kvm_mmu_invalidate_addr() to
kvm_mmu_invalidate_addr_in_root().

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index bf2e0c2205631..65c35ed8f4a01 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6577,8 +6577,9 @@ void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg)
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_print_sptes);
 
-static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				      u64 addr, hpa_t root_hpa)
+static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
+					    struct kvm_mmu *mmu,
+					    u64 addr, hpa_t root_hpa)
 {
 	struct kvm_shadow_walk_iterator iterator;
 
@@ -6634,11 +6635,11 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		return;
 
 	if (roots & KVM_MMU_ROOT_CURRENT)
-		__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->root.hpa);
+		kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->root.hpa);
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 		if (roots & KVM_MMU_ROOT_PREVIOUS(i))
-			__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
+			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 	}
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invalidate_addr);
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 22/25] KVM: x86/mmu: Refactor kvm_mmu_invlpg() to allow skipping the gva flush
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (20 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 21/25] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr() Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 23/25] KVM: nSVM: Flush L2's ASID when emulating INVLPGA Yosry Ahmed
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Refactor helpers out of kvm_mmu_invalidate_addr() and kvm_mmu_invlpg()
that take in an extra argument to skip the GVA flush.

This will be used when invalidating GVAs in a context other than the
current one (i.e. invalidating an L2 GVA from L1), where flushing the
current context would flush the wrong TLB entries.
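
The refactor follows a common pattern: a helper that takes the new flag,
plus a wrapper that preserves the old behavior for existing callers. A
simplified sketch with hypothetical counters in place of the real flush
and shadow page table work (the non-canonical check and other details
are omitted):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical counters for the two kinds of work done when invalidating
 * an address: flushing the GVA from the current context's TLB, and
 * invalidating the matching shadow page table entries in each root.
 */
struct toy_mmu_stats {
	int gva_flushes;
	int root_invalidations;
};

/* Common helper taking the new flag (cf. __kvm_mmu_invalidate_addr()). */
static void toy_invalidate_addr_common(struct toy_mmu_stats *s, int nr_roots,
				       bool flush_gva)
{
	if (flush_gva)
		s->gva_flushes++;	/* only correct for the current context */

	for (int i = 0; i < nr_roots; i++)
		s->root_invalidations++;
}

/* Existing entry point keeps its behavior by always passing true. */
static void toy_invalidate_addr(struct toy_mmu_stats *s, int nr_roots)
{
	toy_invalidate_addr_common(s, nr_roots, true);
}
```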

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/mmu/mmu.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 65c35ed8f4a01..3feb75732f7b4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6615,15 +6615,15 @@ static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
-void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-			     u64 addr, unsigned long roots)
+static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				      u64 addr, unsigned long roots, bool flush_gva)
 {
 	int i;
 
 	WARN_ON_ONCE(roots & ~KVM_MMU_ROOTS_ALL);
 
 	/* It's actually a GPA for vcpu->arch.guest_mmu.  */
-	if (mmu != &vcpu->arch.guest_mmu) {
+	if (flush_gva && mmu != &vcpu->arch.guest_mmu) {
 		/* INVLPG on a non-canonical address is a NOP according to the SDM.  */
 		if (is_noncanonical_invlpg_address(addr, vcpu))
 			return;
@@ -6642,9 +6642,15 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 	}
 }
+
+void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			       u64 addr, unsigned long roots)
+{
+	__kvm_mmu_invalidate_addr(vcpu, mmu, addr, roots, true);
+}
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invalidate_addr);
 
-void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
+static void __kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva, bool flush_gva)
 {
 	/*
 	 * INVLPG is required to invalidate any global mappings for the VA,
@@ -6656,11 +6662,16 @@ void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 	 * be synced when switching to that new cr3, so nothing needs to be
 	 * done here for them.
 	 */
-	kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, KVM_MMU_ROOTS_ALL);
+	__kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva,
+				  KVM_MMU_ROOTS_ALL, flush_gva);
 	++vcpu->stat.invlpg;
 }
-EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invlpg);
 
+void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
+{
+	__kvm_mmu_invlpg(vcpu, gva, true);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invlpg);
 
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 {
-- 
2.54.0.1136.gdb2ca164c4-goog


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [RFC PATCH v2 23/25] KVM: nSVM: Flush L2's ASID when emulating INVLPGA
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (21 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 22/25] KVM: x86/mmu: Refactor kvm_mmu_invlpg() to allow skipping the gva flush Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 24/25] KVM: nSVM: Use different ASIDs for L1 and L2 Yosry Ahmed
  2026-06-16  0:41 ` [RFC PATCH v2 25/25] DO NOT MERGE: Add nested_tlb_force_flush Yosry Ahmed
  24 siblings, 0 replies; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

KVM currently handles INVLPGA in the same way as INVLPG, flushing L1's
own ASID. This is currently correct because L1 and L2 share an ASID, but
it doesn't work once they have separate ASIDs.

If L1 is flushing a different L2 ASID than the one KVM is tracking, do
nothing, as KVM will flush L2's ASID in hardware (and sync the MMU if
needed) when L1 switches to the target ASID.

If L1 is flushing its own ASID, handle the flush the same as INVLPG.
Otherwise, skip the GVA flush for the current context (L1's ASID), and
flush the L2 ASID in hardware using INVLPGA if running on the same CPU
as L2. If not, fall back to a VMCB-based ASID flush. Note that if L2 then
runs on a different CPU, KVM will flush the ASID anyway.

Either way, sync the MMU if NPT is disabled, which is handled by
__kvm_mmu_invlpg(). Note that all MMU roots are sync'd when NPT is
disabled, which can be optimized by keying off guest_mode to only sync
the appropriate context (L1 vs. L2).
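
The decision logic above, condensed into a hypothetical pure function
for illustration (the enum and function names are not KVM's; in every
non-NOP case, shadow page tables are also synced via __kvm_mmu_invlpg()
when NPT is disabled):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_invlpga_action {
	TOY_INVLPGA_NOP,		/* non-canonical GVA or untracked L2 ASID */
	TOY_INVLPGA_AS_INVLPG,		/* ASID 0: L1 flushing its own ASID */
	TOY_INVLPGA_HW_INVLPGA,		/* L2 ASID, vmcb02 last ran on this CPU */
	TOY_INVLPGA_VMCB_FLUSH,		/* L2 ASID, vmcb02 last ran elsewhere */
};

static enum toy_invlpga_action toy_decide_invlpga(bool noncanonical,
						  uint32_t asid,
						  uint32_t last_asid,
						  int this_cpu, int vmcb02_cpu)
{
	if (noncanonical)
		return TOY_INVLPGA_NOP;
	/* Flushed anyway when L1 later switches L2 to that ASID. */
	if (asid && asid != last_asid)
		return TOY_INVLPGA_NOP;
	if (!asid)
		return TOY_INVLPGA_AS_INVLPG;
	return this_cpu == vmcb02_cpu ? TOY_INVLPGA_HW_INVLPGA :
					TOY_INVLPGA_VMCB_FLUSH;
}
```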

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/mmu/mmu.c          |  8 ++++---
 arch/x86/kvm/svm/svm.c          | 38 +++++++++++++++++++++++++++++++--
 3 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3886b536c8a57..ee83dd4a56712 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2385,7 +2385,10 @@ static inline void kvm_dec_apicv_irq_window_req(struct kvm *kvm)
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len);
 void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
+void __kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva, bool flush_gva);
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
+void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			       u64 addr, unsigned long roots, bool flush_gva);
 void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			     u64 addr, unsigned long roots);
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3feb75732f7b4..b55c2fa2b388b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6615,8 +6615,8 @@ static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
-static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				      u64 addr, unsigned long roots, bool flush_gva)
+void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			       u64 addr, unsigned long roots, bool flush_gva)
 {
 	int i;
 
@@ -6642,6 +6642,7 @@ static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
 			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 	}
 }
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_mmu_invalidate_addr);
 
 void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			       u64 addr, unsigned long roots)
@@ -6650,7 +6651,7 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_mmu_invalidate_addr);
 
-static void __kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva, bool flush_gva)
+void __kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva, bool flush_gva)
 {
 	/*
 	 * INVLPG is required to invalidate any global mappings for the VA,
@@ -6666,6 +6667,7 @@ static void __kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva, bool flush_gva)
 				  KVM_MMU_ROOTS_ALL, flush_gva);
 	++vcpu->stat.invlpg;
 }
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_mmu_invlpg);
 
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 87f82ae51e8b1..bc5a1cff04647 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2404,17 +2404,51 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
 
 static int invlpga_interception(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	/* FIXME: Handle an address size prefix. */
 	gva_t gva = kvm_rax_read(vcpu);
 	u32 asid = kvm_ecx_read(vcpu);
+	int cpu;
 
 	if (nested_svm_check_permissions(vcpu))
 		return 1;
 
 	trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
 
-	/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
-	kvm_mmu_invlpg(vcpu, gva);
+	/*
+	 * INVLPG on a non-canonical address is a NOP according to the SDM,
+	 * assume the same behavior from INVLPGA since the APM doesn't specify.
+	 */
+	if (is_noncanonical_invlpg_address(gva, vcpu))
+		return kvm_skip_emulated_instruction(vcpu);
+
+	/*
+	 * Do nothing if L1 is flushing a different L2 ASID than the one KVM is
+	 * currently tracking.  KVM tracks a single L2 ASID, and performs a TLB
+	 * flush (and MMU resync if needed) when L1 switches ASIDs anyway.
+	 */
+	if (asid && asid != svm->nested.last_asid)
+		return kvm_skip_emulated_instruction(vcpu);
+
+	/*
+	 * Handle INVLPGA similar to INVLPG, with one caveat. If the specified
+	 * ASID is non-zero (i.e. L1 is not flushing its own ASID), skip
+	 * flushing the TLB for the current context (L1's), and use INVLPGA to
+	 * flush L2's ASID in hardware if running on the same CPU (otherwise
+	 * fall back to a full ASID flush).
+	 *
+	 * Note, if NPT is disabled, this will sync all the shadow page tables.
+	 * This can be optimized by keying off guest_mode.
+	 */
+	__kvm_mmu_invlpg(vcpu, gva, !asid);
+	if (asid) {
+		cpu = get_cpu();
+		if (cpu == svm->nested.vmcb02.cpu)
+			invlpga(gva, svm->nested.asid02);
+		else
+			vmcb_set_flush_asid(svm->nested.vmcb02.ptr);
+		put_cpu();
+	}
 
 	return kvm_skip_emulated_instruction(vcpu);
 }
-- 
2.54.0.1136.gdb2ca164c4-goog
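
To make the decision flow of the new invlpga_interception() easier to follow, here is a small user-space model of it. It is a sketch under stated assumptions: `enum invlpga_action`, `model_invlpga()` and `is_canonical_48()` are hypothetical names, the canonical check assumes 48-bit virtual addresses (the real is_noncanonical_invlpg_address() also accounts for LA57), and the __kvm_mmu_invlpg() call with its shadow-page-table sync is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome labels for the model; these are not KVM identifiers. */
enum invlpga_action {
	INVLPGA_NOP,		/* skip: non-canonical gva or untracked L2 ASID */
	INVLPGA_FLUSH_L1_GVA,	/* ASID 0: L1 flushes gva in its own context */
	INVLPGA_HW_INVLPGA_L2,	/* same pCPU as vmcb02: INVLPGA on asid02 */
	INVLPGA_FLUSH_L2_ASID,	/* other pCPU: request a full flush of asid02 */
};

/*
 * Assumption: 48-bit virtual addresses (4-level paging), so bits 63:47 must
 * all be equal for the address to be canonical.
 */
static bool is_canonical_48(uint64_t gva)
{
	uint64_t top = gva >> 47;

	return top == 0 || top == 0x1ffff;
}

static enum invlpga_action model_invlpga(uint64_t gva, uint32_t asid,
					 uint32_t last_asid, int cur_cpu,
					 int vmcb02_cpu)
{
	if (!is_canonical_48(gva))
		return INVLPGA_NOP;

	/* Only the last L2 ASID is tracked; switching ASIDs flushes anyway. */
	if (asid && asid != last_asid)
		return INVLPGA_NOP;

	if (!asid)
		return INVLPGA_FLUSH_L1_GVA;

	/* INVLPGA only affects the local TLB, hence the pCPU comparison. */
	return cur_cpu == vmcb02_cpu ? INVLPGA_HW_INVLPGA_L2
				     : INVLPGA_FLUSH_L2_ASID;
}
```

For example, an INVLPGA naming ASID 7 while KVM tracks L2 ASID 5 is a NOP in this model, because L1 switching L2 to ASID 7 triggers a flush at that point anyway.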



* [RFC PATCH v2 24/25] KVM: nSVM: Use different ASIDs for L1 and L2
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (22 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 23/25] KVM: nSVM: Flush L2's ASID when emulating INVLPGA Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  1:30   ` sashiko-bot
  2026-06-16  0:41 ` [RFC PATCH v2 25/25] DO NOT MERGE: Add nested_tlb_force_flush Yosry Ahmed
  24 siblings, 1 reply; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

Now that TLB flushes are properly handled and tracked for L1 vs L2
ASIDs, allocate a separate new ASID for L2 for each vCPU, similar to how
VMX handles VPIDs. Drop the unconditional flushes and syncs on nested
transitions.

On a Turin CPU, this results in an 8-15% performance improvement in a
CPUID-rate microbenchmark [1] and in netperf TCP_RR latency/throughput.

[1] https://lore.kernel.org/kvm/20231109180646.2963718-1-khorenko@virtuozzo.com/

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index a226aca8f9108..56b769b603ca9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -718,10 +718,6 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
-
-	/* TODO: optimize unconditional TLB flush/MMU sync */
-	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
 static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
@@ -733,9 +729,6 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
 	/* Flush L1's own ASID if it request a *full* TLB flush on VMRUN */
 	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
-
-	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
 static void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
@@ -1539,7 +1532,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	if (!svm->nested.msrpm)
 		goto err_free_vmcb02;
 
-	svm->nested.asid02 = svm->asid;
+	svm->nested.asid02 = allocate_asid();
 
 	svm->nested.initialized = true;
 	return 0;
@@ -1557,6 +1550,8 @@ void svm_free_nested(struct vcpu_svm *svm)
 	if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
 		svm_switch_vmcb(svm, &svm->vmcb01);
 
+	free_asid(svm->nested.asid02);
+
 	svm_vcpu_free_msrpm(svm->nested.msrpm);
 	svm->nested.msrpm = NULL;
 
-- 
2.54.0.1136.gdb2ca164c4-goog
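
The commit message describes allocating a separate, static L2 ASID per vCPU, similar to VPIDs, and the thread overview mentions support for a minimum TLB tag. A minimal user-space sketch of such a static allocator follows, assuming a contiguous tag range [min, min + count) and 0 as the "allocation failed" value (matching the `if (!fallback_asid)` check quoted in the review of patch 09). `struct tag_pool` and the tag_pool_*/tag_* functions are hypothetical, not the kernel's allocate_asid()/free_asid() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a static tag (ASID) pool. */
struct tag_pool {
	uint32_t min;	/* first allocatable tag, e.g. max_sev_asid + 1 */
	uint32_t count;	/* number of allocatable tags */
	bool *used;	/* used[i] tracks tag (min + i) */
};

static int tag_pool_init(struct tag_pool *p, uint32_t min, uint32_t count)
{
	/* Tag 0 means "no tag" (ASID 0 belongs to the host on SVM). */
	if (!min || !count)
		return -EINVAL;

	p->used = calloc(count, sizeof(*p->used));
	if (!p->used)
		return -ENOMEM;

	p->min = min;
	p->count = count;
	return 0;
}

/* Returns a free tag, or 0 if the pool is exhausted. */
static uint32_t tag_alloc(struct tag_pool *p)
{
	for (uint32_t i = 0; i < p->count; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return p->min + i;
		}
	}
	return 0;
}

static void tag_free(struct tag_pool *p, uint32_t tag)
{
	if (!tag)
		return;	/* freeing "no tag" is a no-op */

	assert(tag >= p->min && tag - p->min < p->count);
	p->used[tag - p->min] = false;
}
```

In this model each vCPU would call tag_alloc() once for its own ASID and once more for asid02 when nested state is allocated, and hand both back with tag_free() on teardown.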



* [RFC PATCH v2 25/25] DO NOT MERGE: Add nested_tlb_force_flush
  2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
                   ` (23 preceding siblings ...)
  2026-06-16  0:41 ` [RFC PATCH v2 24/25] KVM: nSVM: Use different ASIDs for L1 and L2 Yosry Ahmed
@ 2026-06-16  0:41 ` Yosry Ahmed
  2026-06-16  1:21   ` sashiko-bot
  24 siblings, 1 reply; 32+ messages in thread
From: Yosry Ahmed @ 2026-06-16  0:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Tom Lendacky, kvm, linux-kernel, Yosry Ahmed

For testing purposes only, add a knob that brings back the unconditional
forced TLB flushes (and MMU resyncs) on nested transitions.

Signed-off-by: Yosry Ahmed <yosry@kernel.org>
---
 arch/x86/kvm/svm/nested.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 56b769b603ca9..210240a0e051a 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -32,6 +32,9 @@
 #include "hyperv.h"
 #include "pmu.h"
 
+static bool nested_tlb_force_flush;
+module_param(nested_tlb_force_flush, bool, 0644);
+
 #define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
 
 static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
@@ -691,6 +694,12 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 	/* Handle pending Hyper-V TLB flush requests */
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
 
+	if (nested_tlb_force_flush) {
+		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		return;
+	}
+
 	if (svm->nested.ctl.asid != svm->nested.last_asid) {
 		svm->nested.last_asid = svm->nested.ctl.asid;
 		new_asid = true;
@@ -726,6 +735,12 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
 
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
 
+	if (nested_tlb_force_flush) {
+		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		return;
+	}
+
 	/* Flush L1's own ASID if it request a *full* TLB flush on VMRUN */
 	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
-- 
2.54.0.1136.gdb2ca164c4-goog



* Re: [RFC PATCH v2 13/25] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  2026-06-16  0:41 ` [RFC PATCH v2 13/25] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
@ 2026-06-16  1:05   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-06-16  1:05 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Dereferencing `svm->nested.vmcb02.ptr` without verifying `svm->nested.initialized` causes a Use-After-Free write if `svm_allocate_nested()` previously failed.
--

commit 88a69b7a032d088ddb452c91fba161d26dbed0ec
Author: Yosry Ahmed <yosry@kernel.org>

KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH

This commit modifies svm_flush_tlb_all() to flush both L1 and L2 ASIDs
to properly handle KVM_REQ_TLB_FLUSH across all contexts. It also purges
all Hyper-V TLB FIFOs for both L1 and L2.

> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dac3d0098bb77..4f13c3456a5d7 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
[ ... ]
> @@ -4213,7 +4215,11 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
>  	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
>  		hv_flush_remote_tlbs(vcpu->kvm);
>  
> -	svm_flush_tlb_asid(vcpu);
> +	kvm_hv_purge_all_tlb_flush_fifos(vcpu);
> +
> +	vmcb_set_flush_asid(svm->vmcb01.ptr);
> +	if (svm->nested.vmcb02.ptr)
> +		vmcb_set_flush_asid(svm->nested.vmcb02.ptr);

[Severity: High]
Could this dereference of svm->nested.vmcb02.ptr cause a use-after-free?

If svm_allocate_nested() successfully allocates vmcb02_page and assigns it
to svm->nested.vmcb02.ptr, but a subsequent allocation (such as msrpm)
fails, the error path frees the page but does not clear the pointer:

arch/x86/kvm/svm/nested.c:svm_allocate_nested() {
    ...
err_free_vmcb02:
    __free_page(vmcb02_page);
    return -ENOMEM;
}

If a TLB flush is later triggered and calls svm_flush_tlb_all(), the check
for if (svm->nested.vmcb02.ptr) would evaluate to true on the dangling
pointer, and vmcb_set_flush_asid() would write to freed memory.

Would it be safer to check svm->nested.initialized here instead, to ensure
the nested state is fully allocated and valid?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004155.1435766-1-yosry@kernel.org?part=13
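
The two mitigations the review points at, clearing the pointer on the error path and gating the flush on `initialized`, can be sketched in user space as follows. All names here are hypothetical stand-ins: calloc()/malloc()/free() replace the kernel's page and MSRPM allocators, and `struct nested_model` is not `struct vcpu_svm`. Doing both is belt and braces; either one alone avoids the write to freed memory in this model.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the nested SVM state. */
struct vmcb_model {
	bool flush_asid;
};

struct nested_model {
	struct vmcb_model *vmcb02;
	void *msrpm;
	bool initialized;
};

static int allocate_nested(struct nested_model *n, bool fail_msrpm)
{
	n->vmcb02 = calloc(1, sizeof(*n->vmcb02));
	if (!n->vmcb02)
		return -ENOMEM;

	/* fail_msrpm simulates the MSRPM allocation failing. */
	n->msrpm = fail_msrpm ? NULL : malloc(64);
	if (!n->msrpm)
		goto err_free_vmcb02;

	n->initialized = true;
	return 0;

err_free_vmcb02:
	free(n->vmcb02);
	n->vmcb02 = NULL;	/* do not leave a dangling pointer behind */
	return -ENOMEM;
}

static void flush_tlb_all(struct nested_model *n)
{
	/* Only touch vmcb02 once nested state is fully set up. */
	if (n->initialized)
		n->vmcb02->flush_asid = true;
}
```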


* Re: [RFC PATCH v2 08/25] KVM: SEV: Stop using per-vCPU ASID for SEV VMs
  2026-06-16  0:41 ` [RFC PATCH v2 08/25] KVM: SEV: Stop using per-vCPU ASID for SEV VMs Yosry Ahmed
@ 2026-06-16  1:06   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-06-16  1:06 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] A malicious SEV guest or an unprivileged host-userspace process can trigger a host WARN_ON_ONCE by executing VMRUN (entering nested virtualization) or via KVM_SET_NESTED_STATE.
--

commit c3ac1dbd3331e2991b6bd808b207ce3395a73130
Author: Yosry Ahmed <yosry@kernel.org>

KVM: SEV: Stop using per-vCPU ASID for SEV VMs

This commit updates KVM to initialize the SEV VM ASID once in
sev_init_vmcb() instead of checking and setting it on every vCPU run. It
replaces the run-time assignment check in pre_sev_run() with a warning.

> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3555,10 +3555,7 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>  	if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
>  		cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
>  
> -	/* Assign the asid allocated with this SEV guest */
> -	svm->asid = asid;
> -
> -	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> +	if (WARN_ON_ONCE(asid != svm->vmcb->control.asid)) {

[Severity: High]
Can a malicious SEV guest or an unprivileged host-userspace process trigger
this warning by entering nested virtualization or via KVM_SET_NESTED_STATE?

KVM does not strictly prohibit userspace from exposing CPUID.SVM to an SEV
guest. If an SEV guest sets EFER.SVME and executes VMRUN, KVM intercepts it
and handles the nested transition via nested_svm_vmrun(), which switches the
active VMCB pointer to the nested vmcb02.

The nested vmcb02 does not inherit the SEV ASID assignment from
sev_init_vmcb(), and its control.asid is initialized separately. During the
subsequent svm_vcpu_run() -> pre_sev_run() call, the VM's valid SEV ASID will
no longer match the nested vmcb02's ASID, causing this warning to trigger.

Since this condition is guest-reachable and controllable by userspace, does
this create a host denial of service vector when panic_on_warn is set?

>  		svm->vmcb->control.asid = asid;
>  		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004155.1435766-1-yosry@kernel.org?part=8
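
The reviewer's argument reduces to this: the quoted check compares the VM-wide SEV ASID against whichever VMCB is current, and after a nested VMRUN that is vmcb02. The model below is a hypothetical sketch of only that comparison; whether vmcb02 really carries a different ASID for an SEV VM depends on how asid02 is initialized in other patches of the series, which this reply does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct vmcb_model {
	uint32_t asid;	/* stands in for vmcb->control.asid */
};

/*
 * Returns true when the quoted WARN_ON_ONCE() condition would fire. Like the
 * quoted code, it then rewrites the ASID so the VMCB runs with the SEV ASID.
 */
static bool pre_sev_run_asid_check(struct vmcb_model *cur, uint32_t sev_asid)
{
	if (cur->asid != sev_asid) {
		cur->asid = sev_asid;
		return true;
	}
	return false;
}
```

With vmcb01 initialized to the SEV ASID by sev_init_vmcb(), the check is silent; once the current VMCB is a vmcb02 holding some other ASID, it fires on the first run.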


* Re: [RFC PATCH v2 09/25] KVM: SVM: Use a static ASID per vCPU
  2026-06-16  0:41 ` [RFC PATCH v2 09/25] KVM: SVM: Use a static ASID per vCPU Yosry Ahmed
@ 2026-06-16  1:08   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-06-16  1:08 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] KVM fails to load on systems where all guest ASIDs are assigned to SEV (`nr_asids == max_sev_asid + 1`), breaking SEV-only environments.
--

commit 6e1ef9a91f738606e7d8f9e36c1659d383c52061
Author: Yosry Ahmed <yosry@kernel.org>

KVM: SVM: Use a static ASID per vCPU

This patch switches KVM SVM from dynamic ASID allocation to a static
per-vCPU ASID to avoid full TLB flushes and simplify the code. It introduces
a fallback ASID logic and relies on TLB tag tracking for the pool of
available ASIDs.

> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5d4c45d788b54..fae5cb7102010 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c

[ ... ]

> @@ -5751,6 +5744,17 @@ static __init int svm_hardware_setup(void)
>  
>  	kvm_caps.inapplicable_quirks &= ~KVM_X86_QUIRK_CD_NW_CLEARED;
>  
> +	/* Consumes max_sev_asid initialized by sev_hardware_setup() */
> +	min_asid = max_sev_asid + 1;
> +	nr_asids = cpuid_ebx(SVM_CPUID_FUNC);
> +	r = kvm_init_tlb_tags(min_asid, nr_asids - min_asid);

[Severity: High]
Will this cause KVM to fail loading on systems where the BIOS configures all
available guest ASIDs for SEV?

If max_sev_asid equals nr_asids - 1 (a common configuration for cloud
servers dedicated to Confidential VMs), then min_asid will equal nr_asids.
This makes nr_asids - min_asid evaluate to 0.

When kvm_init_tlb_tags() is called with a count of 0, it returns -EINVAL,
which causes svm_hardware_setup() to fail and prevents the KVM module from
loading entirely. Prior to this patch, KVM would successfully load in this
configuration and allow SEV guests to run normally.

> +	if (r)
> +		goto err;
> +
> +	fallback_asid = kvm_alloc_tlb_tag();
> +	if (!fallback_asid)
> +		goto err;
> +
>  	for_each_possible_cpu(cpu) {
>  		r = svm_cpu_init(cpu);
>  		if (r)

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004155.1435766-1-yosry@kernel.org?part=9
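
The arithmetic behind this concern is easy to check. ASIDs 1..max_sev_asid are reserved for SEV, so the non-SEV range starts at max_sev_asid + 1 and has nr_asids - min_asid entries, which is 0 when every ASID is assigned to SEV. The sketch below computes the range without unsigned underflow; `struct asid_range`, `non_sev_asid_range()` and `non_sev_guests_supported()` are hypothetical, and treating an empty range as "no non-SEV guests" rather than a load failure is only one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct asid_range {
	uint32_t min;	/* first non-SEV ASID */
	uint32_t count;	/* number of non-SEV ASIDs */
};

static struct asid_range non_sev_asid_range(uint32_t nr_asids,
					    uint32_t max_sev_asid)
{
	struct asid_range r = { .min = max_sev_asid + 1, .count = 0 };

	/* Guard against nr_asids - min underflowing when SEV owns all ASIDs. */
	if (nr_asids > r.min)
		r.count = nr_asids - r.min;
	return r;
}

/* Hypothetical policy: an empty range disables non-SEV guests only. */
static bool non_sev_guests_supported(struct asid_range r)
{
	return r.count > 0;
}
```

For example, with nr_asids = 32768 and max_sev_asid = 32767, min is 32768 and count is 0, which is exactly the value the review says kvm_init_tlb_tags() rejects with -EINVAL.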


* Re: [RFC PATCH v2 17/25] KVM: nSVM: Service local TLB flushes before nested transitions
  2026-06-16  0:41 ` [RFC PATCH v2 17/25] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
@ 2026-06-16  1:20   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-06-16  1:20 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Context mismatch in `svm_switch_vmcb()` incorrectly applies global TLB flushes to the outgoing VMCB and clears the request bit, causing the incoming VMCB to miss the flush. Furthermore, on nested VM-Exit, `leave_guest_mode()` is called before `svm_switch_vmcb()`, causing `kvm_hv_purge_tlb_flush_fifo()` to incorrectly reset L1's Hyper-V TLB flush FIFO while hardware flushes are applied to L2's `vmcb02`.
--

commit 5fb35fcca4bb6e1756cc67fdc65ffe16ab6c194f
Author: Yosry Ahmed <yosry@kernel.org>

KVM: nSVM: Service local TLB flushes before nested transitions

This patch updates nSVM to service local TLB flush requests before
switching VMCBs during nested transitions. It aims to ensure flushes are
applied to the correct context, similar to how nVMX handles them.

> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 234724d8b4c54..7b19191e0e43f 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -717,9 +717,16 @@ static void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_v
>  {
>  	struct kvm_vcpu *vcpu = &svm->vcpu;
>  
> +	/*
> +	 * TLB flushes are applied to the VMCB, so apply any pending TLB flushes
> +	 * on the current VMCB before switching to a new one.
> +	 */
> +	kvm_service_local_tlb_flush_requests(vcpu);

[Severity: High]
Does this clear global TLB flush requests (like KVM_REQ_TLB_FLUSH_GUEST)
while applying them only to the outgoing VMCB?

If a host-initiated flush occurs (e.g., from page reclaim), the request
bit will be consumed and applied to the outgoing VMCB, leaving the incoming
VMCB to miss the flush and run with a stale TLB.

Also, is there a context mismatch here during nested VM-Exit?

In nested_svm_vmexit(), leave_guest_mode(vcpu) is called before
svm_switch_vmcb(svm, &svm->vmcb01). 

Because of this, kvm_service_local_tlb_flush_requests() will see
is_guest_mode(vcpu) as false. If Hyper-V is enabled, this means
kvm_hv_purge_tlb_flush_fifo() will incorrectly target L1's FIFO while the
hardware flushes are actually being applied to L2's vmcb02.

> +
>  	svm->current_vmcb = target_vmcb;
>  	svm->vmcb = target_vmcb->ptr;

[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004155.1435766-1-yosry@kernel.org?part=17
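
The first concern can be illustrated with a tiny model: servicing a pending request consumes the request bit and marks only the VMCB that is current at that moment. The structures and functions below are hypothetical simplifications of kvm_service_local_tlb_flush_requests() and svm_switch_vmcb(). Whether this is actually a bug depends on whether the pending request was meant for the outgoing context only (the patch's stated intent) or for all contexts; patch 13's handling of KVM_REQ_TLB_FLUSH for both ASIDs is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

struct vmcb_model {
	bool flush_asid;
};

struct vcpu_model {
	struct vmcb_model vmcb01, vmcb02;
	struct vmcb_model *vmcb;	/* currently loaded VMCB */
	bool req_flush_guest;		/* pending flush request bit */
};

static void service_local_flush_requests(struct vcpu_model *v)
{
	if (v->req_flush_guest) {
		v->req_flush_guest = false;	/* request is consumed... */
		v->vmcb->flush_asid = true;	/* ...and applied to current only */
	}
}

/* Mirrors the ordering in the quoted hunk: service, then switch. */
static void switch_vmcb(struct vcpu_model *v, struct vmcb_model *target)
{
	service_local_flush_requests(v);
	v->vmcb = target;
}
```

After a switch from vmcb01 to vmcb02 with a request pending, only vmcb01 is marked and the request bit is gone, so vmcb02 would not see that flush.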


* Re: [RFC PATCH v2 25/25] DO NOT MERGE: Add nested_tlb_force_flush
  2026-06-16  0:41 ` [RFC PATCH v2 25/25] DO NOT MERGE: Add nested_tlb_force_flush Yosry Ahmed
@ 2026-06-16  1:21   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-06-16  1:21 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] State Desynchronization (ASID tracking) / Isolation Breach
- [High] Weakened Flush Semantics / Stale MMU Roots
- [Medium] Guideline Violation / Mutable State
--

commit 3fa5ceb50c23aa3458e6cdaf0dc109608a8b6192
Author: Yosry Ahmed <yosry@kernel.org>

DO NOT MERGE: Add nested_tlb_force_flush

For testing purposes only, add a knob that brings back the unconditional
forced TLB flushes (and MMU resyncs) on nested transitions.

> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -32,6 +32,9 @@
>  #include "hyperv.h"
>  #include "pmu.h"
>  
> +static bool nested_tlb_force_flush;
> +module_param(nested_tlb_force_flush, bool, 0644);

[Severity: Medium]
Does this module parameter need to be mutable at runtime (0644)?

KVM guidelines suggest new flags should default to immutable (0444) because
toggling them creates unexpected state-machine transitions that can be
difficult to handle safely.

> [ ... ]
> @@ -691,6 +694,12 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
>  	/* Handle pending Hyper-V TLB flush requests */
>  	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
>  
> +	if (nested_tlb_force_flush) {
> +		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +		return;
> +	}

[Severity: Critical]
If this parameter is toggled off at runtime, can it leave
svm->nested.last_asid out of sync?

Because the early return skips updating last_asid, if the feature is later
disabled, KVM might see the stale last_asid matching a newly assigned L1
ASID:

nested_svm_entry_tlb_flush() {
    ...
    if (svm->nested.ctl.asid != svm->nested.last_asid) {
        svm->nested.last_asid = svm->nested.ctl.asid;
        new_asid = true;
    }
    ...
}

Would this incorrectly evaluate new_asid to false and skip the required
flush, allowing the L2 guest to run with mappings from a different context?

[Severity: High]
Does returning early here bypass the full guest flush guarantees?

When replacing KVM_REQ_TLB_FLUSH_GUEST with KVM_REQ_MMU_SYNC and
KVM_REQ_TLB_FLUSH_CURRENT, it appears to skip synchronizing the previous
mmu roots (via kvm_mmu_sync_prev_roots) and guest-specific flush logic.

If L1 explicitly requests a full TLB flush, could a guest switching CR3
to a cached previous root end up using stale shadow page tables?

>  	if (svm->nested.ctl.asid != svm->nested.last_asid) {
>  		svm->nested.last_asid = svm->nested.ctl.asid;
>  		new_asid = true;
> [ ... ]
> @@ -726,6 +735,12 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
>  
>  	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
>  
> +	if (nested_tlb_force_flush) {
> +		kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
> +		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
> +		return;
> +	}

[Severity: High]
Could this early return on VM-Exit similarly replace an explicit
TLB_CONTROL_FLUSH_ALL_ASID request with a weaker current-only flush?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004155.1435766-1-yosry@kernel.org?part=25
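
The last_asid concern is a bookkeeping question: the early return skips the tracking update. The hypothetical sketch below contrasts the quoted ordering with one that always updates last_asid before honoring the knob. It models only the tracking; whether a stale value actually leads to a missed flush also depends on the forced exit-side flush, which is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nested_model {
	uint32_t last_asid;	/* last vmcb12 ASID seen on nested entry */
};

/* Quoted ordering: returns whether an entry flush is requested. */
static bool entry_flush_orig(struct nested_model *n, uint32_t asid, bool force)
{
	if (force)
		return true;	/* flushes, but last_asid is left untouched */

	if (asid != n->last_asid) {
		n->last_asid = asid;
		return true;
	}
	return false;
}

/* Alternative ordering: always keep the tracking current. */
static bool entry_flush_fixed(struct nested_model *n, uint32_t asid, bool force)
{
	bool new_asid = asid != n->last_asid;

	n->last_asid = asid;
	return force || new_asid;
}
```

In the scenario below, L1 runs L2 with ASID 2 while the knob is on, then the knob is turned off and L1 switches back to ASID 1: the quoted ordering sees no ASID change, the alternative does.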


* Re: [RFC PATCH v2 24/25] KVM: nSVM: Use different ASIDs for L1 and L2
  2026-06-16  0:41 ` [RFC PATCH v2 24/25] KVM: nSVM: Use different ASIDs for L1 and L2 Yosry Ahmed
@ 2026-06-16  1:30   ` sashiko-bot
  0 siblings, 0 replies; 32+ messages in thread
From: sashiko-bot @ 2026-06-16  1:30 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:

New issues:
- [Critical] Failure to reset `svm->nested.last_asid` during nested SVM teardown allows L2 guests to run with stale TLB entries from recycled ASIDs.

Pre-existing issues:
- [Critical] Dangling pointer `svm->nested.vmcb02.ptr` leaked on the `svm_allocate_nested()` error path leads to a Use-After-Free write during TLB flushes.
--

commit 0fdb61625f72b8737a6f7fe184ab78e2c15d5576
Author: Yosry Ahmed <yosry@kernel.org>

KVM: nSVM: Use different ASIDs for L1 and L2

This patch modifies nested SVM to allocate a separate hardware ASID for L2
for each vCPU. It drops the unconditional TLB flushes and MMU syncs on nested
transitions, replacing them with proper tracking for L1 vs L2 ASIDs.

Link: https://lore.kernel.org/kvm/20231109180646.2963718-1-khorenko@virtuozzo.com/

> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index a226aca8f9108..56b769b603ca9 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c

[ ... ]

> @@ -1539,7 +1532,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
>  	if (!svm->nested.msrpm)
>  		goto err_free_vmcb02;
>  
> -	svm->nested.asid02 = svm->asid;
> +	svm->nested.asid02 = allocate_asid();
>  
>  	svm->nested.initialized = true;
>  	return 0;
>  
>  err_free_vmcb02:
>  	__free_page(vmcb02_page);
>  	return -ENOMEM;
>  }

[Severity: Critical]
This isn't a bug introduced by this patch, but does the error path here leave
a dangling pointer in svm->nested.vmcb02.ptr?

When msrpm allocation fails, the vmcb02_page is freed, but
svm->nested.vmcb02.ptr is not explicitly cleared. If a subsequent TLB flush
occurs on this vCPU, for example via MMU notifiers:

svm_flush_tlb_all() {
    ...
    if (svm->nested.vmcb02.ptr)
        vmcb_set_flush_asid(svm->nested.vmcb02.ptr);
}

Could this result in a use-after-free write that corrupts the freed memory?

[ ... ]

> @@ -1557,6 +1550,8 @@ void svm_free_nested(struct vcpu_svm *svm)
>  	if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
>  		svm_switch_vmcb(svm, &svm->vmcb01);
>  
> +	free_asid(svm->nested.asid02);
> +

[Severity: Critical]
When the hardware ASID is freed here during nested SVM teardown, does
svm->nested.last_asid also need to be reset to 0?

If L1 disables SVM, svm_free_nested() is called, freeing asid02. This ASID tag
can then be reallocated. If L1 later re-enables SVM and uses the same L1-managed
ASID, nested_svm_entry_tlb_flush() will see:

nested_svm_entry_tlb_flush() {
    ...
    if (svm->nested.ctl.asid != svm->nested.last_asid) {
        svm->nested.last_asid = svm->nested.ctl.asid;
        new_asid = true;
    }
    ...
}

This check would fail to set new_asid = true. Since this patch removes the
unconditional TLB flush on nested entry, would this regression allow the L2
guest to run with stale TLB entries from the previous owner of the reallocated
hardware ASID?

>  	svm_vcpu_free_msrpm(svm->nested.msrpm);
>  	svm->nested.msrpm = NULL;
>

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616004155.1435766-1-yosry@kernel.org?part=24
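
The suggested reset can be sketched as follows. Using 0 as the "no ASID seen" sentinel assumes that a vmcb12 ASID of 0 is always rejected by nested VMRUN consistency checks, so it can never match a real L2 ASID. `struct nested_model`, `free_nested()` and `entry_needs_flush()` are hypothetical simplifications. Whether a flush is really skipped in KVM also depends on whether a freshly allocated asid02 is flushed elsewhere, which the model does not cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nested_model {
	bool initialized;
	uint32_t asid02;	/* hardware ASID backing L2 */
	uint32_t last_asid;	/* last vmcb12 ASID seen; 0 means none */
};

static void free_nested(struct nested_model *n, bool reset_tracking)
{
	n->asid02 = 0;		/* stands in for free_asid(svm->nested.asid02) */
	n->initialized = false;
	if (reset_tracking)
		n->last_asid = 0;
}

/* Same check as the quoted nested_svm_entry_tlb_flush() snippet. */
static bool entry_needs_flush(struct nested_model *n, uint32_t vmcb12_asid)
{
	if (vmcb12_asid != n->last_asid) {
		n->last_asid = vmcb12_asid;
		return true;
	}
	return false;
}
```

Without the reset, re-enabling nested and reusing vmcb12 ASID 1 on a newly allocated (possibly recycled) asid02 does not request a flush; with the reset it does.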



Thread overview: 32+ messages
2026-06-16  0:41 [RFC PATCH v2 00/25] Optimize nSVM TLB flushes Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 01/25] KVM: nSVM: Flush the TLB after forcefully leaving nested Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 02/25] KVM: SVM: Passthrough the number of supported ASIDs Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 03/25] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 04/25] KVM: x86/mmu: Support specifying a minimum TLB tag Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 05/25] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 06/25] KVM: SVM: Fallback to flush everything if FLUSHBYASID is not available Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 07/25] KVM: SVM: Duplicate pre-run ASID check for SEV and non-SEV guests Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 08/25] KVM: SEV: Stop using per-vCPU ASID for SEV VMs Yosry Ahmed
2026-06-16  1:06   ` sashiko-bot
2026-06-16  0:41 ` [RFC PATCH v2 09/25] KVM: SVM: Use a static ASID per vCPU Yosry Ahmed
2026-06-16  1:08   ` sashiko-bot
2026-06-16  0:41 ` [RFC PATCH v2 10/25] KVM: nSVM: Add a placeholder ASID for L2 Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 11/25] KVM: x86: hyper-v: Rename kvm_hv_vcpu_purge_flush_tlb() Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 12/25] KVM: x86: hyper-v: Allow puring all TLB flush FIFOs Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 13/25] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
2026-06-16  1:05   ` sashiko-bot
2026-06-16  0:41 ` [RFC PATCH v2 14/25] KVM: nSVM: Move svm_switch_vmcb() to nested.c Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 15/25] KVM: nSVM: Call nested_svm_transition_tlb_flush() on every VMCB switch Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 16/25] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 17/25] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
2026-06-16  1:20   ` sashiko-bot
2026-06-16  0:41 ` [RFC PATCH v2 18/25] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 19/25] KVM: nSVM: Flush the TLB if L1 changes L2's ASID in vmcb12 Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 20/25] KVM: nSVM: Do not reset TLB_CONTROL in vmcb02 on nested VM-Enter Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 21/25] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr() Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 22/25] KVM: x86/mmu: Refactor kvm_mmu_invlpg() to allow skipping the gva flush Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 23/25] KVM: nSVM: Flush L2's ASID when emulating INVLPGA Yosry Ahmed
2026-06-16  0:41 ` [RFC PATCH v2 24/25] KVM: nSVM: Use different ASIDs for L1 and L2 Yosry Ahmed
2026-06-16  1:30   ` sashiko-bot
2026-06-16  0:41 ` [RFC PATCH v2 25/25] DO NOT MERGE: Add nested_tlb_force_flush Yosry Ahmed
2026-06-16  1:21   ` sashiko-bot
