kvm.vger.kernel.org archive mirror
* [RFC PATCH 00/24] KVM: SVM: Rework ASID management
@ 2025-03-26 19:35 Yosry Ahmed
  2025-03-26 19:35 ` [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
                   ` (19 more replies)
  0 siblings, 20 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

This series reworks how SVM manages ASIDs by:
(a) Allocating a single static ASID for each L1 VM, instead of
    dynamically allocating ASIDs. This simplifies the logic and allows
    for more unification between SVM and SEV, as the latter already
    uses per-VM ASIDs as required for other purposes.

    This is patches 1 to 10.

(b) Using a separate ASID for L2 VMs. Instead of using the same ASID for
    L1 and L2 guests, and doing a TLB flush and MMU sync on every nested
    transition, a separate ASID is used and TLB flushes are done
    conditionally as needed.

    This is patches 11 through 24.

The advantages of this are:
- Simplifying the logic by dropping dynamic ASID allocations.
- Unifying some logic between SVM and SEV, as the latter already uses
  per-VM ASIDs as required for other purposes.
- Enabling INVLPGB virtualization [1].
- Improving the performance of nested guests by avoiding some TLB
  flushes.
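
For a rough feel of what ASID selection looks like by the end of the
series, here is a minimal sketch (kvm_svm->asid is introduced in patch
10; the L2 ASID field name below is purely illustrative and does not
match the actual patches):

/*
 * Illustrative sketch only: one static ASID per L1 VM, plus a separate
 * ASID for its L2 guests, so nested transitions no longer need an
 * unconditional TLB flush.
 */
static unsigned int example_current_asid(struct kvm_vcpu *vcpu)
{
	struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);

	if (is_guest_mode(vcpu))
		return kvm_svm->l2_asid;	/* hypothetical field */

	return kvm_svm->asid;			/* per-VM ASID (patch 10) */
}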

The series was tested by running L2 and L3 Linux guests with some
simple workloads in them (mmap()/munmap() stress, netperf, etc). I also
ran the KVM selftests in both L0 and L1.

I believe some of the patches are in a mergeable state, but this series is
still an RFC for a few reasons:
- I haven't done as much testing as I initially planned. Mainly I wanted
  to test with a Windows guest running WSL to get Linux and Windows L2
  VMs running side-by-side. I couldn't get it done due to some
  testing infrastructure hiccups.

- The SEV changes are generally untested beyond build testing, and I
  would like to get more feedback on them before moving forward. Namely,
  I think there is room for further unification. SEV should probably use
  the new kvm_tlb_tags infrastructure to allocate its ASIDs as well. The
  way I think about it is by optionally having a bitmap of "pending"
  ASIDs in kvm_tlb_tags, and making unused SEV ASIDs "pending" until we
  run out of space, at which point we do the necessary flushes to make
  them free (a rough sketch of this idea follows after this list).

- I want to get general feedback about the direction this is heading in,
  and things like generalizing the ASID tracking in SEV to work for SVM,
  thoughts on using an xarray for that, etc.

- Some things can/should be cleaned up, although they can be followups
  too. For example, the current logic will allocate a "normal" ASID for
  an SEV VM upon creation, then allocate an SEV-friendly ASID to it when
  SEV is initialized. The "normal" ASID remains allocated though, and
  kvm_svm->asid and kvm_svm->sev_info.asid remain different. It seems
  like we should not allocate the "normal" ASID to begin with, or free
  it if the VM uses SEV. However, I am not sure of the best way to do
  any of this because I am not clear on the life cycle of an SEV VM.
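
As a rough illustration of the "pending" ASIDs idea mentioned above
(purely a sketch; the "pending" bitmap and the helper below are
hypothetical and do not exist in this series):

/*
 * Hypothetical sketch: instead of freeing an SEV ASID immediately
 * (which requires expensive flushes before it can be reused), keep it
 * marked as allocated in ->bitmap and also set it in a new ->pending
 * bitmap. Only when the allocator runs out of free tags would the
 * pending ASIDs be flushed and returned to the free pool in one go.
 */
static void kvm_tlb_tags_free_pending(struct kvm_tlb_tags *tlb_tags,
				      unsigned int tag)
{
	if (tag < tlb_tags->min || WARN_ON_ONCE(tag > tlb_tags->max))
		return;

	spin_lock(&tlb_tags->lock);
	__set_bit(tag, tlb_tags->pending);	/* hypothetical field */
	spin_unlock(&tlb_tags->lock);
}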

This series started as two separate series, one to optimize nested TLB
flushes by using a separate ASID for L2 VMs [2], and one to use a single
ASID per-VM [3]. However, there is enough dependency and interaction
between the two series that I think it's useful to combine them, at
least for now, so that the big picture is clear. The series can later be
split again into 2 or more series, or merged incrementally.

I am sending this out now to get feedback, and also to "checkpoint" my
work as I won't be picking this up again for a few months. I will remain
able to respond to discussion and reviews, although at a lower capacity.
If anyone wants to pick up this series in the meantime, partially or
fully, please feel free to do so. Just let me know so that we can
coordinate.

Rik and Tom, I CC'd you due to the previous discussion you had with Sean
about INVLPGB virtualization. I can drop you from future versions if
you'd like to avoid the noise.

Here is a brief walkthrough of the series:

Part 1: Use a single ASID per-VM
- Patch 1 generalizes the VPID allocation into a generic kvm_tlb_tags
  factory to be used by SVM.
- Patches 2-3 are cleanups and/or refactoring.
- Patches 4-5 get rid of the cases where we currently allocate a new
  ASID dynamically by just flushing the existing ASID or falling back to
  a full flush if flushing an ASID is not supported.
- Patches 6-9 generalize SEV's per-CPU ASID -> vCPU tracking to make it
  work for SVM.
- Patch 10 finally drops the dynamic ASID allocation logic and uses a
  single per-VM ASID.

Part 2: Optimize nSVM TLB flushes
- Patch 11 starts by using a separate ASID for L2 guests, although
  it is initially the same as the L1 ASID. It's essentially just laying
  the groundwork.
- Patches 12 - 16 are refactoring groundwork.
- Patches 17 - 22 add the needed handling of the L2 ASID TLB flushing.
- Patch 23 starts allocating a separate ASID for L2, as reusing the L1
  ASID is no longer needed.
- Patch 24 drops the unconditional TLB flushes on nested transitions,
  which are no longer necessary now that L2 uses a separate,
  well-maintained ASID.

Diff from the initial versions of series [2] and [3]:
- Generalized the SEV tracking of ASID->vCPU to use it for SVM, to make
  sure the TLB is flushed when a new vCPU with the same ASID is run on
  the same physical CPU.
- Made sure kvm_hv_vcpu_purge_flush_tlb() is handled correctly by
  passing in is_guest_mode to purge the correct queue when doing L1 vs
  L2 TLB flushes (Maxim).
- Improved the commentary in nested_svm_entry_tlb_flush() (Maxim).
- Handled INVLPGA from the guest even when nested NPT is used (Maxim).
- Improved some commit logs.

[1]https://lore.kernel.org/all/Z8HdBg3wj8M7a4ts@google.com/
[2]https://lore.kernel.org/lkml/20250205182402.2147495-1-yosry.ahmed@linux.dev/
[3]https://lore.kernel.org/lkml/20250313215540.4171762-1-yosry.ahmed@linux.dev/


Yosry Ahmed (24):
  KVM: VMX: Generalize VPID allocation to be vendor-neutral
  KVM: SVM: Use cached local variable in init_vmcb()
  KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  KVM: SVM: Flush everything if FLUSHBYASID is not available
  KVM: SVM: Flush the ASID when running on a new CPU
  KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  KVM: SEV: Track ASID->vCPU on vCPU load
  KVM: SEV: Drop pre_sev_run()
  KVM: SEV: Generalize tracking ASID->vCPU with xarrays
  KVM: SVM: Use a single ASID per VM
  KVM: nSVM: Use a separate ASID for nested guests
  KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb()
  KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode
  KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns
  KVM: x86/mmu: rename __kvm_mmu_invalidate_addr()
  KVM: x86/mmu: Allow skipping the gva flush in
    kvm_mmu_invalidate_addr()
  KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL
  KVM: nSVM: Flush the TLB if L1 changes L2's ASID
  KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
  KVM: nSVM: Service local TLB flushes before nested transitions
  KVM: nSVM: Handle INVLPGA interception correctly
  KVM: nSVM: Allocate a new ASID for nested guests
  KVM: nSVM: Stop bombing the TLB on nested transitions

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/include/asm/svm.h      |   5 -
 arch/x86/kvm/hyperv.h           |   8 +-
 arch/x86/kvm/mmu/mmu.c          |  22 ++-
 arch/x86/kvm/svm/nested.c       |  68 ++++++---
 arch/x86/kvm/svm/sev.c          |  60 +-------
 arch/x86/kvm/svm/svm.c          | 257 +++++++++++++++++++++++---------
 arch/x86/kvm/svm/svm.h          |  43 ++++--
 arch/x86/kvm/vmx/nested.c       |   4 +-
 arch/x86/kvm/vmx/vmx.c          |  38 +----
 arch/x86/kvm/vmx/vmx.h          |   4 +-
 arch/x86/kvm/x86.c              |  60 +++++++-
 arch/x86/kvm/x86.h              |  13 ++
 13 files changed, 378 insertions(+), 206 deletions(-)

-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
@ 2025-03-26 19:35 ` Yosry Ahmed
  2025-03-27 10:58   ` Nikunj A Dadhania
  2025-06-23 16:44   ` Sean Christopherson
  2025-03-26 19:35 ` [RFC PATCH 02/24] KVM: SVM: Use cached local variable in init_vmcb() Yosry Ahmed
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Generalize the VMX VPID allocation code and move it to common code
in preparation for sharing with SVM. Create a generic struct
kvm_tlb_tags, representing a factory for VPIDs (or ASIDs later), and use
one for VPIDs.

Most of the functionality remains the same, with the following
differences:
- The enable_vpid checks are moved to the callers of allocate_vpid()
  and free_vpid(), as they are specific to VMX.
- The bitmap allocation is now dynamic (which will be required for SVM),
  so it is initialized and cleaned up in vmx_hardware_{setup/unsetup}().
- The range of valid TLB tags is expressed in terms of min/max instead
  of the number of tags to support SVM use cases.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/vmx/nested.c |  4 +--
 arch/x86/kvm/vmx/vmx.c    | 38 +++++--------------------
 arch/x86/kvm/vmx/vmx.h    |  4 +--
 arch/x86/kvm/x86.c        | 58 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h        | 13 +++++++++
 5 files changed, 82 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d06e50d9c0e79..b017bd2eb2382 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -343,7 +343,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
 	vmx->nested.vmxon = false;
 	vmx->nested.smm.vmxon = false;
 	vmx->nested.vmxon_ptr = INVALID_GPA;
-	free_vpid(vmx->nested.vpid02);
+	kvm_tlb_tags_free(&vmx_vpids, vmx->nested.vpid02);
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = INVALID_GPA;
 	if (enable_shadow_vmcs) {
@@ -5333,7 +5333,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 		     HRTIMER_MODE_ABS_PINNED);
 	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
 
-	vmx->nested.vpid02 = allocate_vpid();
+	vmx->nested.vpid02 = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;
 
 	vmx->nested.vmcs02_initialized = false;
 	vmx->nested.vmxon = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b70ed72c1783d..f7ce75842fa26 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -496,8 +496,7 @@ DEFINE_PER_CPU(struct vmcs *, current_vmcs);
  */
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 
-static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
-static DEFINE_SPINLOCK(vmx_vpid_lock);
+struct kvm_tlb_tags vmx_vpids;
 
 struct vmcs_config vmcs_config __ro_after_init;
 struct vmx_capability vmx_capability __ro_after_init;
@@ -3972,31 +3971,6 @@ static void seg_setup(int seg)
 	vmcs_write32(sf->ar_bytes, ar);
 }
 
-int allocate_vpid(void)
-{
-	int vpid;
-
-	if (!enable_vpid)
-		return 0;
-	spin_lock(&vmx_vpid_lock);
-	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-	if (vpid < VMX_NR_VPIDS)
-		__set_bit(vpid, vmx_vpid_bitmap);
-	else
-		vpid = 0;
-	spin_unlock(&vmx_vpid_lock);
-	return vpid;
-}
-
-void free_vpid(int vpid)
-{
-	if (!enable_vpid || vpid == 0)
-		return;
-	spin_lock(&vmx_vpid_lock);
-	__clear_bit(vpid, vmx_vpid_bitmap);
-	spin_unlock(&vmx_vpid_lock);
-}
-
 static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
 {
 	/*
@@ -7559,7 +7533,7 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 
 	if (enable_pml)
 		vmx_destroy_pml_buffer(vmx);
-	free_vpid(vmx->vpid);
+	kvm_tlb_tags_free(&vmx_vpids, vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
 	free_page((unsigned long)vmx->ve_info);
@@ -7578,7 +7552,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	err = -ENOMEM;
 
-	vmx->vpid = allocate_vpid();
+	vmx->vpid = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;
 
 	/*
 	 * If PML is turned on, failure on enabling PML just results in failure
@@ -7681,7 +7655,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 free_pml:
 	vmx_destroy_pml_buffer(vmx);
 free_vpid:
-	free_vpid(vmx->vpid);
+	kvm_tlb_tags_free(&vmx_vpids, vmx->vpid);
 	return err;
 }
 
@@ -8373,6 +8347,7 @@ void vmx_hardware_unsetup(void)
 		nested_vmx_hardware_unsetup();
 
 	free_kvm_area();
+	kvm_tlb_tags_destroy(&vmx_vpids);
 }
 
 void vmx_vm_destroy(struct kvm *kvm)
@@ -8591,7 +8566,8 @@ __init int vmx_hardware_setup(void)
 	kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
 	kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
 
-	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
+	/* VPID 0 is reserved for host, so min=1  */
+	kvm_tlb_tags_init(&vmx_vpids, 1, VMX_NR_VPIDS - 1);
 
 	if (enable_ept)
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 951e44dc9d0ea..9bece3ea63eaa 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -376,10 +376,10 @@ struct kvm_vmx {
 	u64 *pid_table;
 };
 
+extern struct kvm_tlb_tags vmx_vpids;
+
 void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 			struct loaded_vmcs *buddy);
-int allocate_vpid(void);
-void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 69c20a68a3f01..182f18ebc62f3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13992,6 +13992,64 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 }
 EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
 
+int kvm_tlb_tags_init(struct kvm_tlb_tags *tlb_tags, unsigned int min,
+		      unsigned int max)
+{
+	/*
+	 * 0 is assumed to be the host's TLB tag and is returned on failed
+	 * allocations.
+	 */
+	if (WARN_ON_ONCE(min == 0))
+		return -1;
+
+	/*
+	 * Allocate enough bits to index the bitmap directly by the tag,
+	 * potentially wasting a bit of memory.
+	 */
+	tlb_tags->bitmap = bitmap_zalloc(max + 1, GFP_KERNEL);
+	if (!tlb_tags->bitmap)
+		return -1;
+
+	tlb_tags->min = min;
+	tlb_tags->max = max;
+	spin_lock_init(&tlb_tags->lock);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_tlb_tags_init);
+
+void kvm_tlb_tags_destroy(struct kvm_tlb_tags *tlb_tags)
+{
+	bitmap_free(tlb_tags->bitmap);
+}
+EXPORT_SYMBOL_GPL(kvm_tlb_tags_destroy);
+
+unsigned int kvm_tlb_tags_alloc(struct kvm_tlb_tags *tlb_tags)
+{
+	unsigned int tag;
+
+	spin_lock(&tlb_tags->lock);
+	tag = find_next_zero_bit(tlb_tags->bitmap, tlb_tags->max + 1,
+				 tlb_tags->min);
+	if (tag <= tlb_tags->max)
+		__set_bit(tag, tlb_tags->bitmap);
+	else
+		tag = 0;
+	spin_unlock(&tlb_tags->lock);
+	return tag;
+}
+EXPORT_SYMBOL_GPL(kvm_tlb_tags_alloc);
+
+void kvm_tlb_tags_free(struct kvm_tlb_tags *tlb_tags, unsigned int tag)
+{
+	if (tag < tlb_tags->min || WARN_ON_ONCE(tag > tlb_tags->max))
+		return;
+
+	spin_lock(&tlb_tags->lock);
+	__clear_bit(tag, tlb_tags->bitmap);
+	spin_unlock(&tlb_tags->lock);
+}
+EXPORT_SYMBOL_GPL(kvm_tlb_tags_free);
+
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 9dc32a4090761..9f84e933d189b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -652,4 +652,17 @@ int ____kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
+struct kvm_tlb_tags {
+	spinlock_t	lock;
+	unsigned long	*bitmap;
+	unsigned int	min;
+	unsigned int	max;
+};
+
+int kvm_tlb_tags_init(struct kvm_tlb_tags *tlb_tags, unsigned int min,
+		      unsigned int max);
+void kvm_tlb_tags_destroy(struct kvm_tlb_tags *tlb_tags);
+unsigned int kvm_tlb_tags_alloc(struct kvm_tlb_tags *tlb_tags);
+void kvm_tlb_tags_free(struct kvm_tlb_tags *tlb_tags, unsigned int tag);
+
 #endif
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 02/24] KVM: SVM: Use cached local variable in init_vmcb()
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
  2025-03-26 19:35 ` [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
@ 2025-03-26 19:35 ` Yosry Ahmed
  2025-04-03 19:56   ` Maxim Levitsky
  2025-03-26 19:35 ` [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

svm->vmcb->control is already cached in the 'control' local variable, so
use that.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8abeab91d329d..28a6d2c0f250f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1367,12 +1367,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		avic_init_vmcb(svm, vmcb);
 
 	if (vnmi)
-		svm->vmcb->control.int_ctl |= V_NMI_ENABLE_MASK;
+		control->int_ctl |= V_NMI_ENABLE_MASK;
 
 	if (vgif) {
 		svm_clr_intercept(svm, INTERCEPT_STGI);
 		svm_clr_intercept(svm, INTERCEPT_CLGI);
-		svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK;
+		control->int_ctl |= V_GIF_ENABLE_MASK;
 	}
 
 	if (sev_guest(vcpu->kvm))
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
  2025-03-26 19:35 ` [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
  2025-03-26 19:35 ` [RFC PATCH 02/24] KVM: SVM: Use cached local variable in init_vmcb() Yosry Ahmed
@ 2025-03-26 19:35 ` Yosry Ahmed
  2025-04-03 20:00   ` Maxim Levitsky
  2025-06-23 16:46   ` Sean Christopherson
  2025-03-26 19:35 ` [RFC PATCH 04/24] KVM: SVM: Flush everything if FLUSHBYASID is not available Yosry Ahmed
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Incoming changes will add more code paths that set tlb_ctl to
TLB_CONTROL_FLUSH_ASID, and will eliminate the use of
TLB_CONTROL_FLUSH_ALL_ASID except as a fallback when FLUSHBYASID is not
available. Introduce set/clear helpers to set tlb_ctl to
TLB_CONTROL_FLUSH_ASID or TLB_CONTROL_DO_NOTHING.

Opportunistically move the TLB_CONTROL_* definitions to
arch/x86/kvm/svm/svm.h as they are not used outside of arch/x86/kvm/svm/.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/include/asm/svm.h |  5 -----
 arch/x86/kvm/svm/nested.c  |  2 +-
 arch/x86/kvm/svm/sev.c     |  2 +-
 arch/x86/kvm/svm/svm.c     |  4 ++--
 arch/x86/kvm/svm/svm.h     | 15 +++++++++++++++
 5 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 9b7fa99ae9513..a97da63562eb3 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -171,11 +171,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 };
 
 
-#define TLB_CONTROL_DO_NOTHING 0
-#define TLB_CONTROL_FLUSH_ALL_ASID 1
-#define TLB_CONTROL_FLUSH_ASID 3
-#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
-
 #define V_TPR_MASK 0x0f
 
 #define V_IRQ_SHIFT 8
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 834b67672d50f..11b02a0340d9e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -681,7 +681,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	/* Done at vmrun: asid.  */
 
 	/* Also overwritten later if necessary.  */
-	vmcb02->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+	vmcb_clr_flush_asid(vmcb02);
 
 	/* nested_cr3.  */
 	if (nested_npt_enabled(svm))
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0bc708ee27887..d613f81addf1c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3479,7 +3479,7 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 		return 0;
 
 	sd->sev_vmcbs[asid] = svm->vmcb;
-	svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+	vmcb_set_flush_asid(svm->vmcb);
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 	return 0;
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 28a6d2c0f250f..0e302ae9a8435 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4006,7 +4006,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 	 * VM-Exit (via kvm_mmu_reset_context()).
 	 */
 	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
-		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+		vmcb_set_flush_asid(svm->vmcb);
 	else
 		svm->current_vmcb->asid_generation--;
 }
@@ -4373,7 +4373,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
 		svm->nested.nested_run_pending = 0;
 	}
 
-	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+	vmcb_clr_flush_asid(svm->vmcb);
 	vmcb_mark_all_clean(svm->vmcb);
 
 	/* if exit due to PF check for async PF */
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d4490eaed55dd..d2c49cbfbf1ca 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -409,6 +409,21 @@ static inline bool vmcb_is_dirty(struct vmcb *vmcb, int bit)
         return !test_bit(bit, (unsigned long *)&vmcb->control.clean);
 }
 
+#define TLB_CONTROL_DO_NOTHING 0
+#define TLB_CONTROL_FLUSH_ALL_ASID 1
+#define TLB_CONTROL_FLUSH_ASID 3
+#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
+
+static inline void vmcb_set_flush_asid(struct vmcb *vmcb)
+{
+	vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+}
+
+static inline void vmcb_clr_flush_asid(struct vmcb *vmcb)
+{
+	vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
+}
+
 static __always_inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
 {
 	return container_of(vcpu, struct vcpu_svm, vcpu);
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 04/24] KVM: SVM: Flush everything if FLUSHBYASID is not available
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (2 preceding siblings ...)
  2025-03-26 19:35 ` [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
@ 2025-03-26 19:35 ` Yosry Ahmed
  2025-04-03 20:00   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 05/24] KVM: SVM: Flush the ASID when running on a new CPU Yosry Ahmed
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Currently, if FLUSHBYASID is not available when performing a TLB flush,
the fallback is decrementing the ASID generation to trigger allocating a
new ASID. In preparation for using a static ASID per VM, just fallback
to flushing everything if FLUSHBYASID is not available. This is probably
worse from a performance perspective, but FLUSHBYASID has been around
for ~15 years and it's not worth carrying the complexity.

The fallback logic is moved within vmcb_set_flush_asid(), as more
callers will be added and will need the fallback as well.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 5 +----
 arch/x86/kvm/svm/svm.h | 5 ++++-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0e302ae9a8435..5f71b125010d9 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4005,10 +4005,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 	 * unconditionally does a TLB flush on both nested VM-Enter and nested
 	 * VM-Exit (via kvm_mmu_reset_context()).
 	 */
-	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
-		vmcb_set_flush_asid(svm->vmcb);
-	else
-		svm->current_vmcb->asid_generation--;
+	vmcb_set_flush_asid(svm->vmcb);
 }
 
 static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d2c49cbfbf1ca..843a29a6d150e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -416,7 +416,10 @@ static inline bool vmcb_is_dirty(struct vmcb *vmcb, int bit)
 
 static inline void vmcb_set_flush_asid(struct vmcb *vmcb)
 {
-	vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
+		vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+	else
+		vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
 }
 
 static inline void vmcb_clr_flush_asid(struct vmcb *vmcb)
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 05/24] KVM: SVM: Flush the ASID when running on a new CPU
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (3 preceding siblings ...)
  2025-03-26 19:35 ` [RFC PATCH 04/24] KVM: SVM: Flush everything if FLUSHBYASID is not available Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:00   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB Yosry Ahmed
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Currently, when a vCPU is migrated to a new physical CPU, the ASID
generation is reset to trigger allocating a new ASID. In preparation for
using a static ASID per VM, just flush the ASID in this case (falling
back to flushing everything if FLUSHBYASID is not available).

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5f71b125010d9..18bfc3d3f9ba1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3626,12 +3626,12 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 
 	/*
-	 * If the previous vmrun of the vmcb occurred on a different physical
-	 * cpu, then mark the vmcb dirty and assign a new asid.  Hardware's
-	 * vmcb clean bits are per logical CPU, as are KVM's asid assignments.
+	 * If the previous VMRUN of the VMCB occurred on a different physical
+	 * CPU, then mark the VMCB dirty and flush the ASID.  Hardware's
+	 * VMCB clean bits are per logical CPU, as are KVM's ASID assignments.
 	 */
 	if (unlikely(svm->current_vmcb->cpu != vcpu->cpu)) {
-		svm->current_vmcb->asid_generation = 0;
+		vmcb_set_flush_asid(svm->vmcb);
 		vmcb_mark_all_dirty(svm->vmcb);
 		svm->current_vmcb->cpu = vcpu->cpu;
         }
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (4 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 05/24] KVM: SVM: Flush the ASID when running on a new CPU Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:04   ` Maxim Levitsky
  2025-06-20 23:13   ` Sean Christopherson
  2025-03-26 19:36 ` [RFC PATCH 07/24] KVM: SEV: Track ASID->vCPU on vCPU load Yosry Ahmed
                   ` (13 subsequent siblings)
  19 siblings, 2 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

SEV currently tracks the ASID to VMCB mapping for each physical CPU.
This is required to flush the ASID when a new VMCB using the same ASID
is run on the same CPU. Practically, there is a single VMCB for each
vCPU using SEV. Furthermore, TLB flushes on nested transitions between
VMCB01 and VMCB02 are handled separately (see
nested_svm_transition_tlb_flush()).

In preparation for generalizing the tracking, which will make it more
expensive, start tracking the ASID to vCPU mapping instead. This
will allow for the tracking to be moved to a cheaper code path when
vCPUs are switched.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/sev.c | 12 ++++++------
 arch/x86/kvm/svm/svm.c |  2 +-
 arch/x86/kvm/svm/svm.h |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d613f81addf1c..ddb4d5b211ed7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -240,7 +240,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
 
 	for_each_possible_cpu(cpu) {
 		sd = per_cpu_ptr(&svm_data, cpu);
-		sd->sev_vmcbs[sev->asid] = NULL;
+		sd->sev_vcpus[sev->asid] = NULL;
 	}
 
 	mutex_unlock(&sev_bitmap_lock);
@@ -3081,8 +3081,8 @@ int sev_cpu_init(struct svm_cpu_data *sd)
 	if (!sev_enabled)
 		return 0;
 
-	sd->sev_vmcbs = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
-	if (!sd->sev_vmcbs)
+	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
+	if (!sd->sev_vcpus)
 		return -ENOMEM;
 
 	return 0;
@@ -3471,14 +3471,14 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 	/*
 	 * Flush guest TLB:
 	 *
-	 * 1) when different VMCB for the same ASID is to be run on the same host CPU.
+	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.
 	 * 2) or this VMCB was executed on different host CPU in previous VMRUNs.
 	 */
-	if (sd->sev_vmcbs[asid] == svm->vmcb &&
+	if (sd->sev_vcpus[asid] == &svm->vcpu &&
 	    svm->vcpu.arch.last_vmentry_cpu == cpu)
 		return 0;
 
-	sd->sev_vmcbs[asid] = svm->vmcb;
+	sd->sev_vcpus[asid] = &svm->vcpu;
 	vmcb_set_flush_asid(svm->vmcb);
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 	return 0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 18bfc3d3f9ba1..1156ca97fd798 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -694,7 +694,7 @@ static void svm_cpu_uninit(int cpu)
 	if (!sd->save_area)
 		return;
 
-	kfree(sd->sev_vmcbs);
+	kfree(sd->sev_vcpus);
 	__free_page(__sme_pa_to_page(sd->save_area_pa));
 	sd->save_area_pa = 0;
 	sd->save_area = NULL;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 843a29a6d150e..4ea6c61c3b048 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -340,8 +340,8 @@ struct svm_cpu_data {
 
 	struct vmcb *current_vmcb;
 
-	/* index = sev_asid, value = vmcb pointer */
-	struct vmcb **sev_vmcbs;
+	/* index = sev_asid, value = vcpu pointer */
+	struct kvm_vcpu **sev_vcpus;
 };
 
 DECLARE_PER_CPU(struct svm_cpu_data, svm_data);
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 07/24] KVM: SEV: Track ASID->vCPU on vCPU load
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (5 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:04   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 08/24] KVM: SEV: Drop pre_sev_run() Yosry Ahmed
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Check for changes in the ASID to vCPU mapping on vCPU load instead of
doing it on vCPU run. This should be sufficient and more efficient, and
is needed to allow generalizing the tracking and making it more
expensive.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/sev.c | 13 ++++---------
 arch/x86/kvm/svm/svm.c | 13 +++++++++++++
 arch/x86/kvm/svm/svm.h |  1 +
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ddb4d5b211ed7..3ef0dfdbb34d2 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -224,7 +224,7 @@ static int sev_asid_new(struct kvm_sev_info *sev)
 	return ret;
 }
 
-static unsigned int sev_get_asid(struct kvm *kvm)
+unsigned int sev_get_asid(struct kvm *kvm)
 {
 	return to_kvm_sev_info(kvm)->asid;
 }
@@ -3453,7 +3453,6 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 
 int pre_sev_run(struct vcpu_svm *svm, int cpu)
 {
-	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
 	struct kvm *kvm = svm->vcpu.kvm;
 	unsigned int asid = sev_get_asid(kvm);
 
@@ -3469,16 +3468,12 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 	svm->asid = asid;
 
 	/*
-	 * Flush guest TLB:
-	 *
-	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.
-	 * 2) or this VMCB was executed on different host CPU in previous VMRUNs.
+	 * Flush guest TLB if the VMCB was executed on a different host CPU in
+	 * previous VMRUNs.
 	 */
-	if (sd->sev_vcpus[asid] == &svm->vcpu &&
-	    svm->vcpu.arch.last_vmentry_cpu == cpu)
+	if (svm->vcpu.arch.last_vmentry_cpu == cpu)
 		return 0;
 
-	sd->sev_vcpus[asid] = &svm->vcpu;
 	vmcb_set_flush_asid(svm->vmcb);
 	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 	return 0;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1156ca97fd798..e6e380411fbec 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1554,6 +1554,7 @@ static void svm_prepare_host_switch(struct kvm_vcpu *vcpu)
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+	unsigned int asid;
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
 
@@ -1568,6 +1569,18 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	}
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_vcpu_load(vcpu, cpu);
+
+	if (sev_guest(vcpu->kvm)) {
+		/*
+		 * Flush the TLB when a different vCPU using the same ASID is
+		 * run on the same CPU.
+		 */
+		asid = sev_get_asid(vcpu->kvm);
+		if (sd->sev_vcpus[asid] != vcpu) {
+			sd->sev_vcpus[asid] = vcpu;
+			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+		}
+	}
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4ea6c61c3b048..ca38a233fa24c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -768,6 +768,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+unsigned int sev_get_asid(struct kvm *kvm);
 
 #ifdef CONFIG_KVM_AMD_SEV
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp);
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 08/24] KVM: SEV: Drop pre_sev_run()
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (6 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 07/24] KVM: SEV: Track ASID->vCPU on vCPU load Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:04   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 09/24] KVM: SEV: Generalize tracking ASID->vCPU with xarrays Yosry Ahmed
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Now that the ASID to vCPU/VMCB tracking has been moved out of
pre_sev_run(), the only remaining pieces are:
(a) Checking for a valid VMSA.
(b) Assigning svm->asid.
(c) Flushing the ASID if the VMCB is run on a different physical CPU.

The check in (c) is already being done in pre_svm_run(), and so is
redundant. (a) and (b) are small enough and probably do not warrant a
separate helper (and (b) will be going away soon), so open-code the
function into pre_svm_run() and remove it.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/sev.c | 28 ----------------------------
 arch/x86/kvm/svm/svm.c | 16 ++++++++++++++--
 arch/x86/kvm/svm/svm.h |  1 -
 3 files changed, 14 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 3ef0dfdbb34d2..1742f51d4c194 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3451,34 +3451,6 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
 	svm->sev_es.ghcb = NULL;
 }
 
-int pre_sev_run(struct vcpu_svm *svm, int cpu)
-{
-	struct kvm *kvm = svm->vcpu.kvm;
-	unsigned int asid = sev_get_asid(kvm);
-
-	/*
-	 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
-	 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
-	 * AP Destroy event.
-	 */
-	if (sev_es_guest(kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa))
-		return -EINVAL;
-
-	/* Assign the asid allocated with this SEV guest */
-	svm->asid = asid;
-
-	/*
-	 * Flush guest TLB if the VMCB was executed on a different host CPU in
-	 * previous VMRUNs.
-	 */
-	if (svm->vcpu.arch.last_vmentry_cpu == cpu)
-		return 0;
-
-	vmcb_set_flush_asid(svm->vmcb);
-	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
-	return 0;
-}
-
 #define GHCB_SCRATCH_AREA_LIMIT		(16ULL * PAGE_SIZE)
 static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
 {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e6e380411fbec..ce67112732e8c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3649,8 +3649,20 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
 		svm->current_vmcb->cpu = vcpu->cpu;
         }
 
-	if (sev_guest(vcpu->kvm))
-		return pre_sev_run(svm, vcpu->cpu);
+	if (sev_guest(vcpu->kvm)) {
+		/* Assign the asid allocated with this SEV guest */
+		svm->asid = sev_get_asid(vcpu->kvm);
+
+		/*
+		 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
+		 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
+		 * AP Destroy event.
+		 */
+		if (sev_es_guest(vcpu->kvm) &&
+		    !VALID_PAGE(svm->vmcb->control.vmsa_pa))
+			return -EINVAL;
+		return 0;
+	}
 
 	/* FIXME: handle wraparound of asid_generation */
 	if (svm->current_vmcb->asid_generation != sd->asid_generation)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index ca38a233fa24c..3ab2a424992c1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -760,7 +760,6 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 
 /* sev.c */
 
-int pre_sev_run(struct vcpu_svm *svm, int cpu);
 void sev_init_vmcb(struct vcpu_svm *svm);
 void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 09/24] KVM: SEV: Generalize tracking ASID->vCPU with xarrays
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (7 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 08/24] KVM: SEV: Drop pre_sev_run() Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:05   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 10/24] KVM: SVM: Use a single ASID per VM Yosry Ahmed
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Following changes will track ASID to vCPU mappings for all ASIDs, not
just SEV ASIDs. Using per-CPU arrays with the maximum possible number of
ASIDs would be too expensive. Use xarrays to generalize tracking the
mappings instead. The logic is also mostly moved outside the SEV code to
allow future changes to reuse it for normal SVM VMs.

Storing into an xarray is more expensive than reading/writing to an
array, but is only done on vCPU load and should be mostly uncontended.
Also, the size of the xarray should be O(# of VMs), so it is not
expected to be huge. In fact, the xarray will probably use less memory
than the normal array even for SEV on machines that only run a few VMs.

When a new ASID is allocated, reserve an entry for it on all xarrays on
all CPUs. This allows the memory allocations to happen in a more relaxed
context (allowing reclaim and accounting), and failures to be handled at
VM creation time. However, entries will be allocated even on CPUs that
never run the VM.

The alternative is relying on on-demand GFP_ATOMIC allocations with
xa_store() on vCPU load.  These allocations are more likely to fail and
more difficult to handle since vCPU load cannot fail. Flushing the TLB
if the xa_store() fails is probably sufficient handling, but
preallocations are easier to reason about.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/sev.c | 25 ++++-----------------
 arch/x86/kvm/svm/svm.c | 50 +++++++++++++++++++++++++++++++-----------
 arch/x86/kvm/svm/svm.h |  7 +++---
 3 files changed, 44 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1742f51d4c194..c11da3259c089 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -211,6 +211,9 @@ static int sev_asid_new(struct kvm_sev_info *sev)
 		goto e_uncharge;
 	}
 
+	if (!svm_register_asid(asid))
+		goto e_uncharge;
+
 	__set_bit(asid, sev_asid_bitmap);
 
 	mutex_unlock(&sev_bitmap_lock);
@@ -231,18 +234,10 @@ unsigned int sev_get_asid(struct kvm *kvm)
 
 static void sev_asid_free(struct kvm_sev_info *sev)
 {
-	struct svm_cpu_data *sd;
-	int cpu;
+	svm_unregister_asid(sev->asid);
 
 	mutex_lock(&sev_bitmap_lock);
-
 	__set_bit(sev->asid, sev_reclaim_asid_bitmap);
-
-	for_each_possible_cpu(cpu) {
-		sd = per_cpu_ptr(&svm_data, cpu);
-		sd->sev_vcpus[sev->asid] = NULL;
-	}
-
 	mutex_unlock(&sev_bitmap_lock);
 
 	sev_misc_cg_uncharge(sev);
@@ -3076,18 +3071,6 @@ void sev_hardware_unsetup(void)
 	misc_cg_set_capacity(MISC_CG_RES_SEV_ES, 0);
 }
 
-int sev_cpu_init(struct svm_cpu_data *sd)
-{
-	if (!sev_enabled)
-		return 0;
-
-	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
-	if (!sd->sev_vcpus)
-		return -ENOMEM;
-
-	return 0;
-}
-
 /*
  * Pages used by hardware to hold guest encrypted state must be flushed before
  * returning them to the system.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ce67112732e8c..b740114a9d9bc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -694,7 +694,7 @@ static void svm_cpu_uninit(int cpu)
 	if (!sd->save_area)
 		return;
 
-	kfree(sd->sev_vcpus);
+	xa_destroy(&sd->asid_vcpu);
 	__free_page(__sme_pa_to_page(sd->save_area_pa));
 	sd->save_area_pa = 0;
 	sd->save_area = NULL;
@@ -711,18 +711,11 @@ static int svm_cpu_init(int cpu)
 	if (!save_area_page)
 		return ret;
 
-	ret = sev_cpu_init(sd);
-	if (ret)
-		goto free_save_area;
+	xa_init(&sd->asid_vcpu);
 
 	sd->save_area = page_address(save_area_page);
 	sd->save_area_pa = __sme_page_pa(save_area_page);
 	return 0;
-
-free_save_area:
-	__free_page(save_area_page);
-	return ret;
-
 }
 
 static void set_dr_intercepts(struct vcpu_svm *svm)
@@ -1557,6 +1550,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	unsigned int asid;
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
+	struct kvm_vcpu *prev;
 
 	if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
 		shrink_ple_window(vcpu);
@@ -1573,13 +1567,13 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (sev_guest(vcpu->kvm)) {
 		/*
 		 * Flush the TLB when a different vCPU using the same ASID is
-		 * run on the same CPU.
+		 * run on the same CPU. xa_store() should always succeed because
+		 * the entry is reserved when the ASID is allocated.
 		 */
 		asid = sev_get_asid(vcpu->kvm);
-		if (sd->sev_vcpus[asid] != vcpu) {
-			sd->sev_vcpus[asid] = vcpu;
+		prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
+		if (prev != vcpu || WARN_ON_ONCE(xa_err(prev)))
 			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-		}
 	}
 }
 
@@ -5047,6 +5041,36 @@ static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	sev_vcpu_deliver_sipi_vector(vcpu, vector);
 }
 
+void svm_unregister_asid(unsigned int asid)
+{
+	struct svm_cpu_data *sd;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		sd = per_cpu_ptr(&svm_data, cpu);
+		xa_erase(&sd->asid_vcpu, asid);
+	}
+}
+
+bool svm_register_asid(unsigned int asid)
+{
+	struct svm_cpu_data *sd;
+	int cpu;
+
+	/*
+	 * Preallocate entries on all CPUs for the ASID to avoid memory
+	 * allocations in the vCPU load path.
+	 */
+	for_each_possible_cpu(cpu) {
+		sd = per_cpu_ptr(&svm_data, cpu);
+		if (xa_reserve(&sd->asid_vcpu, asid, GFP_KERNEL_ACCOUNT)) {
+			svm_unregister_asid(asid);
+			return false;
+		}
+	}
+	return true;
+}
+
 static void svm_vm_destroy(struct kvm *kvm)
 {
 	avic_vm_destroy(kvm);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 3ab2a424992c1..4929b96d3d700 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -340,8 +340,7 @@ struct svm_cpu_data {
 
 	struct vmcb *current_vmcb;
 
-	/* index = sev_asid, value = vcpu pointer */
-	struct kvm_vcpu **sev_vcpus;
+	struct xarray asid_vcpu;
 };
 
 DECLARE_PER_CPU(struct svm_cpu_data, svm_data);
@@ -655,6 +654,8 @@ void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
 void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool disable);
 void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 				     int trig_mode, int vec);
+bool svm_register_asid(unsigned int asid);
+void svm_unregister_asid(unsigned int asid);
 
 /* nested.c */
 
@@ -793,7 +794,6 @@ void sev_vm_destroy(struct kvm *kvm);
 void __init sev_set_cpu_caps(void);
 void __init sev_hardware_setup(void);
 void sev_hardware_unsetup(void);
-int sev_cpu_init(struct svm_cpu_data *sd);
 int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
 extern unsigned int max_sev_asid;
 void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
@@ -817,7 +817,6 @@ static inline void sev_vm_destroy(struct kvm *kvm) {}
 static inline void __init sev_set_cpu_caps(void) {}
 static inline void __init sev_hardware_setup(void) {}
 static inline void sev_hardware_unsetup(void) {}
-static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
 static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
 #define max_sev_asid 0
 static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 10/24] KVM: SVM: Use a single ASID per VM
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (8 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 09/24] KVM: SEV: Generalize tracking ASID->vCPU with xarrays Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:05   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 11/24] KVM: nSVM: Use a separate ASID for nested guests Yosry Ahmed
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

The ASID generation and dynamic ASID allocation logic is now only used
by initializing the generation to 0 to trigger a new ASID allocation
per-vCPU on the first VMRUN, so the ASID is more-or-less static
per-vCPU.

Moreover, having a unique ASID per-vCPU is not required. ASIDs are local
to each physical CPU, and the ASID is flushed when a vCPU is migrated to
a new physical CPU anyway. SEV VMs have been using a single ASID per VM
already for other reasons.

Use a static ASID per VM and drop the dynamic ASID allocation logic. The
ASID is allocated during vCPU reset (SEV allocates its own ASID), and
the ASID is always flushed on first use in case it was used by another
VM previously.

The existing check for whether the ASID in the VMCB matches the per-vCPU
ASID is turned into a WARN because it is not expected behavior anymore,
and is moved from svm_vcpu_run() to pre_svm_run() such that it's not
checked for SEV guests. The check does not apply as-is for SEV, and a
separate check is added in pre_sev_run() instead. These checks will be
consolidated (among other code) in a followup change.

As ASIDs cannot be disabled (unlike VPIDs on Intel), handle ASID
allocation failure by falling back to a single shared ASID
allocated during hardware setup. This ASID is flushed on every VMRUN if
it is in use to avoid sharing TLB entries between different guests. This
should be unlikely with modern AMD CPUs as they have over 32k ASIDs.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c |   3 +-
 arch/x86/kvm/svm/svm.c    | 129 ++++++++++++++++++++++----------------
 arch/x86/kvm/svm/svm.h    |  10 +--
 3 files changed, 80 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 11b02a0340d9e..81184b2fb27fd 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -677,8 +677,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
 	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
-
-	/* Done at vmrun: asid.  */
+	vmcb02->control.asid = svm_asid(vcpu->kvm);
 
 	/* Also overwritten later if necessary.  */
 	vmcb_clr_flush_asid(vmcb02);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b740114a9d9bc..f028d006f69dc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -249,6 +249,9 @@ static unsigned long iopm_base;
 
 DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
 
+static struct kvm_tlb_tags svm_asids;
+static unsigned int fallback_asid;
+
 /*
  * Only MSR_TSC_AUX is switched via the user return hook.  EFER is switched via
  * the VMCB, and the SYSCALL/SYSENTER MSRs are handled by VMLOAD/VMSAVE.
@@ -621,10 +624,6 @@ static int svm_enable_virtualization_cpu(void)
 		return -EBUSY;
 
 	sd = per_cpu_ptr(&svm_data, me);
-	sd->asid_generation = 1;
-	sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
-	sd->next_asid = sd->max_asid + 1;
-	sd->min_asid = max_sev_asid + 1;
 
 	wrmsrl(MSR_EFER, efer | EFER_SVME);
 
@@ -1119,6 +1118,7 @@ static void svm_hardware_unsetup(void)
 
 	__free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));
 	iopm_base = 0;
+	kvm_tlb_tags_destroy(&svm_asids);
 }
 
 static void init_seg(struct vmcb_seg *seg)
@@ -1225,6 +1225,20 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 	}
 }
 
+unsigned int svm_asid(struct kvm *kvm)
+{
+	return to_kvm_svm(kvm)->asid;
+}
+
+static unsigned int svm_get_current_asid(struct vcpu_svm *svm)
+{
+	struct kvm *kvm = svm->vcpu.kvm;
+
+	if (sev_guest(kvm))
+		return sev_get_asid(kvm);
+	return svm_asid(kvm);
+}
+
 static void init_vmcb(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1300,6 +1314,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
 	control->int_ctl = V_INTR_MASKING_MASK;
+	control->asid = svm_asid(vcpu->kvm);
+	vmcb_set_flush_asid(svm->vmcb);
 
 	init_seg(&save->es);
 	init_seg(&save->ss);
@@ -1332,8 +1348,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		save->g_pat = vcpu->arch.pat;
 		save->cr3 = 0;
 	}
-	svm->current_vmcb->asid_generation = 0;
-	svm->asid = 0;
 
 	svm->nested.vmcb12_gpa = INVALID_GPA;
 	svm->nested.last_vmcb12_gpa = INVALID_GPA;
@@ -1547,9 +1561,9 @@ static void svm_prepare_host_switch(struct kvm_vcpu *vcpu)
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
-	unsigned int asid;
 	struct vcpu_svm *svm = to_svm(vcpu);
 	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
+	unsigned int asid = svm_get_current_asid(svm);
 	struct kvm_vcpu *prev;
 
 	if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
@@ -1564,17 +1578,14 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_vcpu_load(vcpu, cpu);
 
-	if (sev_guest(vcpu->kvm)) {
-		/*
-		 * Flush the TLB when a different vCPU using the same ASID is
-		 * run on the same CPU. xa_store() should always succeed because
-		 * the entry is reserved when the ASID is allocated.
-		 */
-		asid = sev_get_asid(vcpu->kvm);
-		prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
-		if (prev != vcpu || WARN_ON_ONCE(xa_err(prev)))
-			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
-	}
+	/*
+	 * Flush the TLB when a different vCPU using the same ASID is
+	 * run on the same CPU. xa_store() should always succeed because
+	 * the entry is reserved when the ASID is allocated.
+	 */
+	prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
+	if (prev != vcpu || WARN_ON_ONCE(xa_err(prev)))
+		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1989,19 +2000,6 @@ static void svm_update_exception_bitmap(struct kvm_vcpu *vcpu)
 	}
 }
 
-static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
-{
-	if (sd->next_asid > sd->max_asid) {
-		++sd->asid_generation;
-		sd->next_asid = sd->min_asid;
-		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
-		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
-	}
-
-	svm->current_vmcb->asid_generation = sd->asid_generation;
-	svm->asid = sd->next_asid++;
-}
-
 static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
 {
 	struct vmcb *vmcb = to_svm(vcpu)->vmcb;
@@ -3629,8 +3627,16 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 
 static int pre_svm_run(struct kvm_vcpu *vcpu)
 {
-	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
 	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned int asid = svm_get_current_asid(svm);
+
+	/*
+	 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
+	 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
+	 * AP Destroy event.
+	 */
+	if (sev_es_guest(vcpu->kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa))
+		return -EINVAL;
 
 	/*
 	 * If the previous VMRUN of the VMCB occurred on a different physical
@@ -3643,25 +3649,20 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
 		svm->current_vmcb->cpu = vcpu->cpu;
         }
 
-	if (sev_guest(vcpu->kvm)) {
-		/* Assign the asid allocated with this SEV guest */
-		svm->asid = sev_get_asid(vcpu->kvm);
+	/*
+	 * If we run out of space and ASID allocation fails, we fallback to a
+	 * shared fallback ASID. For that ASID, we need to flush the TLB on
+	 * every VMRUN to avoid sharing TLB entries between different guests.
+	 */
+	if (unlikely(asid == fallback_asid))
+		vmcb_set_flush_asid(svm->vmcb);
 
-		/*
-		 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
-		 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
-		 * AP Destroy event.
-		 */
-		if (sev_es_guest(vcpu->kvm) &&
-		    !VALID_PAGE(svm->vmcb->control.vmsa_pa))
-			return -EINVAL;
-		return 0;
+	if (WARN_ON_ONCE(svm->vmcb->control.asid != asid)) {
+		vmcb_set_flush_asid(svm->vmcb);
+		svm->vmcb->control.asid = asid;
+		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 	}
 
-	/* FIXME: handle wraparound of asid_generation */
-	if (svm->current_vmcb->asid_generation != sd->asid_generation)
-		new_asid(svm, sd);
-
 	return 0;
 }
 
@@ -4062,7 +4063,7 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	invlpga(gva, svm->vmcb->control.asid);
+	invlpga(gva, svm_get_current_asid(svm));
 }
 
 static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
@@ -4308,10 +4309,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
 
 	sync_lapic_to_cr8(vcpu);
 
-	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
-		svm->vmcb->control.asid = svm->asid;
-		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
-	}
 	svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
 	svm_hv_update_vp_id(svm->vmcb, vcpu);
@@ -5073,13 +5070,18 @@ bool svm_register_asid(unsigned int asid)
 
 static void svm_vm_destroy(struct kvm *kvm)
 {
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+
 	avic_vm_destroy(kvm);
 	sev_vm_destroy(kvm);
+	kvm_tlb_tags_free(&svm_asids, kvm_svm->asid);
 }
 
 static int svm_vm_init(struct kvm *kvm)
 {
+	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
 	int type = kvm->arch.vm_type;
+	unsigned int asid;
 
 	if (type != KVM_X86_DEFAULT_VM &&
 	    type != KVM_X86_SW_PROTECTED_VM) {
@@ -5100,6 +5102,13 @@ static int svm_vm_init(struct kvm *kvm)
 			return ret;
 	}
 
+	asid = kvm_tlb_tags_alloc(&svm_asids);
+	if (asid && !svm_register_asid(asid)) {
+		kvm_tlb_tags_free(&svm_asids, asid);
+		asid = 0;
+	}
+	kvm_svm->asid = asid ?: fallback_asid;
+
 	return 0;
 }
 
@@ -5381,6 +5390,7 @@ static __init int svm_hardware_setup(void)
 	void *iopm_va;
 	int r;
 	unsigned int order = get_order(IOPM_SIZE);
+	unsigned int min_asid, max_asid;
 
 	/*
 	 * NX is required for shadow paging and for NPT if the NX huge pages
@@ -5473,6 +5483,13 @@ static __init int svm_hardware_setup(void)
 	 */
 	sev_hardware_setup();
 
+	/* Consumes max_sev_asid initialized by sev_hardware_setup() */
+	min_asid = max_sev_asid + 1;
+	max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
+	r = kvm_tlb_tags_init(&svm_asids, min_asid, max_asid);
+	if (r)
+		goto err;
+
 	svm_hv_hardware_setup();
 
 	for_each_possible_cpu(cpu) {
@@ -5481,6 +5498,12 @@ static __init int svm_hardware_setup(void)
 			goto err;
 	}
 
+	fallback_asid = kvm_tlb_tags_alloc(&svm_asids);
+	WARN_ON_ONCE(!fallback_asid);
+
+	/* Needs to be after svm_cpu_init() initializes the per-CPU xarrays */
+	svm_register_asid(fallback_asid);
+
 	enable_apicv = avic = avic && avic_hardware_setup();
 
 	if (!enable_apicv) {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4929b96d3d700..436b7e83141b9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -117,6 +117,8 @@ struct kvm_sev_info {
 struct kvm_svm {
 	struct kvm kvm;
 
+	unsigned int asid;
+
 	/* Struct members for AVIC */
 	u32 avic_vm_id;
 	struct page *avic_logical_id_table_page;
@@ -132,7 +134,6 @@ struct kvm_vmcb_info {
 	struct vmcb *ptr;
 	unsigned long pa;
 	int cpu;
-	uint64_t asid_generation;
 };
 
 struct vmcb_save_area_cached {
@@ -247,7 +248,6 @@ struct vcpu_svm {
 	struct vmcb *vmcb;
 	struct kvm_vmcb_info vmcb01;
 	struct kvm_vmcb_info *current_vmcb;
-	u32 asid;
 	u32 sysenter_esp_hi;
 	u32 sysenter_eip_hi;
 	uint64_t tsc_aux;
@@ -330,11 +330,6 @@ struct vcpu_svm {
 };
 
 struct svm_cpu_data {
-	u64 asid_generation;
-	u32 max_asid;
-	u32 next_asid;
-	u32 min_asid;
-
 	struct vmcb *save_area;
 	unsigned long save_area_pa;
 
@@ -656,6 +651,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 				     int trig_mode, int vec);
 bool svm_register_asid(unsigned int asid);
 void svm_unregister_asid(unsigned int asid);
+unsigned int svm_asid(struct kvm *kvm);
 
 /* nested.c */
 
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 11/24] KVM: nSVM: Use a separate ASID for nested guests
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (9 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 10/24] KVM: SVM: Use a single ASID per VM Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:09   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 12/24] KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb() Yosry Ahmed
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

The per-VM ASID is currently shared by both L1 and L2 guests, and it is
flushed on every transition between L1 and L2.

Allocate and track a separate ASID per-VM for nested guests. This is in
preparation for doing fine-grained TLB flushes on nested transitions
instead of unconditional full flushes.

Nested ASIDs are still not fully maintained (e.g. a remote flush will
only flush the current ASID), so keep the TLB flush on every transition
until this is sorted out in following changes.

Add a helper to get the ASID associated with a specific VMCB and use it
instead of directly reading the VM's ASID. This transparently uses L2's
ASID when an L2 guest is being run.
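
For illustration, the selection order the helper ends up implementing can
be summarized with a small standalone sketch (simplified names and types,
not the actual KVM structures):

struct vm_asids {
	unsigned int l1_asid;
	unsigned int nested_asid;
	unsigned int sev_asid;
	int is_sev;
};

/* SEV guests keep their dedicated ASID, L2 uses the nested ASID,
 * everything else uses the VM's L1 ASID. */
static unsigned int current_asid(const struct vm_asids *vm, int in_guest_mode)
{
	if (vm->is_sev)
		return vm->sev_asid;
	if (in_guest_mode)
		return vm->nested_asid;
	return vm->l1_asid;
}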

L1's ASID is flushed on KVM_REQ_TLB_FLUSH_GUEST if it is the active
context, so remove the TODO in nested_svm_transition_tlb_flush() about
it.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c |  8 ++++++--
 arch/x86/kvm/svm/svm.c    | 13 +++++++++++--
 arch/x86/kvm/svm/svm.h    |  3 ++-
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 81184b2fb27fd..75223869aa8c6 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -495,7 +495,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 	 *  - Honor L1's request to flush an ASID on nested VMRUN
 	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
 	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
-	 *  - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
 	 *
 	 * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
 	 *     NPT guest-physical mappings on VMRUN.
@@ -677,7 +676,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
 	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
-	vmcb02->control.asid = svm_asid(vcpu->kvm);
+	vmcb02->control.asid = svm_nested_asid(vcpu->kvm);
 
 	/* Also overwritten later if necessary.  */
 	vmcb_clr_flush_asid(vmcb02);
@@ -1179,6 +1178,7 @@ static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
 
 int svm_allocate_nested(struct vcpu_svm *svm)
 {
+	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
 	struct page *vmcb02_page;
 
 	if (svm->nested.initialized)
@@ -1196,6 +1196,10 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);
 
 	svm->nested.initialized = true;
+
+	if (!kvm_svm->nested_asid)
+		kvm_svm->nested_asid = kvm_svm->asid;
+
 	return 0;
 
 err_free_vmcb02:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f028d006f69dc..e664d8428c792 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1225,17 +1225,26 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 	}
 }
 
-unsigned int svm_asid(struct kvm *kvm)
+unsigned int svm_nested_asid(struct kvm *kvm)
+{
+	return to_kvm_svm(kvm)->nested_asid;
+}
+
+static unsigned int svm_asid(struct kvm *kvm)
 {
 	return to_kvm_svm(kvm)->asid;
 }
 
 static unsigned int svm_get_current_asid(struct vcpu_svm *svm)
 {
-	struct kvm *kvm = svm->vcpu.kvm;
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm *kvm = vcpu->kvm;
 
 	if (sev_guest(kvm))
 		return sev_get_asid(kvm);
+	if (is_guest_mode(vcpu))
+		return svm_nested_asid(kvm);
+	WARN_ON_ONCE(svm->current_vmcb != &svm->vmcb01);
 	return svm_asid(kvm);
 }
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 436b7e83141b9..e67e3a64e92f7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -118,6 +118,7 @@ struct kvm_svm {
 	struct kvm kvm;
 
 	unsigned int asid;
+	unsigned int nested_asid;
 
 	/* Struct members for AVIC */
 	u32 avic_vm_id;
@@ -651,7 +652,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 				     int trig_mode, int vec);
 bool svm_register_asid(unsigned int asid);
 void svm_unregister_asid(unsigned int asid);
-unsigned int svm_asid(struct kvm *kvm);
+unsigned int svm_nested_asid(struct kvm *kvm);
 
 /* nested.c */
 
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 12/24] KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb()
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (10 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 11/24] KVM: nSVM: Use a separate ASID for nested guests Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:09   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 13/24] KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode Yosry Ahmed
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Instead of calling is_guest_mode() inside kvm_hv_vcpu_purge_flush_tlb(),
pass the value from the caller. Future changes will pass different
values than is_guest_mode(vcpu).

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/hyperv.h  | 8 +++++---
 arch/x86/kvm/svm/svm.c | 2 +-
 arch/x86/kvm/x86.c     | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 913bfc96959cb..be715deaeb003 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -203,14 +203,15 @@ static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struc
 	return &hv_vcpu->tlb_flush_fifo[i];
 }
 
-static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu)
+static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu,
+					       bool is_guest_mode)
 {
 	struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
 
 	if (!to_hv_vcpu(vcpu) || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
 		return;
 
-	tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode(vcpu));
+	tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode);
 
 	kfifo_reset_out(&tlb_flush_fifo->entries);
 }
@@ -285,7 +286,8 @@ static inline int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 {
 	return HV_STATUS_ACCESS_DENIED;
 }
-static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu) {}
+static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu,
+					       bool is_guest_mode) {}
 static inline bool kvm_hv_synic_has_vector(struct kvm_vcpu *vcpu, int vector)
 {
 	return false;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e664d8428c792..865c5ce4fa473 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4025,7 +4025,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB
 	 * entries, and thus is a superset of Hyper-V's fine grained flushing.
 	 */
-	kvm_hv_vcpu_purge_flush_tlb(vcpu);
+	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode(vcpu));
 
 	/*
 	 * Flush only the current ASID even if the TLB flush was invoked via
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 182f18ebc62f3..469a8e5526902 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3615,7 +3615,7 @@ static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
 	 * Flushing all "guest" TLB is always a superset of Hyper-V's fine
 	 * grained flushing.
 	 */
-	kvm_hv_vcpu_purge_flush_tlb(vcpu);
+	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode(vcpu));
 }
 
 
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 13/24] KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (11 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 12/24] KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb() Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:10   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 14/24] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns Yosry Ahmed
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

svm_flush_tlb_asid() currently operates on the current VMCB. In
preparation for properly tracking TLB flushes for L1 and L2 ASIDs,
refactor it to take is_guest_mode and find the proper VMCB. All existing
callers pass is_guest_mode(vcpu) to maintain existing behavior for now.

Move the comment about only flushing the current ASID to
svm_flush_tlb_all(), where it probably should have been anyway, because
svm_flush_tlb_asid() now flushes a given ASID, not the current ASID.

Create a svm_flush_tlb_guest() wrapper to use as the flush_tlb_guest()
callback.

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 865c5ce4fa473..fb6b9f88a1504 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4016,25 +4016,24 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
 }
 
-static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
+static struct vmcb *svm_get_vmcb(struct vcpu_svm *svm, bool is_guest_mode)
+{
+	return is_guest_mode ? svm->nested.vmcb02.ptr : svm->vmcb01.ptr;
+}
+
+static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu, bool is_guest_mode)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = svm_get_vmcb(svm, is_guest_mode);
 
 	/*
 	 * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
 	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB
 	 * entries, and thus is a superset of Hyper-V's fine grained flushing.
 	 */
-	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode(vcpu));
-
-	/*
-	 * Flush only the current ASID even if the TLB flush was invoked via
-	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
-	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
-	 * unconditionally does a TLB flush on both nested VM-Enter and nested
-	 * VM-Exit (via kvm_mmu_reset_context()).
-	 */
-	vmcb_set_flush_asid(svm->vmcb);
+	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode);
+	if (vmcb)
+		vmcb_set_flush_asid(vmcb);
 }
 
 static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
@@ -4050,7 +4049,7 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 	if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
 		hyperv_flush_guest_mapping(root_tdp);
 
-	svm_flush_tlb_asid(vcpu);
+	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
 }
 
 static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
@@ -4065,7 +4064,14 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
 	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
 		hv_flush_remote_tlbs(vcpu->kvm);
 
-	svm_flush_tlb_asid(vcpu);
+	/*
+	 * Flush only the current ASID even if the TLB flush was invoked via
+	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
+	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
+	 * unconditionally does a TLB flush on both nested VM-Enter and nested
+	 * VM-Exit (via kvm_mmu_reset_context()).
+	 */
+	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
 }
 
 static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
@@ -4075,6 +4081,11 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
 	invlpga(gva, svm_get_current_asid(svm));
 }
 
+static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
+{
+	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
+}
+
 static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -5187,7 +5198,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.flush_tlb_all = svm_flush_tlb_all,
 	.flush_tlb_current = svm_flush_tlb_current,
 	.flush_tlb_gva = svm_flush_tlb_gva,
-	.flush_tlb_guest = svm_flush_tlb_asid,
+	.flush_tlb_guest = svm_flush_tlb_guest,
 
 	.vcpu_pre_run = svm_vcpu_pre_run,
 	.vcpu_run = svm_vcpu_run,
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 14/24] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (12 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 13/24] KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-03-26 19:36 ` [RFC PATCH 15/24] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr() Yosry Ahmed
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

The handling for the entry and exit TLB flushes will diverge
significantly in the following changes. Instead of adding an 'is_vmenter'
argument like nested_vmx_transition_tlb_flush(), just split the function
into two variants for 'entry' and 'exit'.

No functional change intended.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 75223869aa8c6..c336ab63c6da3 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -482,7 +482,7 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 	vmcb12->control.exit_int_info = exit_int_info;
 }
 
-static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
+static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	/* Handle pending Hyper-V TLB flush requests */
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
@@ -503,6 +503,15 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
+/* See nested_svm_entry_tlb_flush() */
+static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
+{
+	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
+
+	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+}
+
 /*
  * Load guest's/host's cr3 on nested vmentry or vmexit. @nested_npt is true
  * if we are emulating VM-Entry into a guest with NPT enabled.
@@ -645,7 +654,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	u32 pause_count12;
 	u32 pause_thresh12;
 
-	nested_svm_transition_tlb_flush(vcpu);
+	nested_svm_entry_tlb_flush(vcpu);
 
 	/* Enter Guest-Mode */
 	enter_guest_mode(vcpu);
@@ -1130,7 +1139,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
 	kvm_vcpu_unmap(vcpu, &map);
 
-	nested_svm_transition_tlb_flush(vcpu);
+	nested_svm_exit_tlb_flush(vcpu);
 
 	nested_svm_uninit_mmu_context(vcpu);
 
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 15/24] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr()
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (13 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 14/24] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:10   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 16/24] KVM: x86/mmu: Allow skipping the gva flush in kvm_mmu_invalidate_addr() Yosry Ahmed
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

In preparation for creating another helper for
kvm_mmu_invalidate_addr(), rename __kvm_mmu_invalidate_addr() to
kvm_mmu_invalidate_addr_in_root().

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/mmu/mmu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 63bb77ee1bb16..4a72ada0a7585 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6317,8 +6317,9 @@ void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_print_sptes);
 
-static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				      u64 addr, hpa_t root_hpa)
+static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
+					    struct kvm_mmu *mmu,
+					    u64 addr, hpa_t root_hpa)
 {
 	struct kvm_shadow_walk_iterator iterator;
 
@@ -6374,11 +6375,11 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		return;
 
 	if (roots & KVM_MMU_ROOT_CURRENT)
-		__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->root.hpa);
+		kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->root.hpa);
 
 	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
 		if (roots & KVM_MMU_ROOT_PREVIOUS(i))
-			__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
+			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 	}
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 16/24] KVM: x86/mmu: Allow skipping the gva flush in kvm_mmu_invalidate_addr()
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (14 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 15/24] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr() Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:10   ` Maxim Levitsky
  2025-03-26 19:36 ` [RFC PATCH 17/24] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Refactor a helper out of kvm_mmu_invalidate_addr() that allows skipping
the gva flush. This will be used when an invalidation is needed but the
GVA TLB translations that require invalidation do not belong to the current
context (e.g. when emulating INVLPGA for L1 to flush L2's translations).

No functional change intended.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/mmu/mmu.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4a72ada0a7585..e2b1994f12753 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6355,15 +6355,15 @@ static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
-void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-			     u64 addr, unsigned long roots)
+static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+				      u64 addr, unsigned long roots, bool gva_flush)
 {
 	int i;
 
 	WARN_ON_ONCE(roots & ~KVM_MMU_ROOTS_ALL);
 
 	/* It's actually a GPA for vcpu->arch.guest_mmu.  */
-	if (mmu != &vcpu->arch.guest_mmu) {
+	if (gva_flush && mmu != &vcpu->arch.guest_mmu) {
 		/* INVLPG on a non-canonical address is a NOP according to the SDM.  */
 		if (is_noncanonical_invlpg_address(addr, vcpu))
 			return;
@@ -6382,6 +6382,12 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 	}
 }
+
+void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			     u64 addr, unsigned long roots)
+{
+	__kvm_mmu_invalidate_addr(vcpu, mmu, addr, roots, true);
+}
 EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);
 
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 17/24] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (15 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 16/24] KVM: x86/mmu: Allow skipping the gva flush in kvm_mmu_invalidate_addr() Yosry Ahmed
@ 2025-03-26 19:36 ` Yosry Ahmed
  2025-04-03 20:10   ` Maxim Levitsky
  2025-03-26 19:41 ` [RFC PATCH 18/24] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL Yosry Ahmed
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

KVM_REQ_TLB_FLUSH is used to flush all TLB entries for all contexts
(e.g. in kvm_flush_remote_tlbs()). Flush both L1 and L2 ASIDs in
svm_flush_tlb_all() to handle it appropriately.

This is currently not required as nested transitions do unconditional
TLB flushes, but this is a step toward eliminating that.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c |  1 -
 arch/x86/kvm/svm/svm.c    | 10 ++--------
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c336ab63c6da3..56a4ff480bb3d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -491,7 +491,6 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
 	 * things to fix before this can be conditional:
 	 *
-	 *  - Flush TLBs for both L1 and L2 remote TLB flush
 	 *  - Honor L1's request to flush an ASID on nested VMRUN
 	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
 	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fb6b9f88a1504..4cad1085936bb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4064,14 +4064,8 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
 	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
 		hv_flush_remote_tlbs(vcpu->kvm);
 
-	/*
-	 * Flush only the current ASID even if the TLB flush was invoked via
-	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
-	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
-	 * unconditionally does a TLB flush on both nested VM-Enter and nested
-	 * VM-Exit (via kvm_mmu_reset_context()).
-	 */
-	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
+	svm_flush_tlb_asid(vcpu, false);
+	svm_flush_tlb_asid(vcpu, true);
 }
 
 static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 18/24] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (16 preceding siblings ...)
  2025-03-26 19:36 ` [RFC PATCH 17/24] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
@ 2025-03-26 19:41 ` Yosry Ahmed
  2025-03-26 19:43 ` [RFC PATCH 19/24] KVM: nSVM: Flush the TLB if L1 changes L2's ASID Yosry Ahmed
  2025-03-26 19:44 ` [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry Yosry Ahmed
  19 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Handle L1's requests to flush L2's TLB through the TLB_CONTROL field of
VMCB12. This is currently redundant because a full flush is executed on
every nested transition, but is a step towards removing that.

TLB_CONTROL_FLUSH_ALL_ASID flushes all ASIDs from L1's perspective,
including its own, so do a guest TLB flush on both transitions. Never
propagate TLB_CONTROL_FLUSH_ALL_ASID from the guest to the actual VMCB,
as this gives the guest the power to flush the entire physical TLB
(including translations for the host and other VMs).

For ASID flushes, the TLB flush is only done when entering L2. The
nested NPT MMU is also sync'd because TLB_CONTROL also flushes NPT
guest-physical mappings.
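
To make the resulting behavior easier to follow, here is an illustrative
standalone summary of how L1's TLB_CONTROL is translated into requests
(simplified; not the actual KVM code):

enum tlb_ctl { DO_NOTHING, FLUSH_ALL_ASID, FLUSH_ASID, FLUSH_ASID_LOCAL };

static void on_nested_vmrun(enum tlb_ctl ctl, int nested_npt,
			    int *flush_l2_tlb, int *sync_nested_npt_mmu)
{
	/* Any flush request from L1 flushes L2's guest TLB on entry ... */
	*flush_l2_tlb = (ctl != DO_NOTHING);
	/* ... and syncs the nested NPT MMU, since TLB_CONTROL also drops
	 * NPT guest-physical mappings. */
	*sync_nested_npt_mmu = *flush_l2_tlb && nested_npt;
}

static void on_nested_vmexit(enum tlb_ctl ctl, int *flush_l1_tlb)
{
	/* FLUSH_ALL_ASID covers L1's own ASID too, so flush it before
	 * running L1 again. */
	*flush_l1_tlb = (ctl == FLUSH_ALL_ASID);
}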

All TLB_CONTROL values can be handled by KVM regardless of FLUSHBYASID
support on the underlying CPU, so keep advertising FLUSHBYASID to the
guest unconditionally.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 31 ++++++++++++++++++++++++++-----
 arch/x86/kvm/svm/svm.c    |  5 ++---
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 56a4ff480bb3d..ffe01c2ae7db5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -484,19 +484,35 @@ static void nested_save_pending_event_to_vmcb12(struct vcpu_svm *svm,
 
 static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	/* Handle pending Hyper-V TLB flush requests */
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
 
+	/*
+	 * If L1 requested a TLB flush for L2, flush L2's TLB on nested entry
+	 * and sync the nested NPT MMU, as TLB_CONTROL also flushes NPT
+	 * guest-physical mappings. We technically only need to flush guest_mode
+	 * page tables.
+	 *
+	 * If L1 requested a full TLB flush for all ASIDs, L1's own ASID is also
+	 * flushed in nested_svm_exit_tlb_flush() before running L1.
+	 *
+	 * Note that TLB_CONTROL_FLUSH_ASID_LOCAL is handled exactly like
+	 * TLB_CONTROL_FLUSH_ASID. We can technically flush less TLB entries,
+	 * but this would require significantly more complexity.
+	 */
+	if (svm->nested.ctl.tlb_ctl != TLB_CONTROL_DO_NOTHING) {
+		if (nested_npt_enabled(svm))
+			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+	}
+
 	/*
 	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
 	 * things to fix before this can be conditional:
 	 *
-	 *  - Honor L1's request to flush an ASID on nested VMRUN
-	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
 	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
-	 *
-	 * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
-	 *     NPT guest-physical mappings on VMRUN.
 	 */
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
@@ -505,8 +521,13 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 /* See nested_svm_entry_tlb_flush() */
 static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	kvm_hv_nested_transtion_tlb_flush(vcpu, npt_enabled);
 
+	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4cad1085936bb..3e33ac876eb32 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5332,9 +5332,8 @@ static __init void svm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_VMCBCLEAN);
 
 		/*
-		 * KVM currently flushes TLBs on *every* nested SVM transition,
-		 * and so for all intents and purposes KVM supports flushing by
-		 * ASID, i.e. KVM is guaranteed to honor every L1 ASID flush.
+		 * KVM handles all TLB_CONTROL values set by L1, even if the
+		 * underlying CPU does not. See nested_svm_entry_tlb_flush().
 		 */
 		kvm_cpu_cap_set(X86_FEATURE_FLUSHBYASID);
 
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 19/24] KVM: nSVM: Flush the TLB if L1 changes L2's ASID
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (17 preceding siblings ...)
  2025-03-26 19:41 ` [RFC PATCH 18/24] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL Yosry Ahmed
@ 2025-03-26 19:43 ` Yosry Ahmed
  2025-03-26 19:44 ` [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry Yosry Ahmed
  19 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

KVM tracks a single ASID for L2 guests. An L1 vCPU could change the ASID
it has assigned to L2 due to switching to a different L2 guest or simply to
avoid flushing L2's existing ASID. Flush L2's TLB when this happens to
avoid reusing TLB entries from the old ASID (from L1's perspective).
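
A minimal sketch of the tracking added here (simplified names; it mirrors
nVMX's last_vpid handling): the caller requests a guest TLB flush for L2
whenever the function returns nonzero.

static int l2_asid_changed(unsigned int vmcb12_asid, unsigned int *last_asid)
{
	if (vmcb12_asid == *last_asid)
		return 0;

	*last_asid = vmcb12_asid;
	return 1;
}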

Remove the comment in __nested_copy_vmcb_control_to_cache() about the
cached ASID usage, as this change makes it stale by adding another
usage.

This is heavily inspired by nVMX's handling of last_vpid.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 5 ++++-
 arch/x86/kvm/svm/svm.h    | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ffe01c2ae7db5..ca8db246ac050 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -368,7 +368,6 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu,
 	to->pause_filter_count  = from->pause_filter_count;
 	to->pause_filter_thresh = from->pause_filter_thresh;
 
-	/* Copy asid here because nested_vmcb_check_controls will check it.  */
 	to->asid           = from->asid;
 	to->msrpm_base_pa &= ~0x0fffULL;
 	to->iopm_base_pa  &= ~0x0fffULL;
@@ -508,6 +507,10 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
 
+	if (svm->nested.ctl.asid != svm->nested.last_asid) {
+		svm->nested.last_asid = svm->nested.ctl.asid;
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+	}
 	/*
 	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
 	 * things to fix before this can be conditional:
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index e67e3a64e92f7..0c44133bc05ca 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -212,6 +212,8 @@ struct svm_nested_state {
 	 * on its side.
 	 */
 	bool force_msr_bitmap_recalc;
+
+	u32 last_asid;
 };
 
 struct vcpu_sev_es_state {
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
  2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
                   ` (18 preceding siblings ...)
  2025-03-26 19:43 ` [RFC PATCH 19/24] KVM: nSVM: Flush the TLB if L1 changes L2's ASID Yosry Ahmed
@ 2025-03-26 19:44 ` Yosry Ahmed
  2025-03-26 19:44   ` [RFC PATCH 21/24] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
                     ` (3 more replies)
  19 siblings, 4 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING on nested transitions to
L2 in nested_vmcb02_prepare_control(). This is unnecessary because:
- TLB_CONTROL is set in vcpu_enter_guest() if needed when servicing a
  TLB flush request (by svm_flush_tlb_asid()).
- TLB_CONTROL is reset to TLB_CONTROL_DO_NOTHING after the guest is run
  in svm_vcpu_run().

Hence, at the point where nested_vmcb02_prepare_control() is called,
TLB_CONTROL should have already been set to TLB_CONTROL_DO_NOTHING by
svm_vcpu_run() after L1 runs.

There is a TODO in nested_svm_transition_tlb_flush() about this reset
crushing pending TLB flushes. Remove it, as the reset is not really
crushing anything as explained above.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ca8db246ac050..544913461693c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -511,12 +511,7 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 		svm->nested.last_asid = svm->nested.ctl.asid;
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
-	/*
-	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
-	 * things to fix before this can be conditional:
-	 *
-	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
-	 */
+	/* TODO: optimize unconditional TLB flush/MMU sync */
 	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
@@ -710,9 +705,6 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
 	vmcb02->control.asid = svm_nested_asid(vcpu->kvm);
 
-	/* Also overwritten later if necessary.  */
-	vmcb_clr_flush_asid(vmcb02);
-
 	/* nested_cr3.  */
 	if (nested_npt_enabled(svm))
 		nested_svm_init_mmu_context(vcpu);
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 21/24] KVM: nSVM: Service local TLB flushes before nested transitions
  2025-03-26 19:44 ` [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry Yosry Ahmed
@ 2025-03-26 19:44   ` Yosry Ahmed
  2025-03-26 19:44   ` [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly Yosry Ahmed
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

KVM does not track TLB flush requests for L1 vs. L2. Hence, service
local flushes that target the current context before switching to a new
one. Since the current ASID is identified through the current VMCB
(nested or not), service the flushes before every VMCB switch.

This is conceptually similar to how nVMX calls
kvm_service_local_tlb_flush_requests() in
nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(), with the
following differences:

1. VMX tracks the current VPID based on is_guest_mode(), so local TLB
   flushes are serviced before enter_guest_mode() and
   leave_guest_mode(). On the other hand, SVM tracks the current ASID
   based on the current VMCB, so the TLB flushes are serviced before an
   VMCB switch.

2. nVMX only enters and leaves guest mode in
   nested_vmx_enter_non_root_mode() and nested_vmx_vmexit(). Other paths
   like vmx_set_nested_state() and vmx_leave_nested() call into these
   two functions. On the other hand, nSVM open codes the switch in
   functions like svm_set_nested_state() and svm_leave_nested(), so
   servicing the flush in svm_switch_vmcb() is probably most reliable.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/svm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3e33ac876eb32..3649707c61d3e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1439,6 +1439,12 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb)
 {
+	/*
+	 * The current ASID is identified through the VMCB.  Perform any pending
+	 * TLB flushes for the current VMCB before switching to a new one.
+	 */
+	kvm_service_local_tlb_flush_requests(&svm->vcpu);
+
 	svm->current_vmcb = target_vmcb;
 	svm->vmcb = target_vmcb->ptr;
 }
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly
  2025-03-26 19:44 ` [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry Yosry Ahmed
  2025-03-26 19:44   ` [RFC PATCH 21/24] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
@ 2025-03-26 19:44   ` Yosry Ahmed
  2025-04-03 20:10     ` Maxim Levitsky
  2025-06-24  1:08     ` Sean Christopherson
  2025-03-26 19:44   ` [RFC PATCH 23/24] KVM: nSVM: Allocate a new ASID for nested guests Yosry Ahmed
  2025-03-26 19:44   ` [RFC PATCH 24/24] KVM: nSVM: Stop bombing the TLB on nested transitions Yosry Ahmed
  3 siblings, 2 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Currently, INVLPGA interception is handled like INVLPG, which flushes
L1's TLB translations for the address. It was implemented in this way
because L1 and L2 shared an ASID. Now, L1 and L2 have separate ASIDs. It
is still harmless to flush L1's translations, but it's only correct
because all translations are flushed on nested transitions anyway.

In preparation for stopping unconditional flushes on nested transitions,
handle INVLPGA interception properly. If L1 specified zero as the ASID,
this is equivalent to INVLPG, so handle it as such. Otherwise, use
INVLPGA to flush the translations of the appropriate ASID tracked by
KVM, if any. Sync the shadow MMU as well, as L1 invalidated L2's
mappings.
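
As a rough outline, the decisions made by the new emulation can be
sketched as follows (illustrative only; simplified parameters, not the
kernel function signatures, and the shadow-MMU invalidation itself is
elided):

static void invlpga_decisions(unsigned int asid, unsigned int tracked_l2_asid,
			      int npt_enabled, int *treat_as_invlpg,
			      int *flush_hw_l2_asid, int *sync_shadow_pts)
{
	/* ASID 0 is the host ASID from L1's point of view: treat as INVLPG. */
	*treat_as_invlpg = (asid == 0);

	/*
	 * Only the L2 ASID KVM currently tracks is backed by a hardware ASID;
	 * any other ASID is flushed anyway when KVM switches to it.
	 */
	*flush_hw_l2_asid = !*treat_as_invlpg && asid == tracked_l2_asid;

	/* Without NPT, L1 and L2 share shadow page tables that must be synced. */
	*sync_shadow_pts = !*treat_as_invlpg && !npt_enabled;
}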

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu/mmu.c          |  5 +++--
 arch/x86/kvm/svm/svm.c          | 36 +++++++++++++++++++++++++++++++--
 3 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d881e7d276b12..a158d324168a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2237,6 +2237,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
 		       void *insn, int insn_len);
 void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
+void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			       u64 addr, unsigned long roots, bool gva_flush);
 void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			     u64 addr, unsigned long roots);
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e2b1994f12753..d3baa12df84e7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6355,8 +6355,8 @@ static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
 
-static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				      u64 addr, unsigned long roots, bool gva_flush)
+void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+			       u64 addr, unsigned long roots, bool gva_flush)
 {
 	int i;
 
@@ -6382,6 +6382,7 @@ static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
 			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 	}
 }
+EXPORT_SYMBOL_GPL(__kvm_mmu_invalidate_addr);
 
 void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			     u64 addr, unsigned long roots)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3649707c61d3e..4b95fd6b501e6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2505,6 +2505,7 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
 
 static int invlpga_interception(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
 	gva_t gva = kvm_rax_read(vcpu);
 	u32 asid = kvm_rcx_read(vcpu);
 
@@ -2514,8 +2515,39 @@ static int invlpga_interception(struct kvm_vcpu *vcpu)
 
 	trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
 
-	/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
-	kvm_mmu_invlpg(vcpu, gva);
+	/*
+	 * APM is silent about using INVLPGA to flush the host ASID (i.e. 0).
+	 * Do the logical thing and handle it like INVLPG.
+	 */
+	if (asid == 0) {
+		kvm_mmu_invlpg(vcpu, gva);
+		return kvm_skip_emulated_instruction(vcpu);
+	}
+
+	/*
+	 * Check if L1 specified the L2 ASID we are currently tracking. If it
+	 * isn't, do nothing as we have to handle the TLB flush when switching
+	 * to the new ASID anyway.
+	 */
+	if (asid == svm->nested.last_asid)
+		invlpga(gva, svm_nested_asid(vcpu->kvm));
+
+	/*
+	 * If NPT is disabled, sync the shadow page tables as L1 is invalidating
+	 * mappings for L2. Sync all roots as ASIDs are not tracked in the MMU
+	 * role.
+	 *
+	 * As we are not flushing the current context, skip the gva flush from
+	 * __kvm_mmu_invalidate_addr(), it would flush the wrong ASID anyway.
+	 * The correct TLB flush was done above (if needed).
+	 *
+	 * This always operates on root_mmu because L1 and L2 share an MMU when
+	 * NPT is disabled. This can be optimized by invalidating guest roots
+	 * only.
+	 */
+	if (!npt_enabled)
+		__kvm_mmu_invalidate_addr(vcpu, &vcpu->arch.root_mmu, gva,
+					  KVM_MMU_ROOTS_ALL, false);
 
 	return kvm_skip_emulated_instruction(vcpu);
 }
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 23/24] KVM: nSVM: Allocate a new ASID for nested guests
  2025-03-26 19:44 ` [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry Yosry Ahmed
  2025-03-26 19:44   ` [RFC PATCH 21/24] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
  2025-03-26 19:44   ` [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly Yosry Ahmed
@ 2025-03-26 19:44   ` Yosry Ahmed
  2025-04-03 20:11     ` Maxim Levitsky
  2025-03-26 19:44   ` [RFC PATCH 24/24] KVM: nSVM: Stop bombing the TLB on nested transitions Yosry Ahmed
  3 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Now that nested TLB flushes are properly tracked, start allocating a
separate ASID for nested guests. This allows dropping the unconditional
TLB flushes on nested transitions and doing finer grained TLB flushing
when necessary.
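
The allocation follows the same allocate-or-fallback pattern already used
for the L1 ASID; a simplified sketch (tag_alloc()/tag_free()/register_asid()
stand in for kvm_tlb_tags_alloc()/kvm_tlb_tags_free()/svm_register_asid()):

unsigned int tag_alloc(void);
void tag_free(unsigned int asid);
int register_asid(unsigned int asid);	/* nonzero on success */

static unsigned int alloc_guest_asid(unsigned int fallback_asid)
{
	unsigned int asid = tag_alloc();

	if (asid && !register_asid(asid)) {
		/* Reserving per-CPU tracking failed; give the ASID back. */
		tag_free(asid);
		asid = 0;
	}

	/* The shared fallback ASID is flushed on every VMRUN. */
	return asid ? asid : fallback_asid;
}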

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 11 +++++++++--
 arch/x86/kvm/svm/svm.c    |  5 +++--
 arch/x86/kvm/svm/svm.h    |  3 +++
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 544913461693c..0c887c91bd50d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1204,6 +1204,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 {
 	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
 	struct page *vmcb02_page;
+	unsigned int asid;
 
 	if (svm->nested.initialized)
 		return 0;
@@ -1221,8 +1222,14 @@ int svm_allocate_nested(struct vcpu_svm *svm)
 
 	svm->nested.initialized = true;
 
-	if (!kvm_svm->nested_asid)
-		kvm_svm->nested_asid = kvm_svm->asid;
+	if (!kvm_svm->nested_asid) {
+		asid = kvm_tlb_tags_alloc(&svm_asids);
+		if (asid && !svm_register_asid(asid)) {
+			kvm_tlb_tags_free(&svm_asids, asid);
+			asid = 0;
+		}
+		kvm_svm->nested_asid = asid ?: fallback_asid;
+	}
 
 	return 0;
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4b95fd6b501e6..196f5bca57a0e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -249,8 +249,8 @@ static unsigned long iopm_base;
 
 DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
 
-static struct kvm_tlb_tags svm_asids;
-static unsigned int fallback_asid;
+struct kvm_tlb_tags svm_asids;
+unsigned int fallback_asid;
 
 /*
  * Only MSR_TSC_AUX is switched via the user return hook.  EFER is switched via
@@ -5127,6 +5127,7 @@ static void svm_vm_destroy(struct kvm *kvm)
 	avic_vm_destroy(kvm);
 	sev_vm_destroy(kvm);
 	kvm_tlb_tags_free(&svm_asids, kvm_svm->asid);
+	kvm_tlb_tags_free(&svm_asids, kvm_svm->nested_asid);
 }
 
 static int svm_vm_init(struct kvm *kvm)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0c44133bc05ca..220d10d2b1a5c 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -630,6 +630,9 @@ static inline void svm_vmgexit_no_action(struct vcpu_svm *svm, u64 data)
 
 extern bool dump_invalid_vmcb;
 
+extern struct kvm_tlb_tags svm_asids;
+extern unsigned int fallback_asid;
+
 u32 svm_msrpm_offset(u32 msr);
 u32 *svm_vcpu_alloc_msrpm(void);
 void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [RFC PATCH 24/24] KVM: nSVM: Stop bombing the TLB on nested transitions
  2025-03-26 19:44 ` [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry Yosry Ahmed
                     ` (2 preceding siblings ...)
  2025-03-26 19:44   ` [RFC PATCH 23/24] KVM: nSVM: Allocate a new ASID for nested guests Yosry Ahmed
@ 2025-03-26 19:44   ` Yosry Ahmed
  3 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-26 19:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed

Now that nested TLB flushes are properly tracked with a well-maintained
separate ASID for L2 and proper handling of L1's TLB flush requests,
drop the unconditional flushes and syncs on nested transitions.

On a Milan machine, an L1 guest and an L2 guest were booted, each with a single
vCPU, and pinned to a single physical CPU to maximize TLB collisions. In
this setup, the cpuid_rate microbenchmark [1] showed the following
changes with this patch:

+--------+--------+-------------------+----------------------+
| L0     | L1     | cpuid_rate (base) | cpuid_rate (patched) |
+========+========+===================+======================+
| NPT    | NPT    | 256621            | 301113 (+17.3%)      |
| NPT    | Shadow | 180017            | 203347 (+12.96%)     |
| Shadow | Shadow | 177006            | 189150 (+6.86%)      |
+--------+--------+-------------------+----------------------+

[1]https://lore.kernel.org/kvm/20231109180646.2963718-1-khorenko@virtuozzo.com/

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
 arch/x86/kvm/svm/nested.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 0c887c91bd50d..1cc281a7b666c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -511,9 +511,6 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 		svm->nested.last_asid = svm->nested.ctl.asid;
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 	}
-	/* TODO: optimize unconditional TLB flush/MMU sync */
-	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
 /* See nested_svm_entry_tlb_flush() */
@@ -525,9 +522,6 @@ static void nested_svm_exit_tlb_flush(struct kvm_vcpu *vcpu)
 
 	if (svm->nested.ctl.tlb_ctl == TLB_CONTROL_FLUSH_ALL_ASID)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
-
-	kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-	kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }
 
 /*
-- 
2.49.0.395.g12beb8f557-goog


^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral
  2025-03-26 19:35 ` [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
@ 2025-03-27 10:58   ` Nikunj A Dadhania
  2025-03-27 17:13     ` Yosry Ahmed
  2025-06-23 16:44   ` Sean Christopherson
  1 sibling, 1 reply; 58+ messages in thread
From: Nikunj A Dadhania @ 2025-03-27 10:58 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel, Yosry Ahmed,
	Manali Shukla, santosh.shukla

Yosry Ahmed <yosry.ahmed@linux.dev> writes:

> Generalize the VMX VPID allocation code and make move it to common
> code

s/make//

> in preparation for sharing with SVM. Create a generic struct
> kvm_tlb_tags, representing a factory for VPIDs (or ASIDs later), and use
> one for VPIDs.
>
> Most of the functionality remains the same, with the following
> differences:
> - The enable_vpid checks are moved to the callers for allocate_vpid()
>   and free_vpid(), as they are specific to VMX.
> - The bitmap allocation is now dynamic (which will be required for SVM),
>   so it is initialized and cleaned up in vmx_hardware_{setup/unsetup}().
> - The range of valid TLB tags is expressed in terms of min/max instead
>   of the number of tags to support SVM use cases.
>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/vmx/nested.c |  4 +--
>  arch/x86/kvm/vmx/vmx.c    | 38 +++++--------------------
>  arch/x86/kvm/vmx/vmx.h    |  4 +--
>  arch/x86/kvm/x86.c        | 58 +++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.h        | 13 +++++++++
>  5 files changed, 82 insertions(+), 35 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d06e50d9c0e79..b017bd2eb2382 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -343,7 +343,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
>  	vmx->nested.vmxon = false;
>  	vmx->nested.smm.vmxon = false;
>  	vmx->nested.vmxon_ptr = INVALID_GPA;
> -	free_vpid(vmx->nested.vpid02);
> +	kvm_tlb_tags_free(&vmx_vpids, vmx->nested.vpid02);
>  	vmx->nested.posted_intr_nv = -1;
>  	vmx->nested.current_vmptr = INVALID_GPA;
>  	if (enable_shadow_vmcs) {
> @@ -5333,7 +5333,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
>  		     HRTIMER_MODE_ABS_PINNED);
>  	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
>  
> -	vmx->nested.vpid02 = allocate_vpid();
> +	vmx->nested.vpid02 = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;
>  
>  	vmx->nested.vmcs02_initialized = false;
>  	vmx->nested.vmxon = true;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b70ed72c1783d..f7ce75842fa26 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -496,8 +496,7 @@ DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>   */
>  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
>  
> -static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
> -static DEFINE_SPINLOCK(vmx_vpid_lock);
> +struct kvm_tlb_tags vmx_vpids;
>  
>  struct vmcs_config vmcs_config __ro_after_init;
>  struct vmx_capability vmx_capability __ro_after_init;
> @@ -3972,31 +3971,6 @@ static void seg_setup(int seg)
>  	vmcs_write32(sf->ar_bytes, ar);
>  }
>  
> -int allocate_vpid(void)
> -{
> -	int vpid;
> -
> -	if (!enable_vpid)
> -		return 0;
> -	spin_lock(&vmx_vpid_lock);
> -	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
> -	if (vpid < VMX_NR_VPIDS)
> -		__set_bit(vpid, vmx_vpid_bitmap);
> -	else
> -		vpid = 0;
> -	spin_unlock(&vmx_vpid_lock);
> -	return vpid;
> -}
> -
> -void free_vpid(int vpid)
> -{
> -	if (!enable_vpid || vpid == 0)
> -		return;
> -	spin_lock(&vmx_vpid_lock);
> -	__clear_bit(vpid, vmx_vpid_bitmap);
> -	spin_unlock(&vmx_vpid_lock);
> -}
> -
>  static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
>  {
>  	/*
> @@ -7559,7 +7533,7 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu)
>  
>  	if (enable_pml)
>  		vmx_destroy_pml_buffer(vmx);
> -	free_vpid(vmx->vpid);
> +	kvm_tlb_tags_free(&vmx_vpids, vmx->vpid);
>  	nested_vmx_free_vcpu(vcpu);
>  	free_loaded_vmcs(vmx->loaded_vmcs);
>  	free_page((unsigned long)vmx->ve_info);
> @@ -7578,7 +7552,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	err = -ENOMEM;
>  
> -	vmx->vpid = allocate_vpid();
> +	vmx->vpid = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;
>  
>  	/*
>  	 * If PML is turned on, failure on enabling PML just results in failure
> @@ -7681,7 +7655,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
>  free_pml:
>  	vmx_destroy_pml_buffer(vmx);
>  free_vpid:
> -	free_vpid(vmx->vpid);
> +	kvm_tlb_tags_free(&vmx_vpids, vmx->vpid);
>  	return err;
>  }
>  
> @@ -8373,6 +8347,7 @@ void vmx_hardware_unsetup(void)
>  		nested_vmx_hardware_unsetup();
>  
>  	free_kvm_area();
> +	kvm_tlb_tags_destroy(&vmx_vpids);
>  }
>  
>  void vmx_vm_destroy(struct kvm *kvm)
> @@ -8591,7 +8566,8 @@ __init int vmx_hardware_setup(void)
>  	kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
>  	kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
>  
> -	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> +	/* VPID 0 is reserved for host, so min=1  */
> +	kvm_tlb_tags_init(&vmx_vpids, 1, VMX_NR_VPIDS - 1);

This needs to handle errors from kvm_tlb_tags_init().
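
Something along these lines, perhaps (untested sketch; it assumes the
existing 'r' local in vmx_hardware_setup() and that nothing needs to be
unwound at this point):

	r = kvm_tlb_tags_init(&vmx_vpids, 1, VMX_NR_VPIDS - 1);
	if (r)
		return r;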

>  
>  	if (enable_ept)
>  		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 951e44dc9d0ea..9bece3ea63eaa 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -376,10 +376,10 @@ struct kvm_vmx {
>  	u64 *pid_table;
>  };
>  
> +extern struct kvm_tlb_tags vmx_vpids;
> +
>  void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
>  			struct loaded_vmcs *buddy);
> -int allocate_vpid(void);
> -void free_vpid(int vpid);
>  void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
>  void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
>  void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 69c20a68a3f01..182f18ebc62f3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13992,6 +13992,64 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>  }
>  EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>  
> +int kvm_tlb_tags_init(struct kvm_tlb_tags *tlb_tags, unsigned int min,
> +		      unsigned int max)
> +{
> +	/*
> +	 * 0 is assumed to be the host's TLB tag and is returned on failed
> +	 * allocations.
> +	 */
> +	if (WARN_ON_ONCE(min == 0))
> +		return -1;

Probably -EINVAL ?

> +
> +	/*
> +	 * Allocate enough bits to index the bitmap directly by the tag,
> +	 * potentially wasting a bit of memory.
> +	 */
> +	tlb_tags->bitmap = bitmap_zalloc(max + 1, GFP_KERNEL);
> +	if (!tlb_tags->bitmap)
> +		return -1;

-ENOMEM ?

> +
> +	tlb_tags->min = min;
> +	tlb_tags->max = max;
> +	spin_lock_init(&tlb_tags->lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_tlb_tags_init);
> +
> +void kvm_tlb_tags_destroy(struct kvm_tlb_tags *tlb_tags)
> +{
> +	bitmap_free(tlb_tags->bitmap);

Do we need to take tlb_tabs->lock here ?

> +}
> +EXPORT_SYMBOL_GPL(kvm_tlb_tags_destroy);
> +
> +unsigned int kvm_tlb_tags_alloc(struct kvm_tlb_tags *tlb_tags)
> +{
> +	unsigned int tag;
> +
> +	spin_lock(&tlb_tags->lock);
> +	tag = find_next_zero_bit(tlb_tags->bitmap, tlb_tags->max + 1,
> +				 tlb_tags->min);
> +	if (tag <= tlb_tags->max)
> +		__set_bit(tag, tlb_tags->bitmap);
> +	else
> +		tag = 0;

In the event that KVM runs out of tags, adding WARN_ON_ONCE() here will
help debugging.

Regards
Nikunj


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral
  2025-03-27 10:58   ` Nikunj A Dadhania
@ 2025-03-27 17:13     ` Yosry Ahmed
  2025-03-27 19:42       ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Yosry Ahmed @ 2025-03-27 17:13 UTC (permalink / raw)
  To: Nikunj A Dadhania
  Cc: Sean Christopherson, Paolo Bonzini, Jim Mattson, Maxim Levitsky,
	Vitaly Kuznetsov, Rik van Riel, Tom Lendacky, x86, kvm,
	linux-kernel, Manali Shukla, santosh.shukla

On Thu, Mar 27, 2025 at 10:58:31AM +0000, Nikunj A Dadhania wrote:
> Yosry Ahmed <yosry.ahmed@linux.dev> writes:
> 
> > Generalize the VMX VPID allocation code and make move it to common
> > code
> 
> s/make//
> 
> > in preparation for sharing with SVM. Create a generic struct
> > kvm_tlb_tags, representing a factory for VPIDs (or ASIDs later), and use
> > one for VPIDs.
> >
> > Most of the functionality remains the same, with the following
> > differences:
> > - The enable_vpid checks are moved to the callers for allocate_vpid()
> >   and free_vpid(), as they are specific to VMX.
> > - The bitmap allocation is now dynamic (which will be required for SVM),
> >   so it is initialized and cleaned up in vmx_hardware_{setup/unsetup}().
> > - The range of valid TLB tags is expressed in terms of min/max instead
> >   of the number of tags to support SVM use cases.
> >
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/vmx/nested.c |  4 +--
> >  arch/x86/kvm/vmx/vmx.c    | 38 +++++--------------------
> >  arch/x86/kvm/vmx/vmx.h    |  4 +--
> >  arch/x86/kvm/x86.c        | 58 +++++++++++++++++++++++++++++++++++++++
> >  arch/x86/kvm/x86.h        | 13 +++++++++
> >  5 files changed, 82 insertions(+), 35 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index d06e50d9c0e79..b017bd2eb2382 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -343,7 +343,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
> >  	vmx->nested.vmxon = false;
> >  	vmx->nested.smm.vmxon = false;
> >  	vmx->nested.vmxon_ptr = INVALID_GPA;
> > -	free_vpid(vmx->nested.vpid02);
> > +	kvm_tlb_tags_free(&vmx_vpids, vmx->nested.vpid02);
> >  	vmx->nested.posted_intr_nv = -1;
> >  	vmx->nested.current_vmptr = INVALID_GPA;
> >  	if (enable_shadow_vmcs) {
> > @@ -5333,7 +5333,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
> >  		     HRTIMER_MODE_ABS_PINNED);
> >  	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
> >  
> > -	vmx->nested.vpid02 = allocate_vpid();
> > +	vmx->nested.vpid02 = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;
> >  
> >  	vmx->nested.vmcs02_initialized = false;
> >  	vmx->nested.vmxon = true;
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index b70ed72c1783d..f7ce75842fa26 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -496,8 +496,7 @@ DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> >   */
> >  static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
> >  
> > -static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
> > -static DEFINE_SPINLOCK(vmx_vpid_lock);
> > +struct kvm_tlb_tags vmx_vpids;
> >  
> >  struct vmcs_config vmcs_config __ro_after_init;
> >  struct vmx_capability vmx_capability __ro_after_init;
> > @@ -3972,31 +3971,6 @@ static void seg_setup(int seg)
> >  	vmcs_write32(sf->ar_bytes, ar);
> >  }
> >  
> > -int allocate_vpid(void)
> > -{
> > -	int vpid;
> > -
> > -	if (!enable_vpid)
> > -		return 0;
> > -	spin_lock(&vmx_vpid_lock);
> > -	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
> > -	if (vpid < VMX_NR_VPIDS)
> > -		__set_bit(vpid, vmx_vpid_bitmap);
> > -	else
> > -		vpid = 0;
> > -	spin_unlock(&vmx_vpid_lock);
> > -	return vpid;
> > -}
> > -
> > -void free_vpid(int vpid)
> > -{
> > -	if (!enable_vpid || vpid == 0)
> > -		return;
> > -	spin_lock(&vmx_vpid_lock);
> > -	__clear_bit(vpid, vmx_vpid_bitmap);
> > -	spin_unlock(&vmx_vpid_lock);
> > -}
> > -
> >  static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
> >  {
> >  	/*
> > @@ -7559,7 +7533,7 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu)
> >  
> >  	if (enable_pml)
> >  		vmx_destroy_pml_buffer(vmx);
> > -	free_vpid(vmx->vpid);
> > +	kvm_tlb_tags_free(&vmx_vpids, vmx->vpid);
> >  	nested_vmx_free_vcpu(vcpu);
> >  	free_loaded_vmcs(vmx->loaded_vmcs);
> >  	free_page((unsigned long)vmx->ve_info);
> > @@ -7578,7 +7552,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
> >  
> >  	err = -ENOMEM;
> >  
> > -	vmx->vpid = allocate_vpid();
> > +	vmx->vpid = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;
> >  
> >  	/*
> >  	 * If PML is turned on, failure on enabling PML just results in failure
> > @@ -7681,7 +7655,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
> >  free_pml:
> >  	vmx_destroy_pml_buffer(vmx);
> >  free_vpid:
> > -	free_vpid(vmx->vpid);
> > +	kvm_tlb_tags_free(&vmx_vpids, vmx->vpid);
> >  	return err;
> >  }
> >  
> > @@ -8373,6 +8347,7 @@ void vmx_hardware_unsetup(void)
> >  		nested_vmx_hardware_unsetup();
> >  
> >  	free_kvm_area();
> > +	kvm_tlb_tags_destroy(&vmx_vpids);
> >  }
> >  
> >  void vmx_vm_destroy(struct kvm *kvm)
> > @@ -8591,7 +8566,8 @@ __init int vmx_hardware_setup(void)
> >  	kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
> >  	kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
> >  
> > -	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
> > +	/* VPID 0 is reserved for host, so min=1  */
> > +	kvm_tlb_tags_init(&vmx_vpids, 1, VMX_NR_VPIDS - 1);
> 
> This needs to handle errors from kvm_tlb_tags_init().
> 
> >  
> >  	if (enable_ept)
> >  		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
> > diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> > index 951e44dc9d0ea..9bece3ea63eaa 100644
> > --- a/arch/x86/kvm/vmx/vmx.h
> > +++ b/arch/x86/kvm/vmx/vmx.h
> > @@ -376,10 +376,10 @@ struct kvm_vmx {
> >  	u64 *pid_table;
> >  };
> >  
> > +extern struct kvm_tlb_tags vmx_vpids;
> > +
> >  void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
> >  			struct loaded_vmcs *buddy);
> > -int allocate_vpid(void);
> > -void free_vpid(int vpid);
> >  void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
> >  void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
> >  void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 69c20a68a3f01..182f18ebc62f3 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -13992,6 +13992,64 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
> >  
> > +int kvm_tlb_tags_init(struct kvm_tlb_tags *tlb_tags, unsigned int min,
> > +		      unsigned int max)
> > +{
> > +	/*
> > +	 * 0 is assumed to be the host's TLB tag and is returned on failed
> > +	 * allocations.
> > +	 */
> > +	if (WARN_ON_ONCE(min == 0))
> > +		return -1;
> 
> Probably -EINVAL ?

Yeah we can use error codes for clarity, thanks.
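
i.e. the two failure paths would become something like (untested):

	if (WARN_ON_ONCE(min == 0))
		return -EINVAL;
	...
	tlb_tags->bitmap = bitmap_zalloc(max + 1, GFP_KERNEL);
	if (!tlb_tags->bitmap)
		return -ENOMEM;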

> 
> > +
> > +	/*
> > +	 * Allocate enough bits to index the bitmap directly by the tag,
> > +	 * potentially wasting a bit of memory.
> > +	 */
> > +	tlb_tags->bitmap = bitmap_zalloc(max + 1, GFP_KERNEL);
> > +	if (!tlb_tags->bitmap)
> > +		return -1;
> 
> -ENOMEM ?
> 
> > +
> > +	tlb_tags->min = min;
> > +	tlb_tags->max = max;
> > +	spin_lock_init(&tlb_tags->lock);
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_tlb_tags_init);
> > +
> > +void kvm_tlb_tags_destroy(struct kvm_tlb_tags *tlb_tags)
> > +{
> > +	bitmap_free(tlb_tags->bitmap);
> 
> Do we need to take tlb_tabs->lock here ?

Hmm we could, but I think it's a bug from the caller if they allow
kvm_tlb_tags_destroy() and any of the other functions (init/alloc/free)
to race. kvm_tlb_tags_destroy() should be called when we are done using
the factory.

> 
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_tlb_tags_destroy);
> > +
> > +unsigned int kvm_tlb_tags_alloc(struct kvm_tlb_tags *tlb_tags)
> > +{
> > +	unsigned int tag;
> > +
> > +	spin_lock(&tlb_tags->lock);
> > +	tag = find_next_zero_bit(tlb_tags->bitmap, tlb_tags->max + 1,
> > +				 tlb_tags->min);
> > +	if (tag <= tlb_tags->max)
> > +		__set_bit(tag, tlb_tags->bitmap);
> > +	else
> > +		tag = 0;
> 
> In the event that KVM runs out of tags, adding WARN_ON_ONCE() here will
> help debugging.

Yeah I wanted to do that, but we do not currently WARN in VMX if we run
out of VPIDs. I am fine with adding it if others are. My main concern is
whether there's some existing use case that routinely runs out of VPIDs
(although I cannot imagine one).

> 
> Regards
> Nikunj
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral
  2025-03-27 17:13     ` Yosry Ahmed
@ 2025-03-27 19:42       ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2025-03-27 19:42 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Nikunj A Dadhania, Paolo Bonzini, Jim Mattson, Maxim Levitsky,
	Vitaly Kuznetsov, Rik van Riel, Tom Lendacky, x86, kvm,
	linux-kernel, Manali Shukla, santosh.shukla

On Thu, Mar 27, 2025, Yosry Ahmed wrote:
> On Thu, Mar 27, 2025 at 10:58:31AM +0000, Nikunj A Dadhania wrote:
> > > +unsigned int kvm_tlb_tags_alloc(struct kvm_tlb_tags *tlb_tags)
> > > +{
> > > +	unsigned int tag;
> > > +
> > > +	spin_lock(&tlb_tags->lock);
> > > +	tag = find_next_zero_bit(tlb_tags->bitmap, tlb_tags->max + 1,
> > > +				 tlb_tags->min);
> > > +	if (tag <= tlb_tags->max)
> > > +		__set_bit(tag, tlb_tags->bitmap);
> > > +	else
> > > +		tag = 0;
> > 
> > In the event that KVM runs out of tags, adding WARN_ON_ONCE() here will
> > help debugging.
> 
> Yeah I wanted to do that, but we do not currently WARN in VMX if we run
> out of VPIDs. I am fine with adding it if others are. My main concern is
> whether there's some existing use case that routinely runs out of VPIDs
> (although I cannot imagine one).

No WARNs, it would be userspace triggerable (hello, syzkaller).  If we really
want to harden things against performance issues due to unexpected VPID/ASID
allocation, I would rather do something like add a knob to fail VM or vCPU
creation if allocation fails (nested would just have to suffer).
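
One possible shape, as a rough and untested sketch (the knob name and
placement are made up purely for illustration):

	/* Hypothetical module param: refuse to run a VM without its own tag. */
	static bool strict_tlb_tag_alloc;
	module_param(strict_tlb_tag_alloc, bool, 0444);

	/* e.g. in svm_vm_init(), with the per-VM ASID from this series: */
	asid = kvm_tlb_tags_alloc(&svm_asids);
	if (!asid && strict_tlb_tag_alloc)
		return -ENOSPC;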

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 02/24] KVM: SVM: Use cached local variable in init_vmcb()
  2025-03-26 19:35 ` [RFC PATCH 02/24] KVM: SVM: Use cached local variable in init_vmcb() Yosry Ahmed
@ 2025-04-03 19:56   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 19:56 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:35 +0000, Yosry Ahmed wrote:
> svm->vmcb->control is already cached in the 'control' local variable, so
> use that.

Microscopic nitpick: I usually mention that 'No functional change intended'.

> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/svm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8abeab91d329d..28a6d2c0f250f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1367,12 +1367,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  		avic_init_vmcb(svm, vmcb);
>  
>  	if (vnmi)
> -		svm->vmcb->control.int_ctl |= V_NMI_ENABLE_MASK;
> +		control->int_ctl |= V_NMI_ENABLE_MASK;
>  
>  	if (vgif) {
>  		svm_clr_intercept(svm, INTERCEPT_STGI);
>  		svm_clr_intercept(svm, INTERCEPT_CLGI);
> -		svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK;
> +		control->int_ctl |= V_GIF_ENABLE_MASK;
>  	}
>  
>  	if (sev_guest(vcpu->kvm))

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  2025-03-26 19:35 ` [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
@ 2025-04-03 20:00   ` Maxim Levitsky
  2025-06-23 16:46   ` Sean Christopherson
  1 sibling, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:00 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:35 +0000, Yosry Ahmed wrote:
> Incoming changes will add more code paths that set tlb_ctl to
> TLB_CONTROL_FLUSH_ASID, and will eliminate the use of
> TLB_CONTROL_FLUSH_ALL_ASID except as fallback when FLUSHBYASID is not
> available. Introduce set/clear helpers to set tlb_ctl to
> TLB_CONTROL_FLUSH_ASID or TLB_CONTROL_DO_NOTHING.
> 
> Opportunistically move the TLB_CONTROL_* definitions to
> arch/x86/kvm/svm/svm.h as they are not used outside of arch/x86/kvm/svm/.

Same microscopic nitpick as in previous patch :) 
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/include/asm/svm.h |  5 -----
>  arch/x86/kvm/svm/nested.c  |  2 +-
>  arch/x86/kvm/svm/sev.c     |  2 +-
>  arch/x86/kvm/svm/svm.c     |  4 ++--
>  arch/x86/kvm/svm/svm.h     | 15 +++++++++++++++
>  5 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 9b7fa99ae9513..a97da63562eb3 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -171,11 +171,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  };
>  
>  
> -#define TLB_CONTROL_DO_NOTHING 0
> -#define TLB_CONTROL_FLUSH_ALL_ASID 1
> -#define TLB_CONTROL_FLUSH_ASID 3
> -#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
> -
>  #define V_TPR_MASK 0x0f
>  
>  #define V_IRQ_SHIFT 8
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 834b67672d50f..11b02a0340d9e 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -681,7 +681,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	/* Done at vmrun: asid.  */
>  
>  	/* Also overwritten later if necessary.  */
> -	vmcb02->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
> +	vmcb_clr_flush_asid(vmcb02);
>  
>  	/* nested_cr3.  */
>  	if (nested_npt_enabled(svm))
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0bc708ee27887..d613f81addf1c 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3479,7 +3479,7 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>  		return 0;
>  
>  	sd->sev_vmcbs[asid] = svm->vmcb;
> -	svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
> +	vmcb_set_flush_asid(svm->vmcb);
>  	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  	return 0;
>  }
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 28a6d2c0f250f..0e302ae9a8435 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4006,7 +4006,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
>  	 * VM-Exit (via kvm_mmu_reset_context()).
>  	 */
>  	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
> -		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
> +		vmcb_set_flush_asid(svm->vmcb);
>  	else
>  		svm->current_vmcb->asid_generation--;
>  }
> @@ -4373,7 +4373,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
>  		svm->nested.nested_run_pending = 0;
>  	}
>  
> -	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
> +	vmcb_clr_flush_asid(svm->vmcb);
>  	vmcb_mark_all_clean(svm->vmcb);
>  
>  	/* if exit due to PF check for async PF */
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d4490eaed55dd..d2c49cbfbf1ca 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -409,6 +409,21 @@ static inline bool vmcb_is_dirty(struct vmcb *vmcb, int bit)
>          return !test_bit(bit, (unsigned long *)&vmcb->control.clean);
>  }
>  
> +#define TLB_CONTROL_DO_NOTHING 0
> +#define TLB_CONTROL_FLUSH_ALL_ASID 1
> +#define TLB_CONTROL_FLUSH_ASID 3
> +#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
> +
> +static inline void vmcb_set_flush_asid(struct vmcb *vmcb)
> +{
> +	vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
> +}
> +
> +static inline void vmcb_clr_flush_asid(struct vmcb *vmcb)
> +{
> +	vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
> +}
> +
>  static __always_inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
>  {
>  	return container_of(vcpu, struct vcpu_svm, vcpu);


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 04/24] KVM: SVM: Flush everything if FLUSHBYASID is not available
  2025-03-26 19:35 ` [RFC PATCH 04/24] KVM: SVM: Flush everything if FLUSHBYASID is not available Yosry Ahmed
@ 2025-04-03 20:00   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:00 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:35 +0000, Yosry Ahmed wrote:
> Currently, if FLUSHBYASID is not available when performing a TLB flush,
> the fallback is decrementing the ASID generation to trigger allocating a
> new ASID. In preparation for using a static ASID per VM, just fallback
> to flushing everything if FLUSHBYASID is not available. This is probably
> worse from a performance perspective, but FLUSHBYASID has been around
> for ~15 years and it's not worth carrying the complexity.
> 
> The fallback logic is moved within vmcb_set_flush_asid(), as more
> callers will be added and will need the fallback as well.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/svm.c | 5 +----
>  arch/x86/kvm/svm/svm.h | 5 ++++-
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 0e302ae9a8435..5f71b125010d9 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4005,10 +4005,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
>  	 * unconditionally does a TLB flush on both nested VM-Enter and nested
>  	 * VM-Exit (via kvm_mmu_reset_context()).
>  	 */
> -	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
> -		vmcb_set_flush_asid(svm->vmcb);
> -	else
> -		svm->current_vmcb->asid_generation--;
> +	vmcb_set_flush_asid(svm->vmcb);
>  }
>  
>  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index d2c49cbfbf1ca..843a29a6d150e 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -416,7 +416,10 @@ static inline bool vmcb_is_dirty(struct vmcb *vmcb, int bit)
>  
>  static inline void vmcb_set_flush_asid(struct vmcb *vmcb)
>  {
> -	vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
> +	if (static_cpu_has(X86_FEATURE_FLUSHBYASID))
> +		vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
> +	else
> +		vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
>  }
>  
>  static inline void vmcb_clr_flush_asid(struct vmcb *vmcb)

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 05/24] KVM: SVM: Flush the ASID when running on a new CPU
  2025-03-26 19:36 ` [RFC PATCH 05/24] KVM: SVM: Flush the ASID when running on a new CPU Yosry Ahmed
@ 2025-04-03 20:00   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:00 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> Currently, when a vCPU is migrated to a new physical CPU, the ASID
> generation is reset to trigger allocating a new ASID. In preparation for
> using a static ASID per VM, just flush the ASID in this case (falling
> back to flushing everything if FLUSHBYASID is not available).
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/svm.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 5f71b125010d9..18bfc3d3f9ba1 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3626,12 +3626,12 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
>  	/*
> -	 * If the previous vmrun of the vmcb occurred on a different physical
> -	 * cpu, then mark the vmcb dirty and assign a new asid.  Hardware's
> -	 * vmcb clean bits are per logical CPU, as are KVM's asid assignments.
> +	 * If the previous VMRUN of the VMCB occurred on a different physical
> +	 * CPU, then mark the VMCB dirty and flush the ASID.  Hardware's
> +	 * VMCB clean bits are per logical CPU, as are KVM's ASID assignments.
>  	 */
>  	if (unlikely(svm->current_vmcb->cpu != vcpu->cpu)) {
> -		svm->current_vmcb->asid_generation = 0;
> +		vmcb_set_flush_asid(svm->vmcb);
>  		vmcb_mark_all_dirty(svm->vmcb);
>  		svm->current_vmcb->cpu = vcpu->cpu;
>          }

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky





^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  2025-03-26 19:36 ` [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB Yosry Ahmed
@ 2025-04-03 20:04   ` Maxim Levitsky
  2025-04-22  9:41     ` Yosry Ahmed
  2025-06-20 23:13   ` Sean Christopherson
  1 sibling, 1 reply; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:04 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> SEV currently tracks the ASID to VMCB mapping for each physical CPU.
> This is required to flush the ASID when a new VMCB using the same ASID
> is run on the same CPU. 


> Practically, there is a single VMCB for each
> vCPU using SEV. 

Can you elaborate on this a bit? AFAIK you can't run nested with SEV,
not even plain SEV, because guest state is encrypted, so for SEV we
indeed have one VMCB per vCPU.

> Furthermore, TLB flushes on nested transitions between
> VMCB01 and VMCB02 are handled separately (see
> nested_svm_transition_tlb_flush()).

Yes, or we can say that for now both VMCBs share the same ASID,
up until later in this patch series.

> 
> In preparation for generalizing the tracking and making the tracking
> more expensive, start tracking the ASID to vCPU mapping instead. This
> will allow for the tracking to be moved to a cheaper code path when
> vCPUs are switched.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/sev.c | 12 ++++++------
>  arch/x86/kvm/svm/svm.c |  2 +-
>  arch/x86/kvm/svm/svm.h |  4 ++--
>  3 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index d613f81addf1c..ddb4d5b211ed7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -240,7 +240,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
>  
>  	for_each_possible_cpu(cpu) {
>  		sd = per_cpu_ptr(&svm_data, cpu);
> -		sd->sev_vmcbs[sev->asid] = NULL;
> +		sd->sev_vcpus[sev->asid] = NULL;
>  	}
>  
>  	mutex_unlock(&sev_bitmap_lock);
> @@ -3081,8 +3081,8 @@ int sev_cpu_init(struct svm_cpu_data *sd)
>  	if (!sev_enabled)
>  		return 0;
>  
> -	sd->sev_vmcbs = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
> -	if (!sd->sev_vmcbs)
> +	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
> +	if (!sd->sev_vcpus)
>  		return -ENOMEM;
>  
>  	return 0;
> @@ -3471,14 +3471,14 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>  	/*
>  	 * Flush guest TLB:
>  	 *
> -	 * 1) when different VMCB for the same ASID is to be run on the same host CPU.
> +	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.
>  	 * 2) or this VMCB was executed on different host CPU in previous VMRUNs.
>  	 */
> -	if (sd->sev_vmcbs[asid] == svm->vmcb &&
> +	if (sd->sev_vcpus[asid] == &svm->vcpu &&
>  	    svm->vcpu.arch.last_vmentry_cpu == cpu)
>  		return 0;
>  
> -	sd->sev_vmcbs[asid] = svm->vmcb;
> +	sd->sev_vcpus[asid] = &svm->vcpu;
>  	vmcb_set_flush_asid(svm->vmcb);
>  	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  	return 0;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 18bfc3d3f9ba1..1156ca97fd798 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -694,7 +694,7 @@ static void svm_cpu_uninit(int cpu)
>  	if (!sd->save_area)
>  		return;
>  
> -	kfree(sd->sev_vmcbs);
> +	kfree(sd->sev_vcpus);
>  	__free_page(__sme_pa_to_page(sd->save_area_pa));
>  	sd->save_area_pa = 0;
>  	sd->save_area = NULL;
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 843a29a6d150e..4ea6c61c3b048 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -340,8 +340,8 @@ struct svm_cpu_data {
>  
>  	struct vmcb *current_vmcb;
>  
> -	/* index = sev_asid, value = vmcb pointer */
> -	struct vmcb **sev_vmcbs;
> +	/* index = sev_asid, value = vcpu pointer */
> +	struct kvm_vcpu **sev_vcpus;
>  };
>  
>  DECLARE_PER_CPU(struct svm_cpu_data, svm_data);


Code itself looks OK, so 

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky





^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 07/24] KVM: SEV: Track ASID->vCPU on vCPU load
  2025-03-26 19:36 ` [RFC PATCH 07/24] KVM: SEV: Track ASID->vCPU on vCPU load Yosry Ahmed
@ 2025-04-03 20:04   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:04 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> Check for changes in the ASID to vCPU mapping on vCPU load instead of
> doing it on vCPU run. This should be sufficient and more efficient, and
> is needed to allow generalizing the tracking and making it more
> expensive.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/sev.c | 13 ++++---------
>  arch/x86/kvm/svm/svm.c | 13 +++++++++++++
>  arch/x86/kvm/svm/svm.h |  1 +
>  3 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index ddb4d5b211ed7..3ef0dfdbb34d2 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -224,7 +224,7 @@ static int sev_asid_new(struct kvm_sev_info *sev)
>  	return ret;
>  }
>  
> -static unsigned int sev_get_asid(struct kvm *kvm)
> +unsigned int sev_get_asid(struct kvm *kvm)
>  {
>  	return to_kvm_sev_info(kvm)->asid;
>  }
> @@ -3453,7 +3453,6 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
>  
>  int pre_sev_run(struct vcpu_svm *svm, int cpu)
>  {
> -	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
>  	struct kvm *kvm = svm->vcpu.kvm;
>  	unsigned int asid = sev_get_asid(kvm);
>  
> @@ -3469,16 +3468,12 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>  	svm->asid = asid;
>  
>  	/*
> -	 * Flush guest TLB:
> -	 *
> -	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.
> -	 * 2) or this VMCB was executed on different host CPU in previous VMRUNs.
> +	 * Flush guest TLB if the VMCB was executed on a different host CPU in
> +	 * previous VMRUNs.
>  	 */
> -	if (sd->sev_vcpus[asid] == &svm->vcpu &&
> -	    svm->vcpu.arch.last_vmentry_cpu == cpu)
> +	if (svm->vcpu.arch.last_vmentry_cpu == cpu)
>  		return 0;
>  
> -	sd->sev_vcpus[asid] = &svm->vcpu;
>  	vmcb_set_flush_asid(svm->vmcb);
>  	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  	return 0;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 1156ca97fd798..e6e380411fbec 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1554,6 +1554,7 @@ static void svm_prepare_host_switch(struct kvm_vcpu *vcpu)
>  
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
> +	unsigned int asid;
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
>  
> @@ -1568,6 +1569,18 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	}
>  	if (kvm_vcpu_apicv_active(vcpu))
>  		avic_vcpu_load(vcpu, cpu);
> +
> +	if (sev_guest(vcpu->kvm)) {
> +		/*
> +		 * Flush the TLB when a different vCPU using the same ASID is
> +		 * run on the same CPU.
> +		 */
> +		asid = sev_get_asid(vcpu->kvm);
> +		if (sd->sev_vcpus[asid] != vcpu) {
> +			sd->sev_vcpus[asid] = vcpu;
> +			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> +		}
> +	}
>  }
>  
>  static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 4ea6c61c3b048..ca38a233fa24c 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -768,6 +768,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
>  void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>  void sev_es_prepare_switch_to_guest(struct vcpu_svm *svm, struct sev_es_save_area *hostsa);
>  void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +unsigned int sev_get_asid(struct kvm *kvm);
>  
>  #ifdef CONFIG_KVM_AMD_SEV
>  int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp);

Makes sense, but I might have missed something.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 08/24] KVM: SEV: Drop pre_sev_run()
  2025-03-26 19:36 ` [RFC PATCH 08/24] KVM: SEV: Drop pre_sev_run() Yosry Ahmed
@ 2025-04-03 20:04   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:04 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> Now that the ASID to vCPU/VMCB tracking was moved out of pre_sev_run(),
> the only remaining pieces are:
> (a) Checking for valid VMSA.
> (b) Assigning svm->asid.
> (c) Flush the ASID if the VMCB is run on a different physical CPU.
> 
> The check in (c) is already being done in pre_svm_run(), and so is
> redundant. (a) and (b) are small enough and probably do not warrant a
> separate helper (and (b) will be going away soon), so open-code the
> function into pre_svm_run() and remove it.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/sev.c | 28 ----------------------------
>  arch/x86/kvm/svm/svm.c | 16 ++++++++++++++--
>  arch/x86/kvm/svm/svm.h |  1 -
>  3 files changed, 14 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 3ef0dfdbb34d2..1742f51d4c194 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3451,34 +3451,6 @@ void sev_es_unmap_ghcb(struct vcpu_svm *svm)
>  	svm->sev_es.ghcb = NULL;
>  }
>  
> -int pre_sev_run(struct vcpu_svm *svm, int cpu)
> -{
> -	struct kvm *kvm = svm->vcpu.kvm;
> -	unsigned int asid = sev_get_asid(kvm);
> -
> -	/*
> -	 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
> -	 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
> -	 * AP Destroy event.
> -	 */
> -	if (sev_es_guest(kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa))
> -		return -EINVAL;
> -
> -	/* Assign the asid allocated with this SEV guest */
> -	svm->asid = asid;
> -
> -	/*
> -	 * Flush guest TLB if the VMCB was executed on a different host CPU in
> -	 * previous VMRUNs.
> -	 */
> -	if (svm->vcpu.arch.last_vmentry_cpu == cpu)
> -		return 0;
> -
> -	vmcb_set_flush_asid(svm->vmcb);
> -	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
> -	return 0;
> -}
> -
>  #define GHCB_SCRATCH_AREA_LIMIT		(16ULL * PAGE_SIZE)
>  static int setup_vmgexit_scratch(struct vcpu_svm *svm, bool sync, u64 len)
>  {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e6e380411fbec..ce67112732e8c 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3649,8 +3649,20 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
>  		svm->current_vmcb->cpu = vcpu->cpu;
>          }
>  
> -	if (sev_guest(vcpu->kvm))
> -		return pre_sev_run(svm, vcpu->cpu);
> +	if (sev_guest(vcpu->kvm)) {
> +		/* Assign the asid allocated with this SEV guest */
> +		svm->asid = sev_get_asid(vcpu->kvm);
> +
> +		/*
> +		 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
> +		 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
> +		 * AP Destroy event.
> +		 */
> +		if (sev_es_guest(vcpu->kvm) &&
> +		    !VALID_PAGE(svm->vmcb->control.vmsa_pa))
> +			return -EINVAL;
> +		return 0;
> +	}
>  
>  	/* FIXME: handle wraparound of asid_generation */
>  	if (svm->current_vmcb->asid_generation != sd->asid_generation)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index ca38a233fa24c..3ab2a424992c1 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -760,7 +760,6 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
>  
>  /* sev.c */
>  
> -int pre_sev_run(struct vcpu_svm *svm, int cpu);
>  void sev_init_vmcb(struct vcpu_svm *svm);
>  void sev_vcpu_after_set_cpuid(struct vcpu_svm *svm);
>  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 09/24] KVM: SEV: Generalize tracking ASID->vCPU with xarrays
  2025-03-26 19:36 ` [RFC PATCH 09/24] KVM: SEV: Generalize tracking ASID->vCPU with xarrays Yosry Ahmed
@ 2025-04-03 20:05   ` Maxim Levitsky
  2025-04-22  9:50     ` Yosry Ahmed
  0 siblings, 1 reply; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:05 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> Following changes will track ASID to vCPU mappings for all ASIDs, not
> just SEV ASIDs. Using per-CPU arrays with the maximum possible number of
> ASIDs would be too expensive.

Maybe add a word or two to explain that currently the # of SEV ASIDs is
small but the # of all ASIDs is relatively large (a 16-bit number or so)?

>  Use xarrays to generalize tracking the
> mappings instead. The logic is also mostly moved outside the SEV code to
> allow future changes to reuse it for normal SVM VMs.
> 
> Storing into an xarray is more expensive than reading/writing to an
> array, but is only done on vCPU load and should be mostly uncontended.
> Also, the size of the xarray should be O(# of VMs), so it is not
> expected to be huge. In fact, the xarray will probably use less memory
> than the normal array even for SEV on machines that only run a few VMs.
> 
> When a new ASID is allocated, reserve an entry for it on all xarrays on
> all CPUs. This allows the memory allocations to happen in a more relaxed
> context (allowing reclaim and accounting), and failures to be handled at
> VM creation time. However, entries will be allocated even on CPUs that
> never run the VM.
> 
> The alternative is relying on on-demand GFP_ATOMIC allocations with
> xa_store() on vCPU load.  These allocations are more likely to fail and
> more difficult to handle since vCPU load cannot fail. Flushing the TLB
> if the xa_store() fails is probably sufficient handling, but
> preallocations are easier to reason about.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/sev.c | 25 ++++-----------------
>  arch/x86/kvm/svm/svm.c | 50 +++++++++++++++++++++++++++++++-----------
>  arch/x86/kvm/svm/svm.h |  7 +++---
>  3 files changed, 44 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1742f51d4c194..c11da3259c089 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -211,6 +211,9 @@ static int sev_asid_new(struct kvm_sev_info *sev)
>  		goto e_uncharge;
>  	}
>  
> +	if (!svm_register_asid(asid))
> +		goto e_uncharge;
> +
>  	__set_bit(asid, sev_asid_bitmap);
>  
>  	mutex_unlock(&sev_bitmap_lock);
> @@ -231,18 +234,10 @@ unsigned int sev_get_asid(struct kvm *kvm)
>  
>  static void sev_asid_free(struct kvm_sev_info *sev)
>  {
> -	struct svm_cpu_data *sd;
> -	int cpu;
> +	svm_unregister_asid(sev->asid);
>  
>  	mutex_lock(&sev_bitmap_lock);
> -
>  	__set_bit(sev->asid, sev_reclaim_asid_bitmap);
> -
> -	for_each_possible_cpu(cpu) {
> -		sd = per_cpu_ptr(&svm_data, cpu);
> -		sd->sev_vcpus[sev->asid] = NULL;
> -	}
> -
>  	mutex_unlock(&sev_bitmap_lock);
>  
>  	sev_misc_cg_uncharge(sev);
> @@ -3076,18 +3071,6 @@ void sev_hardware_unsetup(void)
>  	misc_cg_set_capacity(MISC_CG_RES_SEV_ES, 0);
>  }
>  
> -int sev_cpu_init(struct svm_cpu_data *sd)
> -{
> -	if (!sev_enabled)
> -		return 0;
> -
> -	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
> -	if (!sd->sev_vcpus)
> -		return -ENOMEM;
> -
> -	return 0;
> -}
> -
>  /*
>   * Pages used by hardware to hold guest encrypted state must be flushed before
>   * returning them to the system.
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ce67112732e8c..b740114a9d9bc 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -694,7 +694,7 @@ static void svm_cpu_uninit(int cpu)
>  	if (!sd->save_area)
>  		return;
>  
> -	kfree(sd->sev_vcpus);
> +	xa_destroy(&sd->asid_vcpu);
>  	__free_page(__sme_pa_to_page(sd->save_area_pa));
>  	sd->save_area_pa = 0;
>  	sd->save_area = NULL;
> @@ -711,18 +711,11 @@ static int svm_cpu_init(int cpu)
>  	if (!save_area_page)
>  		return ret;
>  
> -	ret = sev_cpu_init(sd);
> -	if (ret)
> -		goto free_save_area;
> +	xa_init(&sd->asid_vcpu);
>  
>  	sd->save_area = page_address(save_area_page);
>  	sd->save_area_pa = __sme_page_pa(save_area_page);
>  	return 0;
> -
> -free_save_area:
> -	__free_page(save_area_page);
> -	return ret;
> -
>  }
>  
>  static void set_dr_intercepts(struct vcpu_svm *svm)
> @@ -1557,6 +1550,7 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	unsigned int asid;
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
> +	struct kvm_vcpu *prev;
>  
>  	if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
>  		shrink_ple_window(vcpu);
> @@ -1573,13 +1567,13 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (sev_guest(vcpu->kvm)) {
>  		/*
>  		 * Flush the TLB when a different vCPU using the same ASID is
> -		 * run on the same CPU.
> +		 * run on the same CPU. xa_store() should always succeed because
> +		 * the entry is reserved when the ASID is allocated.
>  		 */
>  		asid = sev_get_asid(vcpu->kvm);
> -		if (sd->sev_vcpus[asid] != vcpu) {
> -			sd->sev_vcpus[asid] = vcpu;
> +		prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
> +		if (prev != vcpu || WARN_ON_ONCE(xa_err(prev)))

Tiny nitpick: I would have preferred to have WARN_ON_ONCE(xa_err(prev)) first
in the above condition, because in theory we shouldn't use a value before we
know it's not an error, but in practice this doesn't really matter.
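
i.e. something like (untested):

	prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
	if (WARN_ON_ONCE(xa_err(prev)) || prev != vcpu)
		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);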


>  			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> -		}
>  	}
>  }
>  
> @@ -5047,6 +5041,36 @@ static void svm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
>  	sev_vcpu_deliver_sipi_vector(vcpu, vector);
>  }
>  
> +void svm_unregister_asid(unsigned int asid)
> +{
> +	struct svm_cpu_data *sd;
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		sd = per_cpu_ptr(&svm_data, cpu);
> +		xa_erase(&sd->asid_vcpu, asid);
> +	}
> +}
> +
> +bool svm_register_asid(unsigned int asid)
> +{
> +	struct svm_cpu_data *sd;
> +	int cpu;
> +
> +	/*
> +	 * Preallocate entries on all CPUs for the ASID to avoid memory
> +	 * allocations in the vCPU load path.
> +	 */
> +	for_each_possible_cpu(cpu) {
> +		sd = per_cpu_ptr(&svm_data, cpu);
> +		if (xa_reserve(&sd->asid_vcpu, asid, GFP_KERNEL_ACCOUNT)) {
> +			svm_unregister_asid(asid);
> +			return false;
> +		}
> +	}
> +	return true;
> +}
> +
>  static void svm_vm_destroy(struct kvm *kvm)
>  {
>  	avic_vm_destroy(kvm);
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 3ab2a424992c1..4929b96d3d700 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -340,8 +340,7 @@ struct svm_cpu_data {
>  
>  	struct vmcb *current_vmcb;
>  
> -	/* index = sev_asid, value = vcpu pointer */
Maybe keep the above comment?

> -	struct kvm_vcpu **sev_vcpus;
> +	struct xarray asid_vcpu;
>  };
>  
>  DECLARE_PER_CPU(struct svm_cpu_data, svm_data);
> @@ -655,6 +654,8 @@ void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
>  void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool disable);
>  void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
>  				     int trig_mode, int vec);
> +bool svm_register_asid(unsigned int asid);
> +void svm_unregister_asid(unsigned int asid);
>  
>  /* nested.c */
>  
> @@ -793,7 +794,6 @@ void sev_vm_destroy(struct kvm *kvm);
>  void __init sev_set_cpu_caps(void);
>  void __init sev_hardware_setup(void);
>  void sev_hardware_unsetup(void);
> -int sev_cpu_init(struct svm_cpu_data *sd);
>  int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
>  extern unsigned int max_sev_asid;
>  void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
> @@ -817,7 +817,6 @@ static inline void sev_vm_destroy(struct kvm *kvm) {}
>  static inline void __init sev_set_cpu_caps(void) {}
>  static inline void __init sev_hardware_setup(void) {}
>  static inline void sev_hardware_unsetup(void) {}
> -static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
>  static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
>  #define max_sev_asid 0
>  static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}


Overall looks good to me.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 10/24] KVM: SVM: Use a single ASID per VM
  2025-03-26 19:36 ` [RFC PATCH 10/24] KVM: SVM: Use a single ASID per VM Yosry Ahmed
@ 2025-04-03 20:05   ` Maxim Levitsky
  2025-04-22  9:51     ` Yosry Ahmed
  0 siblings, 1 reply; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:05 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> The ASID generation and dynamic ASID allocation logic is now only used
> by initializing the generation to 0 to trigger a new ASID allocation
> per-vCPU on the first VMRUN, so the ASID is more-or-less static
> per-vCPU.
> 
> Moreover, having a unique ASID per-vCPU is not required. ASIDs are local
> to each physical CPU, and the ASID is flushed when a vCPU is migrated to
> a new physical CPU anyway. SEV VMs have been using a single ASID per VM
> already for other reasons.
> 
> Use a static ASID per VM and drop the dynamic ASID allocation logic. The
> ASID is allocated during vCPU reset (SEV allocates its own ASID), and
> the ASID is always flushed on first use in case it was used by another
> VM previously.
> 
> The existing check for whether the ASID in the VMCB matches the per-vCPU
> ASID is turned into a WARN because it is not expected behavior anymore,
> and is moved from svm_vcpu_run() to pre_svm_run() such that it's not
> checked for SEV guests. The check does not apply as-is for SEV, and a
> separate check is added in pre_sev_run() instead. These checks will be
> consolidated (among other code) in a followup change.
> 
> As ASIDs cannot be disabled (like how VPIDs can be disabled on Intel),
> handle ASID allocation failure by falling back to a single shared ASID
> allocated during hardware setup. This ASID is flushed on every VMRUN if
> it is in use to avoid sharing TLB entries between different guests. This
> should be unlikely with modern AMD CPUs as they have over 32k ASIDs.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c |   3 +-
>  arch/x86/kvm/svm/svm.c    | 129 ++++++++++++++++++++++----------------
>  arch/x86/kvm/svm/svm.h    |  10 +--
>  3 files changed, 80 insertions(+), 62 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 11b02a0340d9e..81184b2fb27fd 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -677,8 +677,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
>  	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
>  	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
> -
> -	/* Done at vmrun: asid.  */
> +	vmcb02->control.asid = svm_asid(vcpu->kvm);
>  
>  	/* Also overwritten later if necessary.  */
>  	vmcb_clr_flush_asid(vmcb02);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index b740114a9d9bc..f028d006f69dc 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -249,6 +249,9 @@ static unsigned long iopm_base;
>  
>  DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
>  
> +static struct kvm_tlb_tags svm_asids;
> +static unsigned int fallback_asid;
> +
>  /*
>   * Only MSR_TSC_AUX is switched via the user return hook.  EFER is switched via
>   * the VMCB, and the SYSCALL/SYSENTER MSRs are handled by VMLOAD/VMSAVE.
> @@ -621,10 +624,6 @@ static int svm_enable_virtualization_cpu(void)
>  		return -EBUSY;
>  
>  	sd = per_cpu_ptr(&svm_data, me);
> -	sd->asid_generation = 1;
> -	sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
> -	sd->next_asid = sd->max_asid + 1;
> -	sd->min_asid = max_sev_asid + 1;
>  
>  	wrmsrl(MSR_EFER, efer | EFER_SVME);
>  
> @@ -1119,6 +1118,7 @@ static void svm_hardware_unsetup(void)
>  
>  	__free_pages(__sme_pa_to_page(iopm_base), get_order(IOPM_SIZE));
>  	iopm_base = 0;
> +	kvm_tlb_tags_destroy(&svm_asids);
>  }
>  
>  static void init_seg(struct vmcb_seg *seg)
> @@ -1225,6 +1225,20 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> +unsigned int svm_asid(struct kvm *kvm)
> +{
> +	return to_kvm_svm(kvm)->asid;
> +}
> +
> +static unsigned int svm_get_current_asid(struct vcpu_svm *svm)
> +{
> +	struct kvm *kvm = svm->vcpu.kvm;
> +
> +	if (sev_guest(kvm))
> +		return sev_get_asid(kvm);
> +	return svm_asid(kvm);
> +}
> +
>  static void init_vmcb(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -1300,6 +1314,8 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  	control->iopm_base_pa = iopm_base;
>  	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
>  	control->int_ctl = V_INTR_MASKING_MASK;
> +	control->asid = svm_asid(vcpu->kvm);
> +	vmcb_set_flush_asid(svm->vmcb);
>  
>  	init_seg(&save->es);
>  	init_seg(&save->ss);
> @@ -1332,8 +1348,6 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  		save->g_pat = vcpu->arch.pat;
>  		save->cr3 = 0;
>  	}
> -	svm->current_vmcb->asid_generation = 0;
> -	svm->asid = 0;
>  
>  	svm->nested.vmcb12_gpa = INVALID_GPA;
>  	svm->nested.last_vmcb12_gpa = INVALID_GPA;
> @@ -1547,9 +1561,9 @@ static void svm_prepare_host_switch(struct kvm_vcpu *vcpu)
>  
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
> -	unsigned int asid;
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
> +	unsigned int asid = svm_get_current_asid(svm);
>  	struct kvm_vcpu *prev;
>  
>  	if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
> @@ -1564,17 +1578,14 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	if (kvm_vcpu_apicv_active(vcpu))
>  		avic_vcpu_load(vcpu, cpu);
>  
> -	if (sev_guest(vcpu->kvm)) {
> -		/*
> -		 * Flush the TLB when a different vCPU using the same ASID is
> -		 * run on the same CPU. xa_store() should always succeed because
> -		 * the entry is reserved when the ASID is allocated.
> -		 */
> -		asid = sev_get_asid(vcpu->kvm);
> -		prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
> -		if (prev != vcpu || WARN_ON_ONCE(xa_err(prev)))
> -			kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> -	}
> +	/*
> +	 * Flush the TLB when a different vCPU using the same ASID is
> +	 * run on the same CPU. xa_store() should always succeed because
> +	 * the entry is reserved when the ASID is allocated.
> +	 */
> +	prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
> +	if (prev != vcpu || WARN_ON_ONCE(xa_err(prev)))
> +		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
>  }
>  
>  static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -1989,19 +2000,6 @@ static void svm_update_exception_bitmap(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> -static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
> -{
> -	if (sd->next_asid > sd->max_asid) {
> -		++sd->asid_generation;
> -		sd->next_asid = sd->min_asid;
> -		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
> -		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
> -	}
> -
> -	svm->current_vmcb->asid_generation = sd->asid_generation;
> -	svm->asid = sd->next_asid++;
> -}
> -
>  static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
>  {
>  	struct vmcb *vmcb = to_svm(vcpu)->vmcb;
> @@ -3629,8 +3627,16 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
>  
>  static int pre_svm_run(struct kvm_vcpu *vcpu)
>  {
> -	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	unsigned int asid = svm_get_current_asid(svm);
> +
> +	/*
> +	 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
> +	 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
> +	 * AP Destroy event.
> +	 */
> +	if (sev_es_guest(vcpu->kvm) && !VALID_PAGE(svm->vmcb->control.vmsa_pa))
> +		return -EINVAL;
>  
>  	/*
>  	 * If the previous VMRUN of the VMCB occurred on a different physical
> @@ -3643,25 +3649,20 @@ static int pre_svm_run(struct kvm_vcpu *vcpu)
>  		svm->current_vmcb->cpu = vcpu->cpu;
>          }
>  
> -	if (sev_guest(vcpu->kvm)) {
> -		/* Assign the asid allocated with this SEV guest */
> -		svm->asid = sev_get_asid(vcpu->kvm);
> +	/*
> +	 * If we run out of space and ASID allocation fails, we fallback to a
> +	 * shared fallback ASID. For that ASID, we need to flush the TLB on
> +	 * every VMRUN to avoid sharing TLB entries between different guests.
> +	 */
> +	if (unlikely(asid == fallback_asid))
> +		vmcb_set_flush_asid(svm->vmcb);
>  
> -		/*
> -		 * Reject KVM_RUN if userspace attempts to run the vCPU with an invalid
> -		 * VMSA, e.g. if userspace forces the vCPU to be RUNNABLE after an SNP
> -		 * AP Destroy event.
> -		 */
> -		if (sev_es_guest(vcpu->kvm) &&
> -		    !VALID_PAGE(svm->vmcb->control.vmsa_pa))
> -			return -EINVAL;
> -		return 0;
> +	if (WARN_ON_ONCE(svm->vmcb->control.asid != asid)) {
> +		vmcb_set_flush_asid(svm->vmcb);
> +		svm->vmcb->control.asid = asid;
> +		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  	}
>  
> -	/* FIXME: handle wraparound of asid_generation */
> -	if (svm->current_vmcb->asid_generation != sd->asid_generation)
> -		new_asid(svm, sd);
> -
>  	return 0;
>  }
>  
> @@ -4062,7 +4063,7 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
>  
> -	invlpga(gva, svm->vmcb->control.asid);
> +	invlpga(gva, svm_get_current_asid(svm));
>  }
>  
>  static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
> @@ -4308,10 +4309,6 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
>  
>  	sync_lapic_to_cr8(vcpu);
>  
> -	if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> -		svm->vmcb->control.asid = svm->asid;
> -		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
> -	}
>  	svm->vmcb->save.cr2 = vcpu->arch.cr2;
>  
>  	svm_hv_update_vp_id(svm->vmcb, vcpu);
> @@ -5073,13 +5070,18 @@ bool svm_register_asid(unsigned int asid)
>  
>  static void svm_vm_destroy(struct kvm *kvm)
>  {
> +	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
> +
>  	avic_vm_destroy(kvm);
>  	sev_vm_destroy(kvm);
> +	kvm_tlb_tags_free(&svm_asids, kvm_svm->asid);
>  }
>  
>  static int svm_vm_init(struct kvm *kvm)
>  {
> +	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
>  	int type = kvm->arch.vm_type;
> +	unsigned int asid;
>  
>  	if (type != KVM_X86_DEFAULT_VM &&
>  	    type != KVM_X86_SW_PROTECTED_VM) {
> @@ -5100,6 +5102,13 @@ static int svm_vm_init(struct kvm *kvm)
>  			return ret;
>  	}
>  
> +	asid = kvm_tlb_tags_alloc(&svm_asids);
> +	if (asid && !svm_register_asid(asid)) {
> +		kvm_tlb_tags_free(&svm_asids, asid);
> +		asid = 0;
> +	}
> +	kvm_svm->asid = asid ?: fallback_asid;
> +
>  	return 0;
>  }
>  
> @@ -5381,6 +5390,7 @@ static __init int svm_hardware_setup(void)
>  	void *iopm_va;
>  	int r;
>  	unsigned int order = get_order(IOPM_SIZE);
> +	unsigned int min_asid, max_asid;
>  
>  	/*
>  	 * NX is required for shadow paging and for NPT if the NX huge pages
> @@ -5473,6 +5483,13 @@ static __init int svm_hardware_setup(void)
>  	 */
>  	sev_hardware_setup();
>  
> +	/* Consumes max_sev_asid initialized by sev_hardware_setup() */
> +	min_asid = max_sev_asid + 1;
> +	max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
> +	r = kvm_tlb_tags_init(&svm_asids, min_asid, max_asid);
> +	if (r)
> +		goto err;
> +
>  	svm_hv_hardware_setup();
>  
>  	for_each_possible_cpu(cpu) {
> @@ -5481,6 +5498,12 @@ static __init int svm_hardware_setup(void)
>  			goto err;
>  	}
>  
> +	fallback_asid = kvm_tlb_tags_alloc(&svm_asids);
> +	WARN_ON_ONCE(!fallback_asid);

Nitpick: This really can't happen unless there is some very bad bug lurking somewhere.
And if this happens, nothing will work, since regular ASID allocation will likely
fail too.

So why not fail svm_hardware_setup() instead?
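
For illustration, a minimal sketch of that alternative, reusing the r/err unwind
path already present in the quoted hunk (the exact error code is an assumption):

	fallback_asid = kvm_tlb_tags_alloc(&svm_asids);
	if (!fallback_asid) {
		/* Running out of ASID space this early means setup cannot proceed. */
		r = -ENOMEM;
		goto err;
	}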


> +
> +	/* Needs to be after svm_cpu_init() initializes the per-CPU xarrays */
> +	svm_register_asid(fallback_asid);
> +
>  	enable_apicv = avic = avic && avic_hardware_setup();
>  
>  	if (!enable_apicv) {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 4929b96d3d700..436b7e83141b9 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -117,6 +117,8 @@ struct kvm_sev_info {
>  struct kvm_svm {
>  	struct kvm kvm;
>  
> +	unsigned int asid;
> +
>  	/* Struct members for AVIC */
>  	u32 avic_vm_id;
>  	struct page *avic_logical_id_table_page;
> @@ -132,7 +134,6 @@ struct kvm_vmcb_info {
>  	struct vmcb *ptr;
>  	unsigned long pa;
>  	int cpu;
> -	uint64_t asid_generation;
>  };
>  
>  struct vmcb_save_area_cached {
> @@ -247,7 +248,6 @@ struct vcpu_svm {
>  	struct vmcb *vmcb;
>  	struct kvm_vmcb_info vmcb01;
>  	struct kvm_vmcb_info *current_vmcb;
> -	u32 asid;
>  	u32 sysenter_esp_hi;
>  	u32 sysenter_eip_hi;
>  	uint64_t tsc_aux;
> @@ -330,11 +330,6 @@ struct vcpu_svm {
>  };
>  
>  struct svm_cpu_data {
> -	u64 asid_generation;
> -	u32 max_asid;
> -	u32 next_asid;
> -	u32 min_asid;
> -
>  	struct vmcb *save_area;
>  	unsigned long save_area_pa;
>  
> @@ -656,6 +651,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
>  				     int trig_mode, int vec);
>  bool svm_register_asid(unsigned int asid);
>  void svm_unregister_asid(unsigned int asid);
> +unsigned int svm_asid(struct kvm *kvm);
>  
>  /* nested.c */
>  

Overall looks good to me.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 11/24] KVM: nSVM: Use a separate ASID for nested guests
  2025-03-26 19:36 ` [RFC PATCH 11/24] KVM: nSVM: Use a separate ASID for nested guests Yosry Ahmed
@ 2025-04-03 20:09   ` Maxim Levitsky
  2025-04-22 10:08     ` Yosry Ahmed
  0 siblings, 1 reply; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:09 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> The per-VM ASID is currently shared by both L1 and L2 guests. That ASID
> is currently flushed on every transition between L1 and L2.
> 
> Allocate and track a separate ASID per-VM for nested guests. This is in
> preparation for doing fine-grained TLB flushes on nested transitions
> instead of unconditional full flushes.
> 
> Nested ASIDs are still not fully maintained (e.g. a remote flush will
> only flush the current ASID), so keep the TLB flush on every transition
> until this is sorted out in following changes.
> 
> Add a helper to get the ASID associated with a specific VMCB and use it
> instead of directly reading the VM's ASID. This transparently uses L2's
> ASID when an L2 guest is being run.
> 
> L1's ASID is flushed on KVM_REQ_TLB_FLUSH_GUEST if it is the active
> context, so remove the TODO in nested_svm_transition_tlb_flush() about
> it.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c |  8 ++++++--
>  arch/x86/kvm/svm/svm.c    | 13 +++++++++++--
>  arch/x86/kvm/svm/svm.h    |  3 ++-
>  3 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 81184b2fb27fd..75223869aa8c6 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -495,7 +495,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
>  	 *  - Honor L1's request to flush an ASID on nested VMRUN
>  	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
>  	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
> -	 *  - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
>  	 *
>  	 * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
>  	 *     NPT guest-physical mappings on VMRUN.
> @@ -677,7 +676,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
>  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
>  	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
>  	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
> -	vmcb02->control.asid = svm_asid(vcpu->kvm);
> +	vmcb02->control.asid = svm_nested_asid(vcpu->kvm);
>  
>  	/* Also overwritten later if necessary.  */
>  	vmcb_clr_flush_asid(vmcb02);
> @@ -1179,6 +1178,7 @@ static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
>  
>  int svm_allocate_nested(struct vcpu_svm *svm)
>  {
> +	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
>  	struct page *vmcb02_page;
>  
>  	if (svm->nested.initialized)
> @@ -1196,6 +1196,10 @@ int svm_allocate_nested(struct vcpu_svm *svm)
>  	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);
>  
>  	svm->nested.initialized = true;
> +
> +	if (!kvm_svm->nested_asid)
> +		kvm_svm->nested_asid = kvm_svm->asid;

Nitpick: maybe put nested_asid into a .nested struct as well?
I don't have a strong opinion on this, feel free to leave it where it is now.
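
i.e. something like the below, if the intent is a small .nested sub-struct in
kvm_svm (just one possible reading of the suggestion, purely illustrative):

struct kvm_svm {
	struct kvm kvm;

	unsigned int asid;

	struct {
		unsigned int asid;	/* what is nested_asid today */
	} nested;

	/* remaining members unchanged */
};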


> +
>  	return 0;
>  
>  err_free_vmcb02:
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f028d006f69dc..e664d8428c792 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1225,17 +1225,26 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	}
>  }
>  
> -unsigned int svm_asid(struct kvm *kvm)
> +unsigned int svm_nested_asid(struct kvm *kvm)
> +{
> +	return to_kvm_svm(kvm)->nested_asid;
> +}

It might also make sense to add WARN_ON_ONCE(!svm->nested.initialized) here, just in case.
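
As a sketch of the idea (note that svm_nested_asid() only receives a struct kvm,
so a literal svm->nested.initialized check isn't available there; warning when
the nested ASID was never set up is one approximation of the same thing):

unsigned int svm_nested_asid(struct kvm *kvm)
{
	struct kvm_svm *kvm_svm = to_kvm_svm(kvm);

	/* Should only be reached after svm_allocate_nested() set the ASID. */
	WARN_ON_ONCE(!kvm_svm->nested_asid);
	return kvm_svm->nested_asid;
}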

> +
> +static unsigned int svm_asid(struct kvm *kvm)
>  {
>  	return to_kvm_svm(kvm)->asid;
>  }
>  
>  static unsigned int svm_get_current_asid(struct vcpu_svm *svm)
>  {
> -	struct kvm *kvm = svm->vcpu.kvm;
> +	struct kvm_vcpu *vcpu = &svm->vcpu;
> +	struct kvm *kvm = vcpu->kvm;
>  
>  	if (sev_guest(kvm))
>  		return sev_get_asid(kvm);
> +	if (is_guest_mode(vcpu))
> +		return svm_nested_asid(kvm);
> +	WARN_ON_ONCE(svm->current_vmcb != &svm->vmcb01);
>  	return svm_asid(kvm);
>  }
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 436b7e83141b9..e67e3a64e92f7 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -118,6 +118,7 @@ struct kvm_svm {
>  	struct kvm kvm;
>  
>  	unsigned int asid;
> +	unsigned int nested_asid;
>  
>  	/* Struct members for AVIC */
>  	u32 avic_vm_id;
> @@ -651,7 +652,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
>  				     int trig_mode, int vec);
>  bool svm_register_asid(unsigned int asid);
>  void svm_unregister_asid(unsigned int asid);
> -unsigned int svm_asid(struct kvm *kvm);
> +unsigned int svm_nested_asid(struct kvm *kvm);
>  
>  /* nested.c */
>  


Overall looks good,

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 12/24] KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb()
  2025-03-26 19:36 ` [RFC PATCH 12/24] KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb() Yosry Ahmed
@ 2025-04-03 20:09   ` Maxim Levitsky
  2025-06-23 19:22     ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:09 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> Instead of calling is_guest_mode() inside kvm_hv_vcpu_purge_flush_tlb()
> pass the value from the caller. Future changes will pass different
> values than is_guest_mode(vcpu).
> 
> No functional change intended.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/hyperv.h  | 8 +++++---
>  arch/x86/kvm/svm/svm.c | 2 +-
>  arch/x86/kvm/x86.c     | 2 +-
>  3 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 913bfc96959cb..be715deaeb003 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -203,14 +203,15 @@ static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struc
>  	return &hv_vcpu->tlb_flush_fifo[i];
>  }
>  
> -static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu)
> +static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu,
> +					       bool is_guest_mode)
>  {
>  	struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
>  
>  	if (!to_hv_vcpu(vcpu) || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
>  		return;
>  
> -	tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode(vcpu));
> +	tlb_flush_fifo = kvm_hv_get_tlb_flush_fifo(vcpu, is_guest_mode);
>  
>  	kfifo_reset_out(&tlb_flush_fifo->entries);
>  }
> @@ -285,7 +286,8 @@ static inline int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
>  {
>  	return HV_STATUS_ACCESS_DENIED;
>  }
> -static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu,
> +					       bool is_guest_mode) {}
>  static inline bool kvm_hv_synic_has_vector(struct kvm_vcpu *vcpu, int vector)
>  {
>  	return false;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e664d8428c792..865c5ce4fa473 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4025,7 +4025,7 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
>  	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB
>  	 * entries, and thus is a superset of Hyper-V's fine grained flushing.
>  	 */
> -	kvm_hv_vcpu_purge_flush_tlb(vcpu);
> +	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode(vcpu));
>  
>  	/*
>  	 * Flush only the current ASID even if the TLB flush was invoked via
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 182f18ebc62f3..469a8e5526902 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3615,7 +3615,7 @@ static void kvm_vcpu_flush_tlb_guest(struct kvm_vcpu *vcpu)
>  	 * Flushing all "guest" TLB is always a superset of Hyper-V's fine
>  	 * grained flushing.
>  	 */
> -	kvm_hv_vcpu_purge_flush_tlb(vcpu);
> +	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode(vcpu));
>  }
>  
>  

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 13/24] KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode
  2025-03-26 19:36 ` [RFC PATCH 13/24] KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode Yosry Ahmed
@ 2025-04-03 20:10   ` Maxim Levitsky
  2025-04-22 10:04     ` Yosry Ahmed
  0 siblings, 1 reply; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:10 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> svm_flush_tlb_asid() currently operates on the current VMCB. In
> preparation for properly tracking TLB flushes for L1 and L2 ASIDs,
> refactor it to take is_guest_mode and find the proper VMCB. All existing
> callers pass is_guest_mode(vcpu) to maintain existing behavior for now.
> 
> Move the comment about only flushing the current ASID to
> svm_flush_tlb_all(), where it probably should have been anyway, because
> svm_flush_tlb_asid() now flushes a given ASID, not the current ASID.
> 
> Create a svm_flush_tlb_guest() wrapper to use as the flush_tlb_guest()
> callback.
> 
> No functional change intended.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/svm.c | 39 +++++++++++++++++++++++++--------------
>  1 file changed, 25 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 865c5ce4fa473..fb6b9f88a1504 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4016,25 +4016,24 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>  }
>  
> -static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
> +static struct vmcb *svm_get_vmcb(struct vcpu_svm *svm, bool is_guest_mode)
> +{
> +	return is_guest_mode ? svm->nested.vmcb02.ptr : svm->vmcb01.ptr;
> +}

Not 100% sure about this helper, its name might be a bit confusing because
we already have a current vmcb. Maybe add a comment above stating that
this is to get a VMCB which might not be currently active?
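
For example, something along these lines (the wording is only a suggestion):

/*
 * Return the VMCB for the requested context (L1 or L2). This is not
 * necessarily the currently active VMCB.
 */
static struct vmcb *svm_get_vmcb(struct vcpu_svm *svm, bool is_guest_mode)
{
	return is_guest_mode ? svm->nested.vmcb02.ptr : svm->vmcb01.ptr;
}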

> +
> +static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu, bool is_guest_mode)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct vmcb *vmcb = svm_get_vmcb(svm, is_guest_mode);
>  
>  	/*
>  	 * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
>  	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB
>  	 * entries, and thus is a superset of Hyper-V's fine grained flushing.
>  	 */
> -	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode(vcpu));
> -
> -	/*
> -	 * Flush only the current ASID even if the TLB flush was invoked via
> -	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
> -	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
> -	 * unconditionally does a TLB flush on both nested VM-Enter and nested
> -	 * VM-Exit (via kvm_mmu_reset_context()).
> -	 */
> -	vmcb_set_flush_asid(svm->vmcb);
> +	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode);
> +	if (vmcb)
> +		vmcb_set_flush_asid(vmcb);
>  }
>  
>  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> @@ -4050,7 +4049,7 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
>  	if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
>  		hyperv_flush_guest_mapping(root_tdp);
>  
> -	svm_flush_tlb_asid(vcpu);
> +	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
>  }
>  
>  static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> @@ -4065,7 +4064,14 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
>  	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
>  		hv_flush_remote_tlbs(vcpu->kvm);
>  
> -	svm_flush_tlb_asid(vcpu);
> +	/*
> +	 * Flush only the current ASID even if the TLB flush was invoked via
> +	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
> +	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
> +	 * unconditionally does a TLB flush on both nested VM-Enter and nested
> +	 * VM-Exit (via kvm_mmu_reset_context()).
> +	 */
> +	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
>  }
>  
>  static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
> @@ -4075,6 +4081,11 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
>  	invlpga(gva, svm_get_current_asid(svm));
>  }
>  
> +static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
> +{
> +	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
> +}
> +
>  static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -5187,7 +5198,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.flush_tlb_all = svm_flush_tlb_all,
>  	.flush_tlb_current = svm_flush_tlb_current,
>  	.flush_tlb_gva = svm_flush_tlb_gva,
> -	.flush_tlb_guest = svm_flush_tlb_asid,
> +	.flush_tlb_guest = svm_flush_tlb_guest,
>  
>  	.vcpu_pre_run = svm_vcpu_pre_run,
>  	.vcpu_run = svm_vcpu_run,


Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>


Best regards,
	Maxim Levitsky






^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 15/24] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr()
  2025-03-26 19:36 ` [RFC PATCH 15/24] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr() Yosry Ahmed
@ 2025-04-03 20:10   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:10 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> In preparation for creating another helper for
> kvm_mmu_invalidate_addr(), rename __kvm_mmu_invalidate_addr() to
> kvm_mmu_invalidate_addr_in_root().
> 
> No functional change intended.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/mmu/mmu.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 63bb77ee1bb16..4a72ada0a7585 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6317,8 +6317,9 @@ void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg)
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_print_sptes);
>  
> -static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> -				      u64 addr, hpa_t root_hpa)
> +static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
> +					    struct kvm_mmu *mmu,
> +					    u64 addr, hpa_t root_hpa)
>  {
>  	struct kvm_shadow_walk_iterator iterator;
>  
> @@ -6374,11 +6375,11 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  		return;
>  
>  	if (roots & KVM_MMU_ROOT_CURRENT)
> -		__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->root.hpa);
> +		kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->root.hpa);
>  
>  	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) {
>  		if (roots & KVM_MMU_ROOT_PREVIOUS(i))
> -			__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
> +			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
>  	}
>  }
>  EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);

Reviewed-by: Maxim Levitsky<mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 16/24] KVM: x86/mmu: Allow skipping the gva flush in kvm_mmu_invalidate_addr()
  2025-03-26 19:36 ` [RFC PATCH 16/24] KVM: x86/mmu: Allow skipping the gva flush in kvm_mmu_invalidate_addr() Yosry Ahmed
@ 2025-04-03 20:10   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:10 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> Refactor a helper out of kvm_mmu_invalidate_addr() that allows skipping
> the gva flush. This will be used when an invalidation is needed but the
> GVA TLB translations that require invalidation are not of the current
> context (e.g. when emulating INVLPGA for L1 to flush L2's translations).
> 
> No functional change intended.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/mmu/mmu.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4a72ada0a7585..e2b1994f12753 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6355,15 +6355,15 @@ static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
>  	write_unlock(&vcpu->kvm->mmu_lock);
>  }
>  
> -void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> -			     u64 addr, unsigned long roots)
> +static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +				      u64 addr, unsigned long roots, bool gva_flush)
>  {
>  	int i;
>  
>  	WARN_ON_ONCE(roots & ~KVM_MMU_ROOTS_ALL);
>  
>  	/* It's actually a GPA for vcpu->arch.guest_mmu.  */
> -	if (mmu != &vcpu->arch.guest_mmu) {
> +	if (gva_flush && mmu != &vcpu->arch.guest_mmu) {
>  		/* INVLPG on a non-canonical address is a NOP according to the SDM.  */
>  		if (is_noncanonical_invlpg_address(addr, vcpu))
>  			return;
> @@ -6382,6 +6382,12 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
>  	}
>  }
> +
> +void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +			     u64 addr, unsigned long roots)
> +{
> +	__kvm_mmu_invalidate_addr(vcpu, mmu, addr, roots, true);
> +}
>  EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);
>  
>  void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 17/24] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  2025-03-26 19:36 ` [RFC PATCH 17/24] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
@ 2025-04-03 20:10   ` Maxim Levitsky
  0 siblings, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:10 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> KVM_REQ_TLB_FLUSH is used to flush all TLB entries for all contexts
> (e.g. in kvm_flush_remote_tlbs()). Flush both L1 and L2 ASIDs in
> svm_flush_tlb_all() to handle it appropriately.
> 
> This is currently not required as nested transitions do unconditional
> TLB flushes, but this is a step toward eliminating that.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c |  1 -
>  arch/x86/kvm/svm/svm.c    | 10 ++--------
>  2 files changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index c336ab63c6da3..56a4ff480bb3d 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -491,7 +491,6 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
>  	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
>  	 * things to fix before this can be conditional:
>  	 *
> -	 *  - Flush TLBs for both L1 and L2 remote TLB flush
>  	 *  - Honor L1's request to flush an ASID on nested VMRUN
>  	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
>  	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index fb6b9f88a1504..4cad1085936bb 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4064,14 +4064,8 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
>  	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
>  		hv_flush_remote_tlbs(vcpu->kvm);
>  
> -	/*
> -	 * Flush only the current ASID even if the TLB flush was invoked via
> -	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
> -	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
> -	 * unconditionally does a TLB flush on both nested VM-Enter and nested
> -	 * VM-Exit (via kvm_mmu_reset_context()).
> -	 */
> -	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
> +	svm_flush_tlb_asid(vcpu, false);
> +	svm_flush_tlb_asid(vcpu, true);
>  }
>  
>  static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly
  2025-03-26 19:44   ` [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly Yosry Ahmed
@ 2025-04-03 20:10     ` Maxim Levitsky
  2025-06-24  1:08     ` Sean Christopherson
  1 sibling, 0 replies; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:10 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:44 +0000, Yosry Ahmed wrote:
> Currently, INVLPGA interception is handled like INVLPG, which flushes
> L1's TLB translations for the address. It was implemented in this way
> because L1 and L2 shared an ASID. Now, L1 and L2 have separate ASIDs. It
> is still harmless to flush L1's translations, but it's only correct
> because all translations are flushed on nested transitions anyway.
> 
> In preparation for stopping unconditional flushes on nested transitions,
> handle INVLPGA interception properly. If L1 specified zero as the ASID,
> this is equivalent to INVLPG, so handle it as such. Otherwise, use
> INVLPGA to flush the translations of the appropriate ASID tracked by
> KVM, if any. Sync the shadow MMU as well, as L1 invalidated L2's
> mappings.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/mmu/mmu.c          |  5 +++--
>  arch/x86/kvm/svm/svm.c          | 36 +++++++++++++++++++++++++++++++--
>  3 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d881e7d276b12..a158d324168a0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2237,6 +2237,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
>  		       void *insn, int insn_len);
>  void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
>  void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
> +void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +			       u64 addr, unsigned long roots, bool gva_flush);
>  void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			     u64 addr, unsigned long roots);
>  void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e2b1994f12753..d3baa12df84e7 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6355,8 +6355,8 @@ static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
>  	write_unlock(&vcpu->kvm->mmu_lock);
>  }
>  
> -static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> -				      u64 addr, unsigned long roots, bool gva_flush)
> +void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +			       u64 addr, unsigned long roots, bool gva_flush)
>  {
>  	int i;
>  
> @@ -6382,6 +6382,7 @@ static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
>  			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
>  	}
>  }
> +EXPORT_SYMBOL_GPL(__kvm_mmu_invalidate_addr);
>  
>  void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			     u64 addr, unsigned long roots)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3649707c61d3e..4b95fd6b501e6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2505,6 +2505,7 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
>  
>  static int invlpga_interception(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
>  	gva_t gva = kvm_rax_read(vcpu);
>  	u32 asid = kvm_rcx_read(vcpu);
>  
> @@ -2514,8 +2515,39 @@ static int invlpga_interception(struct kvm_vcpu *vcpu)
>  
>  	trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
>  
> -	/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
> -	kvm_mmu_invlpg(vcpu, gva);
> +	/*
> +	 * APM is silent about using INVLPGA to flush the host ASID (i.e. 0).
> +	 * Do the logical thing and handle it like INVLPG.
> +	 */
> +	if (asid == 0) {
> +		kvm_mmu_invlpg(vcpu, gva);
> +		return kvm_skip_emulated_instruction(vcpu);
> +	}
> +
> +	/*
> +	 * Check if L1 specified the L2 ASID we are currently tracking. If it
> +	 * isn't, do nothing as we have to handle the TLB flush when switching
> +	 * to the new ASID anyway.
> +	 */
> +	if (asid == svm->nested.last_asid)
> +		invlpga(gva, svm_nested_asid(vcpu->kvm));
> +
> +	/*
> +	 * If NPT is disabled, sync the shadow page tables as L1 is invalidating
> +	 * mappings for L2. Sync all roots as ASIDs are not tracked in the MMU
> +	 * role.
> +	 *
> +	 * As we are not flushing the current context, skip the gva flush from
> +	 * __kvm_mmu_invalidate_addr(), it would flush the wrong ASID anyway.
> +	 * The correct TLB flush was done above (if needed).
> +	 *
> +	 * This always operates on root_mmu because L1 and L2 share an MMU when
> +	 * NPT is disabled. This can be optimized by invalidating guest roots
> +	 * only.
> +	 */
> +	if (!npt_enabled)
> +		__kvm_mmu_invalidate_addr(vcpu, &vcpu->arch.root_mmu, gva,
> +					  KVM_MMU_ROOTS_ALL, false);
>  
>  	return kvm_skip_emulated_instruction(vcpu);
>  }

Looks good.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 23/24] KVM: nSVM: Allocate a new ASID for nested guests
  2025-03-26 19:44   ` [RFC PATCH 23/24] KVM: nSVM: Allocate a new ASID for nested guests Yosry Ahmed
@ 2025-04-03 20:11     ` Maxim Levitsky
  2025-04-22 10:01       ` Yosry Ahmed
  0 siblings, 1 reply; 58+ messages in thread
From: Maxim Levitsky @ 2025-04-03 20:11 UTC (permalink / raw)
  To: Yosry Ahmed, Sean Christopherson
  Cc: Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov, Rik van Riel,
	Tom Lendacky, x86, kvm, linux-kernel

On Wed, 2025-03-26 at 19:44 +0000, Yosry Ahmed wrote:
> Now that nested TLB flushes are properly tracked, start allocating a
> separate ASID for nested guests. This allows dropping the unconditional
> TLB flushes on nested transitions and doing finer grained TLB flushing
> when necessary.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/nested.c | 11 +++++++++--
>  arch/x86/kvm/svm/svm.c    |  5 +++--
>  arch/x86/kvm/svm/svm.h    |  3 +++
>  3 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index 544913461693c..0c887c91bd50d 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -1204,6 +1204,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
>  {
>  	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
>  	struct page *vmcb02_page;
> +	unsigned int asid;
>  
>  	if (svm->nested.initialized)
>  		return 0;
> @@ -1221,8 +1222,14 @@ int svm_allocate_nested(struct vcpu_svm *svm)
>  
>  	svm->nested.initialized = true;
>  
> -	if (!kvm_svm->nested_asid)
> -		kvm_svm->nested_asid = kvm_svm->asid;
> +	if (!kvm_svm->nested_asid) {
> +		asid = kvm_tlb_tags_alloc(&svm_asids);
> +		if (asid && !svm_register_asid(asid)) {
> +			kvm_tlb_tags_free(&svm_asids, asid);
> +			asid = 0;
> +		}
> +		kvm_svm->nested_asid = asid ?: fallback_asid;
> +	}

Nitpick: AFAIK nested KVM at least doesn't enable EFER.SVME unless it actually
runs a guest, so most of the time we will waste an ASID on a VM which once ran
a nested VM and since then doesn't run anything else.

So maybe we want to free the nested ASID in svm_free_nested()?
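
A rough sketch of what that could look like; the placement is an assumption,
and since nested_asid is per-VM while svm_free_nested() is per-vCPU, a real
version would need to account for other vCPUs still running nested guests:

	/* In svm_free_nested(), after the vmcb02/msrpm teardown. */
	if (kvm_svm->nested_asid && kvm_svm->nested_asid != fallback_asid) {
		svm_unregister_asid(kvm_svm->nested_asid);
		kvm_tlb_tags_free(&svm_asids, kvm_svm->nested_asid);
		kvm_svm->nested_asid = 0;
	}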

>  
>  	return 0;
>  
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 4b95fd6b501e6..196f5bca57a0e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -249,8 +249,8 @@ static unsigned long iopm_base;
>  
>  DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
>  
> -static struct kvm_tlb_tags svm_asids;
> -static unsigned int fallback_asid;
> +struct kvm_tlb_tags svm_asids;
> +unsigned int fallback_asid;
>  
>  /*
>   * Only MSR_TSC_AUX is switched via the user return hook.  EFER is switched via
> @@ -5127,6 +5127,7 @@ static void svm_vm_destroy(struct kvm *kvm)
>  	avic_vm_destroy(kvm);
>  	sev_vm_destroy(kvm);
>  	kvm_tlb_tags_free(&svm_asids, kvm_svm->asid);
> +	kvm_tlb_tags_free(&svm_asids, kvm_svm->nested_asid);
>  }
>  
>  static int svm_vm_init(struct kvm *kvm)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0c44133bc05ca..220d10d2b1a5c 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -630,6 +630,9 @@ static inline void svm_vmgexit_no_action(struct vcpu_svm *svm, u64 data)
>  
>  extern bool dump_invalid_vmcb;
>  
> +extern struct kvm_tlb_tags svm_asids;
> +extern unsigned int fallback_asid;
> +
>  u32 svm_msrpm_offset(u32 msr);
>  u32 *svm_vcpu_alloc_msrpm(void);
>  void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);


Best regards,
	Maxim Levitsky




^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  2025-04-03 20:04   ` Maxim Levitsky
@ 2025-04-22  9:41     ` Yosry Ahmed
  0 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-04-22  9:41 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Thu, Apr 03, 2025 at 04:04:05PM -0400, Maxim Levitsky wrote:
> On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> > SEV currently tracks the ASID to VMCB mapping for each physical CPU.
> > This is required to flush the ASID when a new VMCB using the same ASID
> > is run on the same CPU. 
> 
> 
> > Practically, there is a single VMCB for each
> > vCPU using SEV. 
> 
> Can you elaborate on this a bit? AFAIK you can't run nested with SEV,
> not even plain SEV, because guest state is encrypted, so for SEV we
> indeed have one VMCB per vCPU.

This is my understanding as well, will elaborate when I get around to
respinning.

> 
> > Furthermore, TLB flushes on nested transitions between
> > VMCB01 and VMCB02 are handled separately (see
> > nested_svm_transition_tlb_flush()).
> 
> Yes, or we can say that for now both VMCBs share the same ASID,
> up until later in this patch series.
> 
> > 
> > In preparation for generalizing the tracking and making the tracking
> > more expensive, start tracking the ASID to vCPU mapping instead. This
> > will allow for the tracking to be moved to a cheaper code path when
> > vCPUs are switched.
> > 
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/sev.c | 12 ++++++------
> >  arch/x86/kvm/svm/svm.c |  2 +-
> >  arch/x86/kvm/svm/svm.h |  4 ++--
> >  3 files changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index d613f81addf1c..ddb4d5b211ed7 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -240,7 +240,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
> >  
> >  	for_each_possible_cpu(cpu) {
> >  		sd = per_cpu_ptr(&svm_data, cpu);
> > -		sd->sev_vmcbs[sev->asid] = NULL;
> > +		sd->sev_vcpus[sev->asid] = NULL;
> >  	}
> >  
> >  	mutex_unlock(&sev_bitmap_lock);
> > @@ -3081,8 +3081,8 @@ int sev_cpu_init(struct svm_cpu_data *sd)
> >  	if (!sev_enabled)
> >  		return 0;
> >  
> > -	sd->sev_vmcbs = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
> > -	if (!sd->sev_vmcbs)
> > +	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
> > +	if (!sd->sev_vcpus)
> >  		return -ENOMEM;
> >  
> >  	return 0;
> > @@ -3471,14 +3471,14 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
> >  	/*
> >  	 * Flush guest TLB:
> >  	 *
> > -	 * 1) when different VMCB for the same ASID is to be run on the same host CPU.
> > +	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.
> >  	 * 2) or this VMCB was executed on different host CPU in previous VMRUNs.
> >  	 */
> > -	if (sd->sev_vmcbs[asid] == svm->vmcb &&
> > +	if (sd->sev_vcpus[asid] == &svm->vcpu &&
> >  	    svm->vcpu.arch.last_vmentry_cpu == cpu)
> >  		return 0;
> >  
> > -	sd->sev_vmcbs[asid] = svm->vmcb;
> > +	sd->sev_vcpus[asid] = &svm->vcpu;
> >  	vmcb_set_flush_asid(svm->vmcb);
> >  	vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
> >  	return 0;
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 18bfc3d3f9ba1..1156ca97fd798 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -694,7 +694,7 @@ static void svm_cpu_uninit(int cpu)
> >  	if (!sd->save_area)
> >  		return;
> >  
> > -	kfree(sd->sev_vmcbs);
> > +	kfree(sd->sev_vcpus);
> >  	__free_page(__sme_pa_to_page(sd->save_area_pa));
> >  	sd->save_area_pa = 0;
> >  	sd->save_area = NULL;
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 843a29a6d150e..4ea6c61c3b048 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -340,8 +340,8 @@ struct svm_cpu_data {
> >  
> >  	struct vmcb *current_vmcb;
> >  
> > -	/* index = sev_asid, value = vmcb pointer */
> > -	struct vmcb **sev_vmcbs;
> > +	/* index = sev_asid, value = vcpu pointer */
> > +	struct kvm_vcpu **sev_vcpus;
> >  };
> >  
> >  DECLARE_PER_CPU(struct svm_cpu_data, svm_data);
> 
> 
> Code itself looks OK, so 
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks!

> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 09/24] KVM: SEV: Generalize tracking ASID->vCPU with xarrays
  2025-04-03 20:05   ` Maxim Levitsky
@ 2025-04-22  9:50     ` Yosry Ahmed
  0 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-04-22  9:50 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Thu, Apr 03, 2025 at 04:05:12PM -0400, Maxim Levitsky wrote:
> On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> > Following changes will track ASID to vCPU mappings for all ASIDs, not
> > just SEV ASIDs. Using per-CPU arrays with the maximum possible number of
> > ASIDs would be too expensive.
> 
> Maybe add a word or two to explain that currently the # of SEV ASIDs is small
> but the # of all ASIDs is relatively large (a 16-bit number or so)?

Good idea.

> > @@ -1573,13 +1567,13 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  	if (sev_guest(vcpu->kvm)) {
> >  		/*
> >  		 * Flush the TLB when a different vCPU using the same ASID is
> > -		 * run on the same CPU.
> > +		 * run on the same CPU. xa_store() should always succeed because
> > +		 * the entry is reserved when the ASID is allocated.
> >  		 */
> >  		asid = sev_get_asid(vcpu->kvm);
> > -		if (sd->sev_vcpus[asid] != vcpu) {
> > -			sd->sev_vcpus[asid] = vcpu;
> > +		prev = xa_store(&sd->asid_vcpu, asid, vcpu, GFP_ATOMIC);
> > +		if (prev != vcpu || WARN_ON_ONCE(xa_err(prev)))
> 
> Tiny nitpick: I would have preferred to have WARN_ON_ONCE(xa_err(prev)) first in the above condition,
> because in theory we shouldn't use a value before we know it's not an error,
> but in practice this doesn't really matter.

I think it's fine because we are just comparing 'prev' to the vCPU
pointer we have, not dereferencing it, so it should be safe. I'd rather
check the error condition last because it shouldn't ever happen.

> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 3ab2a424992c1..4929b96d3d700 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -340,8 +340,7 @@ struct svm_cpu_data {
> >  
> >  	struct vmcb *current_vmcb;
> >  
> > -	/* index = sev_asid, value = vcpu pointer */
> Maybe keep the above comment?

I think it's kinda pointless tbh because it's obvious from how the
xarray is used, but I am fine with keeping it if others agree it's
useful.

> 
> > -	struct kvm_vcpu **sev_vcpus;
> > +	struct xarray asid_vcpu;
> >  };
> >  
> >  DECLARE_PER_CPU(struct svm_cpu_data, svm_data);
> > @@ -655,6 +654,8 @@ void set_msr_interception(struct kvm_vcpu *vcpu, u32 *msrpm, u32 msr,
> >  void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool disable);
> >  void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
> >  				     int trig_mode, int vec);
> > +bool svm_register_asid(unsigned int asid);
> > +void svm_unregister_asid(unsigned int asid);
> >  
> >  /* nested.c */
> >  
> > @@ -793,7 +794,6 @@ void sev_vm_destroy(struct kvm *kvm);
> >  void __init sev_set_cpu_caps(void);
> >  void __init sev_hardware_setup(void);
> >  void sev_hardware_unsetup(void);
> > -int sev_cpu_init(struct svm_cpu_data *sd);
> >  int sev_dev_get_attr(u32 group, u64 attr, u64 *val);
> >  extern unsigned int max_sev_asid;
> >  void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code);
> > @@ -817,7 +817,6 @@ static inline void sev_vm_destroy(struct kvm *kvm) {}
> >  static inline void __init sev_set_cpu_caps(void) {}
> >  static inline void __init sev_hardware_setup(void) {}
> >  static inline void sev_hardware_unsetup(void) {}
> > -static inline int sev_cpu_init(struct svm_cpu_data *sd) { return 0; }
> >  static inline int sev_dev_get_attr(u32 group, u64 attr, u64 *val) { return -ENXIO; }
> >  #define max_sev_asid 0
> >  static inline void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code) {}
> 
> 
> Overall looks good to me.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks!

> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 10/24] KVM: SVM: Use a single ASID per VM
  2025-04-03 20:05   ` Maxim Levitsky
@ 2025-04-22  9:51     ` Yosry Ahmed
  0 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-04-22  9:51 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Thu, Apr 03, 2025 at 04:05:41PM -0400, Maxim Levitsky wrote:
> On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
[..]
> > @@ -5481,6 +5498,12 @@ static __init int svm_hardware_setup(void)
> >  			goto err;
> >  	}
> >  
> > +	fallback_asid = kvm_tlb_tags_alloc(&svm_asids);
> > +	WARN_ON_ONCE(!fallback_asid);
> 
> Nitpick: This really can't happen unless there is some very bad bug lurking somewhere.
> And if this happens, nothing will work, since regular ASID allocation will likely
> fail too.
> 
> So why not fail svm_hardware_setup() instead?

Yeah I can do that.

> 
> 
> > +
> > +	/* Needs to be after svm_cpu_init() initializes the per-CPU xarrays */
> > +	svm_register_asid(fallback_asid);
> > +
> >  	enable_apicv = avic = avic && avic_hardware_setup();
> >  
> >  	if (!enable_apicv) {
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 4929b96d3d700..436b7e83141b9 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -117,6 +117,8 @@ struct kvm_sev_info {
> >  struct kvm_svm {
> >  	struct kvm kvm;
> >  
> > +	unsigned int asid;
> > +
> >  	/* Struct members for AVIC */
> >  	u32 avic_vm_id;
> >  	struct page *avic_logical_id_table_page;
> > @@ -132,7 +134,6 @@ struct kvm_vmcb_info {
> >  	struct vmcb *ptr;
> >  	unsigned long pa;
> >  	int cpu;
> > -	uint64_t asid_generation;
> >  };
> >  
> >  struct vmcb_save_area_cached {
> > @@ -247,7 +248,6 @@ struct vcpu_svm {
> >  	struct vmcb *vmcb;
> >  	struct kvm_vmcb_info vmcb01;
> >  	struct kvm_vmcb_info *current_vmcb;
> > -	u32 asid;
> >  	u32 sysenter_esp_hi;
> >  	u32 sysenter_eip_hi;
> >  	uint64_t tsc_aux;
> > @@ -330,11 +330,6 @@ struct vcpu_svm {
> >  };
> >  
> >  struct svm_cpu_data {
> > -	u64 asid_generation;
> > -	u32 max_asid;
> > -	u32 next_asid;
> > -	u32 min_asid;
> > -
> >  	struct vmcb *save_area;
> >  	unsigned long save_area_pa;
> >  
> > @@ -656,6 +651,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
> >  				     int trig_mode, int vec);
> >  bool svm_register_asid(unsigned int asid);
> >  void svm_unregister_asid(unsigned int asid);
> > +unsigned int svm_asid(struct kvm *kvm);
> >  
> >  /* nested.c */
> >  
> 
> Overall looks good to me.
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks!

> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 23/24] KVM: nSVM: Allocate a new ASID for nested guests
  2025-04-03 20:11     ` Maxim Levitsky
@ 2025-04-22 10:01       ` Yosry Ahmed
  0 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-04-22 10:01 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Thu, Apr 03, 2025 at 04:11:47PM -0400, Maxim Levitsky wrote:
> On Wed, 2025-03-26 at 19:44 +0000, Yosry Ahmed wrote:
> > Now that nested TLB flushes are properly tracked, start allocating a
> > separate ASID for nested guests. This allows dropping the unconditional
> > TLB flushes on nested transitions and doing finer grained TLB flushing
> > when necessary.
> > 
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/nested.c | 11 +++++++++--
> >  arch/x86/kvm/svm/svm.c    |  5 +++--
> >  arch/x86/kvm/svm/svm.h    |  3 +++
> >  3 files changed, 15 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 544913461693c..0c887c91bd50d 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -1204,6 +1204,7 @@ int svm_allocate_nested(struct vcpu_svm *svm)
> >  {
> >  	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
> >  	struct page *vmcb02_page;
> > +	unsigned int asid;
> >  
> >  	if (svm->nested.initialized)
> >  		return 0;
> > @@ -1221,8 +1222,14 @@ int svm_allocate_nested(struct vcpu_svm *svm)
> >  
> >  	svm->nested.initialized = true;
> >  
> > -	if (!kvm_svm->nested_asid)
> > -		kvm_svm->nested_asid = kvm_svm->asid;
> > +	if (!kvm_svm->nested_asid) {
> > +		asid = kvm_tlb_tags_alloc(&svm_asids);
> > +		if (asid && !svm_register_asid(asid)) {
> > +			kvm_tlb_tags_free(&svm_asids, asid);
> > +			asid = 0;
> > +		}
> > +		kvm_svm->nested_asid = asid ?: fallback_asid;
> > +	}
> 
> Nitpick: AFAIK nested KVM at least doesn't enable EFER.SVME unless it actually
> runs a guest, so most of the time we will waste an ASID on a VM which once ran
> a nested VM and since then doesn't run anything else.

Oh yeah, I missed that, thanks. Will do.

> 
> So maybe we want to free the nested ASID in svm_free_nested()?
> 
> >  
> >  	return 0;
> >  
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 4b95fd6b501e6..196f5bca57a0e 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -249,8 +249,8 @@ static unsigned long iopm_base;
> >  
> >  DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
> >  
> > -static struct kvm_tlb_tags svm_asids;
> > -static unsigned int fallback_asid;
> > +struct kvm_tlb_tags svm_asids;
> > +unsigned int fallback_asid;
> >  
> >  /*
> >   * Only MSR_TSC_AUX is switched via the user return hook.  EFER is switched via
> > @@ -5127,6 +5127,7 @@ static void svm_vm_destroy(struct kvm *kvm)
> >  	avic_vm_destroy(kvm);
> >  	sev_vm_destroy(kvm);
> >  	kvm_tlb_tags_free(&svm_asids, kvm_svm->asid);
> > +	kvm_tlb_tags_free(&svm_asids, kvm_svm->nested_asid);
> >  }
> >  
> >  static int svm_vm_init(struct kvm *kvm)
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 0c44133bc05ca..220d10d2b1a5c 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -630,6 +630,9 @@ static inline void svm_vmgexit_no_action(struct vcpu_svm *svm, u64 data)
> >  
> >  extern bool dump_invalid_vmcb;
> >  
> > +extern struct kvm_tlb_tags svm_asids;
> > +extern unsigned int fallback_asid;
> > +
> >  u32 svm_msrpm_offset(u32 msr);
> >  u32 *svm_vcpu_alloc_msrpm(void);
> >  void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
> 
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 13/24] KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode
  2025-04-03 20:10   ` Maxim Levitsky
@ 2025-04-22 10:04     ` Yosry Ahmed
  0 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-04-22 10:04 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Thu, Apr 03, 2025 at 04:10:06PM -0400, Maxim Levitsky wrote:
> On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> > svm_flush_tlb_asid() currently operates on the current VMCB. In
> > preparation for properly tracking TLB flushes for L1 and L2 ASIDs,
> > refactor it to take is_guest_mode and find the proper VMCB. All existing
> > callers pass is_guest_mode(vcpu) to maintain existing behavior for now.
> > 
> > Move the comment about only flushing the current ASID to
> > svm_flush_tlb_all(), where it probably should have been anyway, because
> > svm_flush_tlb_asid() now flushes a given ASID, not the current ASID.
> > 
> > Create a svm_flush_tlb_guest() wrapper to use as the flush_tlb_guest()
> > callback.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/svm.c | 39 +++++++++++++++++++++++++--------------
> >  1 file changed, 25 insertions(+), 14 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 865c5ce4fa473..fb6b9f88a1504 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4016,25 +4016,24 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
> >  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
> >  }
> >  
> > -static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
> > +static struct vmcb *svm_get_vmcb(struct vcpu_svm *svm, bool is_guest_mode)
> > +{
> > +	return is_guest_mode ? svm->nested.vmcb02.ptr : svm->vmcb01.ptr;
> > +}
> 
> Not 100% sure about this helper, its name might be a bit confusing because
> we already have a current vmcb. Maybe add a comment above stating that
> this is to get a VMCB which might not be currently active?

Yeah, I spent some time trying to come up with a more elaborate name, then
convinced myself that the is_guest_mode parameter makes it clear that the
caller specifies which VMCB it wants, regardless of which VMCB is current.

I can add a comment to make it clearer.

> 
> > +
> > +static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu, bool is_guest_mode)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> > +	struct vmcb *vmcb = svm_get_vmcb(svm, is_guest_mode);
> >  
> >  	/*
> >  	 * Unlike VMX, SVM doesn't provide a way to flush only NPT TLB entries.
> >  	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB
> >  	 * entries, and thus is a superset of Hyper-V's fine grained flushing.
> >  	 */
> > -	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode(vcpu));
> > -
> > -	/*
> > -	 * Flush only the current ASID even if the TLB flush was invoked via
> > -	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
> > -	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
> > -	 * unconditionally does a TLB flush on both nested VM-Enter and nested
> > -	 * VM-Exit (via kvm_mmu_reset_context()).
> > -	 */
> > -	vmcb_set_flush_asid(svm->vmcb);
> > +	kvm_hv_vcpu_purge_flush_tlb(vcpu, is_guest_mode);
> > +	if (vmcb)
> > +		vmcb_set_flush_asid(vmcb);
> >  }
> >  
> >  static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> > @@ -4050,7 +4049,7 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
> >  	if (svm_hv_is_enlightened_tlb_enabled(vcpu) && VALID_PAGE(root_tdp))
> >  		hyperv_flush_guest_mapping(root_tdp);
> >  
> > -	svm_flush_tlb_asid(vcpu);
> > +	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
> >  }
> >  
> >  static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> > @@ -4065,7 +4064,14 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
> >  	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
> >  		hv_flush_remote_tlbs(vcpu->kvm);
> >  
> > -	svm_flush_tlb_asid(vcpu);
> > +	/*
> > +	 * Flush only the current ASID even if the TLB flush was invoked via
> > +	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
> > +	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
> > +	 * unconditionally does a TLB flush on both nested VM-Enter and nested
> > +	 * VM-Exit (via kvm_mmu_reset_context()).
> > +	 */
> > +	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
> >  }
> >  
> >  static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
> > @@ -4075,6 +4081,11 @@ static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
> >  	invlpga(gva, svm_get_current_asid(svm));
> >  }
> >  
> > +static void svm_flush_tlb_guest(struct kvm_vcpu *vcpu)
> > +{
> > +	svm_flush_tlb_asid(vcpu, is_guest_mode(vcpu));
> > +}
> > +
> >  static inline void sync_cr8_to_lapic(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_svm *svm = to_svm(vcpu);
> > @@ -5187,7 +5198,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
> >  	.flush_tlb_all = svm_flush_tlb_all,
> >  	.flush_tlb_current = svm_flush_tlb_current,
> >  	.flush_tlb_gva = svm_flush_tlb_gva,
> > -	.flush_tlb_guest = svm_flush_tlb_asid,
> > +	.flush_tlb_guest = svm_flush_tlb_guest,
> >  
> >  	.vcpu_pre_run = svm_vcpu_pre_run,
> >  	.vcpu_run = svm_vcpu_run,
> 
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks!

> 
> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 11/24] KVM: nSVM: Use a separate ASID for nested guests
  2025-04-03 20:09   ` Maxim Levitsky
@ 2025-04-22 10:08     ` Yosry Ahmed
  0 siblings, 0 replies; 58+ messages in thread
From: Yosry Ahmed @ 2025-04-22 10:08 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Sean Christopherson, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Thu, Apr 03, 2025 at 04:09:30PM -0400, Maxim Levitsky wrote:
> On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> > The per-VM ASID is currently shared by both L1 and L2 guests. That ASID
> > is currently flushed on every transition between L1 and L2.
> > 
> > Allocate and track a separate ASID per-VM for nested guests. This is in
> > preparation for doing fine-grained TLB flushes on nested transitions
> > instead of unconditional full flushes.
> > 
> > Nested ASIDs are still not fully maintained (e.g. a remote flush will
> > only flush the current ASID), so keep the TLB flush on every transition
> > until this is sorted out in following changes.
> > 
> > Add a helper to get the ASID associated with a specific VMCB and use it
> > instead of directly reading the VM's ASID. This transparently uses L2's
> > ASID when an L2 guest is being run.
> > 
> > L1's ASID is flushed on KVM_REQ_TLB_FLUSH_GUEST if it is the active
> > context, so remove the TODO in nested_svm_transition_tlb_flush() about
> > it.
> > 
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/svm/nested.c |  8 ++++++--
> >  arch/x86/kvm/svm/svm.c    | 13 +++++++++++--
> >  arch/x86/kvm/svm/svm.h    |  3 ++-
> >  3 files changed, 19 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 81184b2fb27fd..75223869aa8c6 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -495,7 +495,6 @@ static void nested_svm_transition_tlb_flush(struct kvm_vcpu *vcpu)
> >  	 *  - Honor L1's request to flush an ASID on nested VMRUN
> >  	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
> >  	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
> > -	 *  - Flush L1's ASID on KVM_REQ_TLB_FLUSH_GUEST
> >  	 *
> >  	 * [*] Unlike nested EPT, SVM's ASID management can invalidate nested
> >  	 *     NPT guest-physical mappings on VMRUN.
> > @@ -677,7 +676,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
> >  	vmcb02->control.nested_ctl = vmcb01->control.nested_ctl;
> >  	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
> >  	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;
> > -	vmcb02->control.asid = svm_asid(vcpu->kvm);
> > +	vmcb02->control.asid = svm_nested_asid(vcpu->kvm);
> >  
> >  	/* Also overwritten later if necessary.  */
> >  	vmcb_clr_flush_asid(vmcb02);
> > @@ -1179,6 +1178,7 @@ static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
> >  
> >  int svm_allocate_nested(struct vcpu_svm *svm)
> >  {
> > +	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
> >  	struct page *vmcb02_page;
> >  
> >  	if (svm->nested.initialized)
> > @@ -1196,6 +1196,10 @@ int svm_allocate_nested(struct vcpu_svm *svm)
> >  	svm_vcpu_init_msrpm(&svm->vcpu, svm->nested.msrpm);
> >  
> >  	svm->nested.initialized = true;
> > +
> > +	if (!kvm_svm->nested_asid)
> > +		kvm_svm->nested_asid = kvm_svm->asid;
> 
> Nitpick: maybe put nested_asid into .nested struct as well?
> I don't have a strong opinion on this, feel free to leave it where it is now.

I did this initially, but I thought creating a struct just for the purpose
of holding the nested ASID would be overkill. I don't feel strongly about it,
though.
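
For reference, the alternative layout would look roughly like this
(hypothetical sketch, names made up, not something in the posted series):

	/* Dedicated sub-struct for nested per-VM state. */
	struct kvm_svm_nested {
		unsigned int asid;
	};

	/* ... and in struct kvm_svm, replacing the bare nested_asid field: */
	struct kvm_svm_nested nested;

with accesses becoming kvm_svm->nested.asid instead of kvm_svm->nested_asid.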

> 
> 
> > +
> >  	return 0;
> >  
> >  err_free_vmcb02:
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index f028d006f69dc..e664d8428c792 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -1225,17 +1225,26 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
> >  	}
> >  }
> >  
> > -unsigned int svm_asid(struct kvm *kvm)
> > +unsigned int svm_nested_asid(struct kvm *kvm)
> > +{
> > +	return to_kvm_svm(kvm)->nested_asid;
> > +}
> 
> It might also make sense to add WARN_ON_ONCE(!svm->nested.initialized) here, just in case.

Yeah we can do that, but I will check the callers first to make sure
there's no chance of false positives.

> 
> > +
> > +static unsigned int svm_asid(struct kvm *kvm)
> >  {
> >  	return to_kvm_svm(kvm)->asid;
> >  }
> >  
> >  static unsigned int svm_get_current_asid(struct vcpu_svm *svm)
> >  {
> > -	struct kvm *kvm = svm->vcpu.kvm;
> > +	struct kvm_vcpu *vcpu = &svm->vcpu;
> > +	struct kvm *kvm = vcpu->kvm;
> >  
> >  	if (sev_guest(kvm))
> >  		return sev_get_asid(kvm);
> > +	if (is_guest_mode(vcpu))
> > +		return svm_nested_asid(kvm);
> > +	WARN_ON_ONCE(svm->current_vmcb != &svm->vmcb01);
> >  	return svm_asid(kvm);
> >  }
> >  
> > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > index 436b7e83141b9..e67e3a64e92f7 100644
> > --- a/arch/x86/kvm/svm/svm.h
> > +++ b/arch/x86/kvm/svm/svm.h
> > @@ -118,6 +118,7 @@ struct kvm_svm {
> >  	struct kvm kvm;
> >  
> >  	unsigned int asid;
> > +	unsigned int nested_asid;
> >  
> >  	/* Struct members for AVIC */
> >  	u32 avic_vm_id;
> > @@ -651,7 +652,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
> >  				     int trig_mode, int vec);
> >  bool svm_register_asid(unsigned int asid);
> >  void svm_unregister_asid(unsigned int asid);
> > -unsigned int svm_asid(struct kvm *kvm);
> > +unsigned int svm_nested_asid(struct kvm *kvm);
> >  
> >  /* nested.c */
> >  
> 
> 
> Overall looks good,
> 
> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Thanks!

> 
> Best regards,
> 	Maxim Levitsky
> 
> 
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  2025-03-26 19:36 ` [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB Yosry Ahmed
  2025-04-03 20:04   ` Maxim Levitsky
@ 2025-06-20 23:13   ` Sean Christopherson
  2025-06-23 19:50     ` Tom Lendacky
  1 sibling, 1 reply; 58+ messages in thread
From: Sean Christopherson @ 2025-06-20 23:13 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Wed, Mar 26, 2025, Yosry Ahmed wrote:
> SEV currently tracks the ASID to VMCB mapping for each physical CPU.
> This is required to flush the ASID when a new VMCB using the same ASID
> is run on the same CPU. Practically, there is a single VMCB for each
> vCPU using SEV. Furthermore, TLB flushes on nested transitions between
> VMCB01 and VMCB02 are handled separately (see
> nested_svm_transition_tlb_flush()).
> 
> In preparation for generalizing the tracking and making the tracking
> more expensive, start tracking the ASID to vCPU mapping instead. This
> will allow for the tracking to be moved to a cheaper code path when
> vCPUs are switched.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/svm/sev.c | 12 ++++++------
>  arch/x86/kvm/svm/svm.c |  2 +-
>  arch/x86/kvm/svm/svm.h |  4 ++--
>  3 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index d613f81addf1c..ddb4d5b211ed7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -240,7 +240,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
>  
>  	for_each_possible_cpu(cpu) {
>  		sd = per_cpu_ptr(&svm_data, cpu);
> -		sd->sev_vmcbs[sev->asid] = NULL;
> +		sd->sev_vcpus[sev->asid] = NULL;
>  	}
>  
>  	mutex_unlock(&sev_bitmap_lock);
> @@ -3081,8 +3081,8 @@ int sev_cpu_init(struct svm_cpu_data *sd)
>  	if (!sev_enabled)
>  		return 0;
>  
> -	sd->sev_vmcbs = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
> -	if (!sd->sev_vmcbs)
> +	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
> +	if (!sd->sev_vcpus)
>  		return -ENOMEM;
>  
>  	return 0;
> @@ -3471,14 +3471,14 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>  	/*
>  	 * Flush guest TLB:
>  	 *
> -	 * 1) when different VMCB for the same ASID is to be run on the same host CPU.
> +	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.

Tom, can you clarify what an ASID actually tags when NPT is in use?

The more I think about all of this, the less it makes sense.  The *entire* point
of an ASID is to tag TLB entries so that a flush isn't required when running code
for the same address space.

The main problem I'm struggling with is that, as usual, the APM doesn't properly
document anything, and just gives "suggestions" for the VMM.  *sigh*

As I read it, these snippets from the APM are saying ASIDs tag only GPA=>PA entries
when NPT is in use.

  TLB entries are tagged with Address Space Identifier (ASID) bits to distinguish
  different guest virtual address spaces when shadow page tables are used, or
  different guest physical address spaces when nested page tables are used. The
  VMM can choose a software strategy in which it keeps multiple shadow page tables,
  and/or multiple nested page tables in processors that support nested paging,
  up-to-date; the VMM can allocate a different ASID for each shadow or nested
  page table. This allows switching to a new process in a guest under shadow
  paging (changing CR3 contents), or to a new guest under nested paging (changing
  nCR3 contents), without flushing the TLBs.

  Note that because an ASID is associated with the guest's physical address
  space, it is common across all of the guest's virtual address spaces within a
  processor. This differs from shadow page tables where ASIDs tag individual
  guest virtual address spaces. Note also that the same ASID may or may not be
  associated with the same address space across all processors in a
  multiprocessor system, for either nested tables or shadow tables; this depends
  on how the VMM manages ASID assignment.

But then the "15.16.1 TLB Flush" section says this, without any qualification
whatsoever that it applies only to shadow paging.

  A MOV-to-CR3, a task switch that changes CR3, or clearing or setting CR0.PG or
  bits PGE, PAE, PSE of CR4 affects only the TLB entries belonging to the current
  ASID, regardless of whether the operation occurred in host or guest mode. The
  current ASID is 0 when the CPU is not inside a guest context.

And honestly, tagging only GPA=>PA entries doesn't make any sense, because
GVA=>GPA needs to be tagged with *something*.  And the APM doesn't say anything
about caching GPA=>PA translations, only about caching VA=>PA.

The thing that doesn't fit is that SEV+ uses ASIDs on a per-VM basis.  I suggested
per-VM ASIDs for all VM types based solely on that fact, but now I'm wondering if
it's SEV+ that's crazy and broken.  Because if ASIDs also tag GVA=>GPA, then SEV has
a massive architectural security hole, e.g. a malicious hypervisor can coerce the
CPU into using a stale GVA=>GPA TLB entry by switching vCPUs and letting guest
process with CR3=x access memory for guest process with CR3=y.  But again, if
ASIDs don't tag GVA=>GPA, then what provides isolation between vCPUs!?!?!

Assuming ASIDs tag VA=>PA (i.e. the combined GVA=>GPA=>PA translation), then I
take back my suggestion to use per-VM ASIDs, and instead propose we treat ASIDs
like VPIDs, i.e. use per-vCPU ASIDs (well, technically per-VMCB, because nested).
All server SKUs I've checked (Rome, Milan, Genoa, and Turin) support 0x8000 ASIDs.
That makes the ~512 ASIDs reserved for SEV+ a drop in the bucket, i.e. still leaves
~32k ASIDs up for grabs, which means KVM can concurrently run ~16k vCPUs *with*
nested VMs without having to fallback to flushing when scheduling in a new vCPU.

That way we don't need the complexity of the xarray ASID=>vCPU tracking for common
code.  And as a bonus, the logic for VMX vs. SVM is very, very similar.

Given that SEV+ uses the ASID as the handle for the guest's encryption key,
changing that behavior isn't an option.  Though I still don't see how that isn't
a security flaw.  Regardless, I definitely don't think it's something we should
follow.
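
Rough, untested sketch of that direction, assuming a common tag allocator
along the lines of the kvm_alloc_tlb_tag()/kvm_free_tlb_tag() helpers
discussed for patch 01 (the per-vCPU svm->asid and svm->nested.asid fields
below are made up for illustration; the posted series keeps the ASIDs per-VM
in struct kvm_svm):

	static void svm_alloc_asids(struct vcpu_svm *svm)
	{
		/*
		 * A return value of 0 means no tag was available.  ASID 0 is
		 * reserved for the host and is illegal for VMRUN, so a real
		 * implementation would need a fallback, e.g. a shared ASID
		 * plus a flush on every switch.
		 */
		svm->asid = kvm_alloc_tlb_tag();	/* tags vmcb01 */
		svm->nested.asid = kvm_alloc_tlb_tag();	/* tags vmcb02 */
	}

	static void svm_free_asids(struct vcpu_svm *svm)
	{
		kvm_free_tlb_tag(svm->asid);
		kvm_free_tlb_tag(svm->nested.asid);
	}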

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral
  2025-03-26 19:35 ` [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
  2025-03-27 10:58   ` Nikunj A Dadhania
@ 2025-06-23 16:44   ` Sean Christopherson
  1 sibling, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2025-06-23 16:44 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Wed, Mar 26, 2025, Yosry Ahmed wrote:
> Generalize the VMX VPID allocation code and move it to common code
> in preparation for sharing with SVM. Create a generic struct
> kvm_tlb_tags, representing a factory for VPIDs (or ASIDs later), and use
> one for VPIDs.

I don't see any reason to create a factory, just have common KVM provide the
structure.


> Most of the functionality remains the same, with the following
> differences:
> - The enable_vpid checks are moved to the callers for allocate_vpid()
>   and free_vpid(), as they are specific to VMX.
> - The bitmap allocation is now dynamic (which will be required for SVM),
>   so it is initialized and cleaned up in vmx_hardware_{setup/unsetup}().
> - The range of valid TLB tags is expressed in terms of min/max instead
>   of the number of tags to support SVM use cases.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/vmx/nested.c |  4 +--
>  arch/x86/kvm/vmx/vmx.c    | 38 +++++--------------------
>  arch/x86/kvm/vmx/vmx.h    |  4 +--
>  arch/x86/kvm/x86.c        | 58 +++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.h        | 13 +++++++++
>  5 files changed, 82 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d06e50d9c0e79..b017bd2eb2382 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -343,7 +343,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
>  	vmx->nested.vmxon = false;
>  	vmx->nested.smm.vmxon = false;
>  	vmx->nested.vmxon_ptr = INVALID_GPA;
> -	free_vpid(vmx->nested.vpid02);
> +	kvm_tlb_tags_free(&vmx_vpids, vmx->nested.vpid02);
>  	vmx->nested.posted_intr_nv = -1;
>  	vmx->nested.current_vmptr = INVALID_GPA;
>  	if (enable_shadow_vmcs) {
> @@ -5333,7 +5333,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
>  		     HRTIMER_MODE_ABS_PINNED);
>  	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
>  
> -	vmx->nested.vpid02 = allocate_vpid();
> +	vmx->nested.vpid02 = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;

Since the tag allocator already needs to handle "no tag available", it should also
handle the "tagging" not enabled scenario.  That way this can simply be:

	vmx->nested.vpid02 = kvm_tlb_tags_alloc(&vmx_vpids);

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 69c20a68a3f01..182f18ebc62f3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13992,6 +13992,64 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>  }
>  EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>  
> +int kvm_tlb_tags_init(struct kvm_tlb_tags *tlb_tags, unsigned int min,
> +		      unsigned int max)

I'd much prefer we don't create a "kvm_tlb_tags" namespace, and instead go with:

  kvm_init_tlb_tags()
  kvm_alloc_tlb_tag()
  kvm_free_tlb_tag()

Because kvm_tlb_tags_alloc() in particular reads like it allocates *multiple*
tags.

I also think it's probably worth a typedef for the tag, mostly to help with
understanding what's being passed around, e.g.

typedef unsigned int kvm_tlb_tag_t;

void kvm_init_tlb_tags(kvm_tlb_tag_t min, kvm_tlb_tag_t max);
kvm_tlb_tag_t kvm_alloc_tlb_tag(void);
void kvm_free_tlb_tag(kvm_tlb_tag_t tag);

> +{
> +	/*
> +	 * 0 is assumed to be the host's TLB tag and is returned on failed

Not assumed, *is*.

> +	 * allocations.
> +	 */
> +	if (WARN_ON_ONCE(min == 0))
> +		return -1;
> +
> +	/*
> +	 * Allocate enough bits to index the bitmap directly by the tag,
> +	 * potentially wasting a bit of memory.
> +	 */
> +	tlb_tags->bitmap = bitmap_zalloc(max + 1, GFP_KERNEL);

Rather than blindly allocate SVM's theoretical max of 4 *billion* tags, I think
we should statically reserve space for 65k tags, i.e. for the max possible VMX
VPID.

As mentioned in a different reply, current AMD CPUs support 32k ASIDs, so in
practice it's not even a meaningful limit.  And I strongly suspect that pushing
past ~4k active vCPUs, let alone 32k vCPUs, will run into other bottlenecks long
before the number of ASIDs becomes problematic.

That way KVM doesn't need to bail on failure, or as is done for VMX, silently
disable VPID usage.

Untested, but this is what I'm thinking:

---
 arch/x86/kvm/mmu.h        |  6 ++++
 arch/x86/kvm/mmu/mmu.c    | 61 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/nested.c |  4 +--
 arch/x86/kvm/vmx/vmx.c    | 38 +++++-------------------
 arch/x86/kvm/vmx/vmx.h    |  2 --
 5 files changed, 76 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b4b6860ab971..9e7343722530 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -78,6 +78,12 @@ static inline gfn_t kvm_mmu_max_gfn(void)
 
 u8 kvm_mmu_get_max_tdp_level(void);
 
+typedef unsigned int kvm_tlb_tag_t;
+
+void kvm_init_tlb_tags(kvm_tlb_tag_t min, kvm_tlb_tag_t max);
+kvm_tlb_tag_t kvm_alloc_tlb_tag(void);
+void kvm_free_tlb_tag(kvm_tlb_tag_t tag);
+
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e06e2e89a8f..e58d998ed10a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -121,6 +121,63 @@ static int max_tdp_level __read_mostly;
 
 #include <trace/events/kvm.h>
 
+#define KVM_MAX_TLB_TAG			0xffff
+
+struct kvm_tlb_tags {
+	spinlock_t	lock;
+	DECLARE_BITMAP(used, KVM_MAX_TLB_TAG + 1);
+	kvm_tlb_tag_t	min;
+	kvm_tlb_tag_t	max;
+};
+
+static struct kvm_tlb_tags kvm_tlb_tags = { .lock = __SPIN_LOCK_UNLOCKED(kvm_tlb_tags.lock) };
+
+void kvm_init_tlb_tags(kvm_tlb_tag_t min, kvm_tlb_tag_t max)
+{
+	/*
+	 * 0 is the host's TLB tag for both VMX's VPID and SVM's ASID, and is
+	 * returned on failed allocations, e.g. if there are no more tags left.
+	 */
+	if (WARN_ON_ONCE(!min || max < min))
+		return;
+
+	kvm_tlb_tags.min = min;
+	kvm_tlb_tags.max = min(max, KVM_MAX_TLB_TAG);
+}
+EXPORT_SYMBOL_GPL(kvm_init_tlb_tags);
+
+kvm_tlb_tag_t kvm_alloc_tlb_tag(void)
+{
+	struct kvm_tlb_tags *tags = &kvm_tlb_tags;
+	kvm_tlb_tag_t tag;
+
+	if (!kvm_tlb_tags.min)
+		return 0;
+
+	guard(spinlock)(&kvm_tlb_tags.lock);
+
+	tag = find_next_zero_bit(tags->used, tags->max + 1, tags->min);
+	if (tag > tags->max)
+		return 0;
+
+	__set_bit(tag, tags->used);
+	return tag;
+}
+EXPORT_SYMBOL_GPL(kvm_alloc_tlb_tag);
+
+void kvm_free_tlb_tag(kvm_tlb_tag_t tag)
+{
+	struct kvm_tlb_tags *tags = &kvm_tlb_tags;
+
+	if (!tag || WARN_ON_ONCE(tag < tags->min || tag > tags->max))
+		return;
+
+	guard(spinlock)(&tags->lock);
+
+	__clear_bit(tag, tags->used);
+}
+EXPORT_SYMBOL_GPL(kvm_free_tlb_tag);
+
 /* make pte_list_desc fit well in cache lines */
 #define PTE_LIST_EXT 14
 
@@ -7426,6 +7483,10 @@ int kvm_mmu_vendor_module_init(void)
 
 	kvm_mmu_reset_all_pte_masks();
 
+	kvm_tlb_tags.min = 0;
+	kvm_tlb_tags.max = 0;
+	bitmap_zero(kvm_tlb_tags.used, KVM_MAX_TLB_TAG + 1);
+
 	pte_list_desc_cache = KMEM_CACHE(pte_list_desc, SLAB_ACCOUNT);
 	if (!pte_list_desc_cache)
 		goto out;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..7f02dbe196e3 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -344,7 +344,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
 	vmx->nested.vmxon = false;
 	vmx->nested.smm.vmxon = false;
 	vmx->nested.vmxon_ptr = INVALID_GPA;
-	free_vpid(vmx->nested.vpid02);
+	kvm_free_tlb_tag(vmx->nested.vpid02);
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = INVALID_GPA;
 	if (enable_shadow_vmcs) {
@@ -5333,7 +5333,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 	hrtimer_setup(&vmx->nested.preemption_timer, vmx_preemption_timer_fn, CLOCK_MONOTONIC,
 		      HRTIMER_MODE_ABS_PINNED);
 
-	vmx->nested.vpid02 = allocate_vpid();
+	vmx->nested.vpid02 = kvm_alloc_tlb_tag();
 
 	vmx->nested.vmcs02_initialized = false;
 	vmx->nested.vmxon = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4953846cb30d..4f3d78e71461 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -501,8 +501,7 @@ DEFINE_PER_CPU(struct vmcs *, current_vmcs);
  */
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 
-static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
-static DEFINE_SPINLOCK(vmx_vpid_lock);
+/* VPID allocation now lives in common code; see kvm_alloc_tlb_tag(). */
 
 struct vmcs_config vmcs_config __ro_after_init;
 struct vmx_capability vmx_capability __ro_after_init;
@@ -3970,31 +3969,6 @@ static void seg_setup(int seg)
 	vmcs_write32(sf->ar_bytes, ar);
 }
 
-int allocate_vpid(void)
-{
-	int vpid;
-
-	if (!enable_vpid)
-		return 0;
-	spin_lock(&vmx_vpid_lock);
-	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-	if (vpid < VMX_NR_VPIDS)
-		__set_bit(vpid, vmx_vpid_bitmap);
-	else
-		vpid = 0;
-	spin_unlock(&vmx_vpid_lock);
-	return vpid;
-}
-
-void free_vpid(int vpid)
-{
-	if (!enable_vpid || vpid == 0)
-		return;
-	spin_lock(&vmx_vpid_lock);
-	__clear_bit(vpid, vmx_vpid_bitmap);
-	spin_unlock(&vmx_vpid_lock);
-}
-
 static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
 {
 	/*
@@ -7480,7 +7454,7 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 
 	if (enable_pml)
 		vmx_destroy_pml_buffer(vmx);
-	free_vpid(vmx->vpid);
+	kvm_free_tlb_tag(vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
 	free_page((unsigned long)vmx->ve_info);
@@ -7499,7 +7473,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	err = -ENOMEM;
 
-	vmx->vpid = allocate_vpid();
+	vmx->vpid = kvm_alloc_tlb_tag();
 
 	/*
 	 * If PML is turned on, failure on enabling PML just results in failure
@@ -7602,7 +7576,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 free_pml:
 	vmx_destroy_pml_buffer(vmx);
 free_vpid:
-	free_vpid(vmx->vpid);
+	kvm_free_tlb_tag(vmx->vpid);
 	return err;
 }
 
@@ -8522,7 +8496,9 @@ __init int vmx_hardware_setup(void)
 	kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
 	kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
 
-	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
+	/* VPID 0 is reserved for host, so min=1  */
+	if (enable_vpid)
+		kvm_init_tlb_tags(1, VMX_NR_VPIDS - 1);
 
 	if (enable_ept)
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..5feec05de9b4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -355,8 +355,6 @@ static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
 }
 
 void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu);
-int allocate_vpid(void);
-void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,

base-commit: ecff148f29dade8416abee4d492d2a7a6d7cd610
--

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  2025-03-26 19:35 ` [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
  2025-04-03 20:00   ` Maxim Levitsky
@ 2025-06-23 16:46   ` Sean Christopherson
  1 sibling, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2025-06-23 16:46 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Wed, Mar 26, 2025, Yosry Ahmed wrote:
> Incoming changes will add more code paths that set tlb_ctl to
> TLB_CONTROL_FLUSH_ASID, and will eliminate the use of
> TLB_CONTROL_FLUSH_ALL_ASID except as fallback when FLUSHBYASID is not
> available. Introduce set/clear helpers to set tlb_ctl to
> TLB_CONTROL_FLUSH_ASID or TLB_CONTROL_DO_NOTHING.
> 
> Opportunistically move the TLB_CONTROL_* definitions to
> arch/x86/kvm/svm/svm.h as they are not used outside of arch/x86/kvm/svm/.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/include/asm/svm.h |  5 -----
>  arch/x86/kvm/svm/nested.c  |  2 +-
>  arch/x86/kvm/svm/sev.c     |  2 +-
>  arch/x86/kvm/svm/svm.c     |  4 ++--
>  arch/x86/kvm/svm/svm.h     | 15 +++++++++++++++
>  5 files changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 9b7fa99ae9513..a97da63562eb3 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -171,11 +171,6 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>  };
>  
>  
> -#define TLB_CONTROL_DO_NOTHING 0
> -#define TLB_CONTROL_FLUSH_ALL_ASID 1
> -#define TLB_CONTROL_FLUSH_ASID 3
> -#define TLB_CONTROL_FLUSH_ASID_LOCAL 7

These should stay in asm/svm.h as they are architectural definitions.  KVM's
headers are anything but organized, but my goal is to eventually have the asm/
headers hold most/all architectural definitions, while KVM's internal headers
hold KVM-internal stuff.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 12/24] KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb()
  2025-04-03 20:09   ` Maxim Levitsky
@ 2025-06-23 19:22     ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2025-06-23 19:22 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Yosry Ahmed, Paolo Bonzini, Jim Mattson, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Thu, Apr 03, 2025, Maxim Levitsky wrote:
> On Wed, 2025-03-26 at 19:36 +0000, Yosry Ahmed wrote:
> > Instead of calling is_guest_mode() inside kvm_hv_vcpu_purge_flush_tlb()
> > pass the value from the caller. Future changes will pass different
> > values than is_guest_mode(vcpu).
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> > ---
> >  arch/x86/kvm/hyperv.h  | 8 +++++---
> >  arch/x86/kvm/svm/svm.c | 2 +-
> >  arch/x86/kvm/x86.c     | 2 +-
> >  3 files changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> > index 913bfc96959cb..be715deaeb003 100644
> > --- a/arch/x86/kvm/hyperv.h
> > +++ b/arch/x86/kvm/hyperv.h
> > @@ -203,14 +203,15 @@ static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struc
> >  	return &hv_vcpu->tlb_flush_fifo[i];
> >  }
> >  
> > -static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu)
> > +static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu,
> > +					       bool is_guest_mode)

NAK, passing around is_guest_mode is going to cause problems.  All it takes is
one snippet of code that operates on the current vCPU state for KVM to end up
with bugs.  It's unfortunate that kvm_hv_get_tlb_flush_fifo() takes in an
@is_guest_mode param, but that's "necessary" due to the cross-vCPU nature of
the usage.  For this case, there is no such requirement/restriction.

I also think that being super explicit isn't a bad thing, even if it means we
might end up with duplicate code.  I.e. having this

	vmcb_set_flush_asid(svm->vmcb01.ptr);
	if (svm->nested.vmcb02.ptr)
		vmcb_set_flush_asid(svm->nested.vmcb02.ptr);

in svm_flush_tlb_all() is a net positive IMO, because it explicitly reads "flush
vmcb01's ASID, and vmcb02's ASID if vmcb02 is valid".  Whereas this

        svm_flush_tlb_asid(vcpu, false);
        svm_flush_tlb_asid(vcpu, true);

isn't anywhere near as explicit.  I can make a good guess as to what true/false
are specifying, but many readers will need to go at least a layer or two deeper
to understand what's going on.  More importantly, it's not at all clear in
svm_flush_tlb_asid() that the vmcb can/should only be NULL in the is_guest_mode=true
case.

        if (vmcb)
                vmcb_set_flush_asid(vmcb);

And it's even actively dangerous, in that a bug where a vmcb is unexpectedly NULL
could lead to a missed TLB flush.  I.e. we *want* a NULL pointer #GP in a case
like this, so that the host yells loudly (even if it means panicking), versus
silently doing nothing and potentially corrupting guest data.  In practice, I can't
imagine such a bug ever being truly silent, e.g. KVM is all but guaranteed to
consume the NULL vmcb sooner than later.  But I still don't like creating such a
possibility.

> >  {
> >  	struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
> >  
> >  	if (!to_hv_vcpu(vcpu) || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))

Case in point, kvm_check_request() is destructive (the name sucks, but it is what
it is), i.e. KVM_REQ_HV_TLB_FLUSH will be cleared, and so only the first of the
calls to svm_flush_tlb_asid() and thus kvm_hv_vcpu_purge_flush_tlb() will actually
do anything.  This particular bug is functionally benign (KVM will over-flush),
but it's still a bug.

Somewhat of a side topic, I think we should rename kvm_hv_vcpu_purge_flush_tlb()
to something like kvm_hv_purge_tlb_flush_fifo().  I initially read the first one
as "purge *and* flush TLBs", whereas the function is actually "purge the TLB
flush FIFO".

Completely untested, but I think we should shoot for something like this, over
2 or 3 patches.

---
 arch/x86/kvm/hyperv.h     | 14 +++++++++++++-
 arch/x86/kvm/svm/nested.c |  1 -
 arch/x86/kvm/svm/svm.c    | 17 ++++++++---------
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 913bfc96959c..f2c17459dd8b 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -203,7 +203,7 @@ static inline struct kvm_vcpu_hv_tlb_flush_fifo *kvm_hv_get_tlb_flush_fifo(struc
 	return &hv_vcpu->tlb_flush_fifo[i];
 }
 
-static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu)
+static inline void kvm_hv_purge_tlb_flush_fifo(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv_tlb_flush_fifo *tlb_flush_fifo;
 
@@ -215,6 +215,18 @@ static inline void kvm_hv_vcpu_purge_flush_tlb(struct kvm_vcpu *vcpu)
 	kfifo_reset_out(&tlb_flush_fifo->entries);
 }
 
+static inline void kvm_hv_purge_tlb_flush_fifo_all(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+	int i;
+
+	if (!hv_vcpu || !kvm_check_request(KVM_REQ_HV_TLB_FLUSH, vcpu))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(hv_vcpu->tlb_flush_fifo); i++)
+		kfifo_reset_out(&hv_vcpu->tlb_flush_fifo[i].entries);
+}
+
 static inline bool guest_hv_cpuid_has_l2_tlb_flush(struct kvm_vcpu *vcpu)
 {
 	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b6c27b34f8e5..7e9156f27a96 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -491,7 +491,6 @@ static void nested_svm_entry_tlb_flush(struct kvm_vcpu *vcpu)
 	 * TODO: optimize unconditional TLB flush/MMU sync.  A partial list of
 	 * things to fix before this can be conditional:
 	 *
-	 *  - Flush TLBs for both L1 and L2 remote TLB flush
 	 *  - Honor L1's request to flush an ASID on nested VMRUN
 	 *  - Sync nested NPT MMU on VMRUN that flushes L2's ASID[*]
 	 *  - Don't crush a pending TLB flush in vmcb02 on nested VMRUN
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 371593c4b629..f7be29733c9d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4163,15 +4163,8 @@ static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
 	 * A TLB flush for the current ASID flushes both "host" and "guest" TLB
 	 * entries, and thus is a superset of Hyper-V's fine grained flushing.
 	 */
-	kvm_hv_vcpu_purge_flush_tlb(vcpu);
+	kvm_hv_purge_tlb_flush_fifo(vcpu);
 
-	/*
-	 * Flush only the current ASID even if the TLB flush was invoked via
-	 * kvm_flush_remote_tlbs().  Although flushing remote TLBs requires all
-	 * ASIDs to be flushed, KVM uses a single ASID for L1 and L2, and
-	 * unconditionally does a TLB flush on both nested VM-Enter and nested
-	 * VM-Exit (via kvm_mmu_reset_context()).
-	 */
 	vmcb_set_flush_asid(svm->vmcb);
 }
 
@@ -4193,6 +4186,8 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
 
 static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	/*
 	 * When running on Hyper-V with EnlightenedNptTlb enabled, remote TLB
 	 * flushes should be routed to hv_flush_remote_tlbs() without requesting
@@ -4203,7 +4198,11 @@ static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
 	if (WARN_ON_ONCE(svm_hv_is_enlightened_tlb_enabled(vcpu)))
 		hv_flush_remote_tlbs(vcpu->kvm);
 
-	svm_flush_tlb_asid(vcpu);
+	kvm_hv_purge_tlb_flush_fifo_all(vcpu);
+
+	vmcb_set_flush_asid(svm->vmcb01.ptr);
+	if (svm->nested.vmcb02.ptr)
+		vmcb_set_flush_asid(svm->nested.vmcb02.ptr);
 }
 
 static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)

base-commit: ba550af5af66a83ad055519b2271f6a21f28cb1b
--

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  2025-06-20 23:13   ` Sean Christopherson
@ 2025-06-23 19:50     ` Tom Lendacky
  2025-06-23 20:37       ` Sean Christopherson
  0 siblings, 1 reply; 58+ messages in thread
From: Tom Lendacky @ 2025-06-23 19:50 UTC (permalink / raw)
  To: Sean Christopherson, Yosry Ahmed
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, x86, kvm, linux-kernel

On 6/20/25 18:13, Sean Christopherson wrote:
> On Wed, Mar 26, 2025, Yosry Ahmed wrote:
>> SEV currently tracks the ASID to VMCB mapping for each physical CPU.
>> This is required to flush the ASID when a new VMCB using the same ASID
>> is run on the same CPU. Practically, there is a single VMCB for each
>> vCPU using SEV. Furthermore, TLB flushes on nested transitions between
>> VMCB01 and VMCB02 are handled separately (see
>> nested_svm_transition_tlb_flush()).
>>
>> In preparation for generalizing the tracking and making the tracking
>> more expensive, start tracking the ASID to vCPU mapping instead. This
>> will allow for the tracking to be moved to a cheaper code path when
>> vCPUs are switched.
>>
>> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
>> ---
>>  arch/x86/kvm/svm/sev.c | 12 ++++++------
>>  arch/x86/kvm/svm/svm.c |  2 +-
>>  arch/x86/kvm/svm/svm.h |  4 ++--
>>  3 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index d613f81addf1c..ddb4d5b211ed7 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -240,7 +240,7 @@ static void sev_asid_free(struct kvm_sev_info *sev)
>>  
>>  	for_each_possible_cpu(cpu) {
>>  		sd = per_cpu_ptr(&svm_data, cpu);
>> -		sd->sev_vmcbs[sev->asid] = NULL;
>> +		sd->sev_vcpus[sev->asid] = NULL;
>>  	}
>>  
>>  	mutex_unlock(&sev_bitmap_lock);
>> @@ -3081,8 +3081,8 @@ int sev_cpu_init(struct svm_cpu_data *sd)
>>  	if (!sev_enabled)
>>  		return 0;
>>  
>> -	sd->sev_vmcbs = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
>> -	if (!sd->sev_vmcbs)
>> +	sd->sev_vcpus = kcalloc(nr_asids, sizeof(void *), GFP_KERNEL);
>> +	if (!sd->sev_vcpus)
>>  		return -ENOMEM;
>>  
>>  	return 0;
>> @@ -3471,14 +3471,14 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
>>  	/*
>>  	 * Flush guest TLB:
>>  	 *
>> -	 * 1) when different VMCB for the same ASID is to be run on the same host CPU.
>> +	 * 1) when different vCPU for the same ASID is to be run on the same host CPU.
> 
> Tom, can you clarify what an ASID actually tags when NPT is in use?

I ran your questions by David Kaplan and hopefully his responses will
help clear things up.

> 
> The more I think about all of this, the less it makes sense.  The *entire* point
> of an ASID is to tag TLB entries so that a flush isn't required when running code
> for the same address space.
> 
> The main problem I'm struggling with is that, as usual, the APM doesn't properly
> document anything, and just gives "suggestions" for the VMM.  *sigh*
> 
> As I read it, these snippets from the APM are saying ASIDs tag only GPA=>PA entries
> when NPT is in use.
> 
>   TLB entries are tagged with Address Space Identifier (ASID) bits to distinguish
>   different guest virtual address spaces when shadow page tables are used, or
>   different guest physical address spaces when nested page tables are used. The
>   VMM can choose a software strategy in which it keeps multiple shadow page tables,
>   and/or multiple nested page tables in processors that support nested paging,
>   up-to-date; the VMM can allocate a different ASID for each shadow or nested
>   page table. This allows switching to a new process in a guest under shadow
>   paging (changing CR3 contents), or to a new guest under nested paging (changing
>   nCR3 contents), without flushing the TLBs.
> 
>   Note that because an ASID is associated with the guest's physical address
>   space, it is common across all of the guest's virtual address spaces within a
>   processor. This differs from shadow page tables where ASIDs tag individual
>   guest virtual address spaces. Note also that the same ASID may or may not be
>   associated with the same address space across all processors in a
>   multiprocessor system, for either nested tables or shadow tables; this depends
>   on how the VMM manages ASID assignment.
> 
> But then the "15.16.1 TLB Flush" section says this, without any qualification
> whatsoever that it applies only to shadow paging.
> 
>   A MOV-to-CR3, a task switch that changes CR3, or clearing or setting CR0.PG or
>   bits PGE, PAE, PSE of CR4 affects only the TLB entries belonging to the current
>   ASID, regardless of whether the operation occurred in host or guest mode. The
>   current ASID is 0 when the CPU is not inside a guest context.
> 
> And honestly, tagging only GPA=>PA entries doesn't make any sense, because
> GVA=>GPA needs to be tagged with *something*.  And the APM doesn't say anything
> about caching GPA=>PA translations, only about caching VA=>PA.

VA=>PA translations are always tagged with a TLB tag value.  Outside of
SEV-SNP, the TLB tag value is ASID.

So for those guests, VA=>PA translation are tagged with the ASID.  For
SEV-SNP guests, see below.

> 
> The thing that doesn't fit is that SEV+ uses ASIDs on a per-VM basis.  I suggested
> per-VM ASIDs for all VM types based solely on that fact, but now I'm wondering if
> it's SEV+ that's crazy and broken.  Because if ASIDs also tag GVA=>GPA, then SEV has
> a massive architectural security hole, e.g. a malicious hypervisor can coerce the
> CPU into using a stale GVA=>GPA TLB entry by switching vCPUs and letting guest
> process with CR3=x access memory for guest process with CR3=y.  But again, if
> ASIDs don't tag GVA=>GPA, then what provides isolation between vCPUs!?!?!

No.

For SEV/SEV-ES guests, the HV (which remains partially trusted) must do a
TLB flush before running a different VMCB of the same guest, in order to
avoid this problem. This code is in pre_sev_run().
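
For reference, that check amounts to roughly the following (a loose
paraphrase, not a verbatim copy of sev.c):

	/*
	 * Flush the guest's ASID if a different VMCB last ran with this ASID
	 * on this pCPU, or if this vCPU last ran on a different pCPU.
	 */
	if (sd->sev_vmcbs[asid] != svm->vmcb ||
	    svm->vcpu.arch.last_vmentry_cpu != cpu) {
		sd->sev_vmcbs[asid] = svm->vmcb;
		svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
		vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
	}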

For SEV-SNP guests, this is handled automatically by hardware through the
PCPU_ID and TLB_ID VMSA fields (documented somewhat in APM 15.36.15).

In short, the TLB is tagged with {TLB_ID, ASID} and TLB_ID is managed by
HW and guaranteed to be different for each vCPU of the guest running on a
physical core. This ensures that the TLB tag is unique for each guest and
for each vCPU of the guest.

> 
> Assuming ASIDs tag VA=>PA (i.e. the combined GVA=>GPA=>PA translation), then I
> take back my suggestion to use per-VM ASIDs, and instead propose we treat ASIDs
> like VPIDs, i.e. use per-vCPU ASIDs (well, technically per-VMCB, because nested).
> All server SKUs I've checked (Rome, Milan, Genoa, and Turin) support 0x8000 ASIDs.
> That makes the ~512 ASIDs reserved for SEV+ a drop in the bucket, i.e. still leaves
> ~32k ASIDs up for grabs, which means KVM can concurrently run ~16k vCPUs *with*
> nested VMs without having to fallback to flushing when scheduling in a new vCPU.
> 
> That way we don't need the complexity of the xarray ASID=>vCPU tracking for common
> code.  And as a bonus, the logic for VMX vs. SVM is very, very similar.
> 
> Given that SEV+ uses the ASID as the handle for the guest's encryption key,
> changing that behavior isn't an option.  Though I still don't see how that isn't
> a security flaw.  Regardless, I definitely don't think it's something we should
> follow.

I don't object to the above, there are plenty of ASIDs. Per-VMCB ASIDs
seems fine. I suspect that the per-pCPU scoping of ASIDs likely dates back
to the era when there weren't many ASIDs. But now that everything supports
32k, that's not an issue.

Note that SEV reserves 1006 ASIDs in Genoa/later, not 512.

Thanks,
Tom


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  2025-06-23 19:50     ` Tom Lendacky
@ 2025-06-23 20:37       ` Sean Christopherson
  0 siblings, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2025-06-23 20:37 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Yosry Ahmed, Paolo Bonzini, Jim Mattson, Maxim Levitsky,
	Vitaly Kuznetsov, Rik van Riel, x86, kvm, linux-kernel

On Mon, Jun 23, 2025, Tom Lendacky wrote:
> On 6/20/25 18:13, Sean Christopherson wrote:
> > On Wed, Mar 26, 2025, Yosry Ahmed wrote:
> > The more I think about all of this, the less it makes sense.  The *entire* point
> > of an ASID is to tag TLB entries so that a flush isn't required when running code
> > for the same address space.
> > 
> > The main problem I'm struggling with is that, as usual, the APM doesn't properly
> > document anything, and just gives "suggestions" for the VMM.  *sigh*
> > 
> > As I read it, these snippets from the APM are saying ASIDs tag only GPA=>PA entries
> > when NPT is in use.
> > 
> >   TLB entries are tagged with Address Space Identifier (ASID) bits to distinguish
> >   different guest virtual address spaces when shadow page tables are used, or
> >   different guest physical address spaces when nested page tables are used. The
> >   VMM can choose a software strategy in which it keeps multiple shadow page tables,
> >   and/or multiple nested page tables in processors that support nested paging,
> >   up-to-date; the VMM can allocate a different ASID for each shadow or nested
> >   page table. This allows switching to a new process in a guest under shadow
> >   paging (changing CR3 contents), or to a new guest under nested paging (changing
> >   nCR3 contents), without flushing the TLBs.
> > 
> >   Note that because an ASID is associated with the guest's physical address
> >   space, it is common across all of the guest's virtual address spaces within a
> >   processor. This differs from shadow page tables where ASIDs tag individual
> >   guest virtual address spaces. Note also that the same ASID may or may not be
> >   associated with the same address space across all processors in a
> >   multiprocessor system, for either nested tables or shadow tables; this depends
> >   on how the VMM manages ASID assignment.
> > 
> > But then the "15.16.1 TLB Flush" section says this, without any qualification
> > whatsoever that it applies only to shadow paging.
> > 
> >   A MOV-to-CR3, a task switch that changes CR3, or clearing or setting CR0.PG or
> >   bits PGE, PAE, PSE of CR4 affects only the TLB entries belonging to the current
> >   ASID, regardless of whether the operation occurred in host or guest mode. The
> >   current ASID is 0 when the CPU is not inside a guest context.
> > 
> > And honestly, tagging only GPA=>PA entries doesn't make any sense, because
> > GVA=>GPA needs to be tagged with *something*.  And the APM doesn't say anything
> > about caching GPA=>PA translations, only about caching VA=>PA.
> 
> VA=>PA translations are always tagged with a TLB tag value.  Outside of
> SEV-SNP, the TLB tag value is ASID.
> 
> So for those guests, VA=>PA translation are tagged with the ASID.  For
> SEV-SNP guests, see below.
> 
> > 
> > The thing that doesn't fit is that SEV+ uses ASIDs on a per-VM basis.  I suggested
> > per-VM ASIDs for all VM types based solely on that fact, but now I'm wondering if
> > it's SEV+ that's crazy and broken.  Because if ASIDs also tag GVA=>GPA, then SEV has
> > a massive architectural security hole, e.g. a malicious hypervisor can coerce the
> > CPU into using a stale GVA=>GPA TLB entry by switching vCPUs and letting guest
> > process with CR3=x access memory for guest process with CR3=y.  But again, if
> > ASIDs don't tag GVA=>GPA, then what provides isolation between vCPUs!?!?!
> 
> No.
> 
> For SEV/SEV-ES guests, the HV (which remains partially trusted) must do a
> TLB flush before running a different VMCB of the same guest, in order to
> avoid this problem. This code is in pre_sev_run().
> 
> For SEV-SNP guests, this is handled automatically by hardware through the
> PCPU_ID and TLB_ID VMSA fields (documented somewhat in APM 15.36.15).

Aha!  I knew I had to be missing something.  Rule #1: don't doubt Kaplan ;-)

> In short, the TLB is tagged with {TLB_ID, ASID} and TLB_ID is managed by
> HW and guaranteed to be different for each vCPU of the guest running on a
> physical core. This ensures that the TLB tag is unique for each guest and
> for each vCPU of the guest.

Thanks Tom, very much appreciated!

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly
  2025-03-26 19:44   ` [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly Yosry Ahmed
  2025-04-03 20:10     ` Maxim Levitsky
@ 2025-06-24  1:08     ` Sean Christopherson
  1 sibling, 0 replies; 58+ messages in thread
From: Sean Christopherson @ 2025-06-24  1:08 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Paolo Bonzini, Jim Mattson, Maxim Levitsky, Vitaly Kuznetsov,
	Rik van Riel, Tom Lendacky, x86, kvm, linux-kernel

On Wed, Mar 26, 2025, Yosry Ahmed wrote:
> Currently, INVLPGA interception is handled like INVLPG, which flushes
> L1's TLB translations for the address. It was implemented in this way
> because L1 and L2 shared an ASID. Now, L1 and L2 have separate ASIDs. It
> is still harmless to flush L1's translations, but it's only correct
> because all translations are flushed on nested transitions anyway.
> 
> In preparation for stopping unconditional flushes on nested transitions,
> handle INVLPGA interception properly. If L1 specified zero as the ASID,
> this is equivalent to INVLPG, so handle it as such. Otherwise, use
> INVLPGA to flush the translations of the appropriate ASID tracked by
> KVM, if any. Sync the shadow MMU as well, as L1 invalidated L2's
> mappings.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/mmu/mmu.c          |  5 +++--
>  arch/x86/kvm/svm/svm.c          | 36 +++++++++++++++++++++++++++++++--
>  3 files changed, 39 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d881e7d276b12..a158d324168a0 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2237,6 +2237,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
>  		       void *insn, int insn_len);
>  void kvm_mmu_print_sptes(struct kvm_vcpu *vcpu, gpa_t gpa, const char *msg);
>  void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
> +void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +			       u64 addr, unsigned long roots, bool gva_flush);
>  void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			     u64 addr, unsigned long roots);
>  void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e2b1994f12753..d3baa12df84e7 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6355,8 +6355,8 @@ static void kvm_mmu_invalidate_addr_in_root(struct kvm_vcpu *vcpu,
>  	write_unlock(&vcpu->kvm->mmu_lock);
>  }
>  
> -static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> -				      u64 addr, unsigned long roots, bool gva_flush)
> +void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +			       u64 addr, unsigned long roots, bool gva_flush)

I don't love passing a boolean to avoid a flush.  I especially don't like it in
this case because vmx_flush_tlb_gva() has similar logic.  Unfortunately, I don't
see a better option at this point. :-/

If we do keep the param, it needs to be something like @flush_gva, because I
read @gva_flush as "this is a gva flush", and got all kinds of confused when
reading the code.

>  {
>  	int i;
>  
> @@ -6382,6 +6382,7 @@ static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu
>  			kvm_mmu_invalidate_addr_in_root(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
>  	}
>  }
> +EXPORT_SYMBOL_GPL(__kvm_mmu_invalidate_addr);
>  
>  void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  			     u64 addr, unsigned long roots)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 3649707c61d3e..4b95fd6b501e6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2505,6 +2505,7 @@ static int clgi_interception(struct kvm_vcpu *vcpu)
>  
>  static int invlpga_interception(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
>  	gva_t gva = kvm_rax_read(vcpu);
>  	u32 asid = kvm_rcx_read(vcpu);
>  
> @@ -2514,8 +2515,39 @@ static int invlpga_interception(struct kvm_vcpu *vcpu)
>  
>  	trace_kvm_invlpga(to_svm(vcpu)->vmcb->save.rip, asid, gva);
>  
> -	/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
> -	kvm_mmu_invlpg(vcpu, gva);

This code needs to do a noncanonical check (assuming we can't figure out a way
to shoehorn this into kvm_mmu_invlpg()).  Consuming gva here for the asid != 0
case might be "fine", because INVLPGA won't fault, but it's still a bug, e.g. I
don't know what will happen when KVM tries to synchronize MMUs.

Another reason I don't love the @flush_gva param :-/
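
Something along these lines, purely as a sketch (which canonicality helper to
use is an open question, is_noncanonical_invlpg_address() is a guess):

	/*
	 * INVLPGA doesn't fault on a non-canonical address, so simply skip
	 * the flush and the MMU sync rather than consuming a bogus GVA.
	 */
	if (is_noncanonical_invlpg_address(gva, vcpu))
		return kvm_skip_emulated_instruction(vcpu);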

> +	/*
> +	 * APM is silent about using INVLPGA to flush the host ASID (i.e. 0).
> +	 * Do the logical thing and handle it like INVLPG.
> +	 */
> +	if (asid == 0) {

	if (!asid)

> +		kvm_mmu_invlpg(vcpu, gva);
> +		return kvm_skip_emulated_instruction(vcpu);
> +	}
> +
> +	/*
> +	 * Check if L1 specified the L2 ASID we are currently tracking. If it
> +	 * isn't, do nothing as we have to handle the TLB flush when switching
> +	 * to the new ASID anyway.
> +	 */

Please avoid pronouns.  And try not to allude to behavior; the above doesn't
actually say what happens when switching to a new ASID, only that "we have to
handle the TLB flush".  E.g.

	/*
	 * Flush hardware TLB entries only if L1 is flushing KVM's currently
	 * tracked L2 ASID.  KVM does a full TLB flush when L1 runs a VMCB with
	 * a different L2 ASID.
	 */
 
> +	if (asid == svm->nested.last_asid)
> +		invlpga(gva, svm_nested_asid(vcpu->kvm));
> +
> +	/*
> +	 * If NPT is disabled, sync the shadow page tables as L1 is invalidating
> +	 * mappings for L2. Sync all roots as ASIDs are not tracked in the MMU
> +	 * role.
> +	 *
> +	 * As we are not flushing the current context, skip the gva flush from
> +	 * __kvm_mmu_invalidate_addr(), it would flush the wrong ASID anyway.
> +	 * The correct TLB flush was done above (if needed).
> +	 *
> +	 * This always operates on root_mmu because L1 and L2 share an MMU when
> +	 * NPT is disabled. This can be optimized by invalidating guest roots
> +	 * only.

Heh, I had a comment typed up about only needing to sync guest roots, and then I
read this literal comment. :-)

> +	 */
> +	if (!npt_enabled)
> +		__kvm_mmu_invalidate_addr(vcpu, &vcpu->arch.root_mmu, gva,
> +					  KVM_MMU_ROOTS_ALL, false);
>  
>  	return kvm_skip_emulated_instruction(vcpu);
>  }
> -- 
> 2.49.0.395.g12beb8f557-goog
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2025-06-24  1:08 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-26 19:35 [RFC PATCH 00/24] KVM: SVM: Rework ASID management Yosry Ahmed
2025-03-26 19:35 ` [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral Yosry Ahmed
2025-03-27 10:58   ` Nikunj A Dadhania
2025-03-27 17:13     ` Yosry Ahmed
2025-03-27 19:42       ` Sean Christopherson
2025-06-23 16:44   ` Sean Christopherson
2025-03-26 19:35 ` [RFC PATCH 02/24] KVM: SVM: Use cached local variable in init_vmcb() Yosry Ahmed
2025-04-03 19:56   ` Maxim Levitsky
2025-03-26 19:35 ` [RFC PATCH 03/24] KVM: SVM: Add helpers to set/clear ASID flush in VMCB Yosry Ahmed
2025-04-03 20:00   ` Maxim Levitsky
2025-06-23 16:46   ` Sean Christopherson
2025-03-26 19:35 ` [RFC PATCH 04/24] KVM: SVM: Flush everything if FLUSHBYASID is not available Yosry Ahmed
2025-04-03 20:00   ` Maxim Levitsky
2025-03-26 19:36 ` [RFC PATCH 05/24] KVM: SVM: Flush the ASID when running on a new CPU Yosry Ahmed
2025-04-03 20:00   ` Maxim Levitsky
2025-03-26 19:36 ` [RFC PATCH 06/24] KVM: SEV: Track ASID->vCPU instead of ASID->VMCB Yosry Ahmed
2025-04-03 20:04   ` Maxim Levitsky
2025-04-22  9:41     ` Yosry Ahmed
2025-06-20 23:13   ` Sean Christopherson
2025-06-23 19:50     ` Tom Lendacky
2025-06-23 20:37       ` Sean Christopherson
2025-03-26 19:36 ` [RFC PATCH 07/24] KVM: SEV: Track ASID->vCPU on vCPU load Yosry Ahmed
2025-04-03 20:04   ` Maxim Levitsky
2025-03-26 19:36 ` [RFC PATCH 08/24] KVM: SEV: Drop pre_sev_run() Yosry Ahmed
2025-04-03 20:04   ` Maxim Levitsky
2025-03-26 19:36 ` [RFC PATCH 09/24] KVM: SEV: Generalize tracking ASID->vCPU with xarrays Yosry Ahmed
2025-04-03 20:05   ` Maxim Levitsky
2025-04-22  9:50     ` Yosry Ahmed
2025-03-26 19:36 ` [RFC PATCH 10/24] KVM: SVM: Use a single ASID per VM Yosry Ahmed
2025-04-03 20:05   ` Maxim Levitsky
2025-04-22  9:51     ` Yosry Ahmed
2025-03-26 19:36 ` [RFC PATCH 11/24] KVM: nSVM: Use a separate ASID for nested guests Yosry Ahmed
2025-04-03 20:09   ` Maxim Levitsky
2025-04-22 10:08     ` Yosry Ahmed
2025-03-26 19:36 ` [RFC PATCH 12/24] KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb() Yosry Ahmed
2025-04-03 20:09   ` Maxim Levitsky
2025-06-23 19:22     ` Sean Christopherson
2025-03-26 19:36 ` [RFC PATCH 13/24] KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode Yosry Ahmed
2025-04-03 20:10   ` Maxim Levitsky
2025-04-22 10:04     ` Yosry Ahmed
2025-03-26 19:36 ` [RFC PATCH 14/24] KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns Yosry Ahmed
2025-03-26 19:36 ` [RFC PATCH 15/24] KVM: x86/mmu: rename __kvm_mmu_invalidate_addr() Yosry Ahmed
2025-04-03 20:10   ` Maxim Levitsky
2025-03-26 19:36 ` [RFC PATCH 16/24] KVM: x86/mmu: Allow skipping the gva flush in kvm_mmu_invalidate_addr() Yosry Ahmed
2025-04-03 20:10   ` Maxim Levitsky
2025-03-26 19:36 ` [RFC PATCH 17/24] KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH Yosry Ahmed
2025-04-03 20:10   ` Maxim Levitsky
2025-03-26 19:41 ` [RFC PATCH 18/24] KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL Yosry Ahmed
2025-03-26 19:43 ` [RFC PATCH 19/24] KVM: nSVM: Flush the TLB if L1 changes L2's ASID Yosry Ahmed
2025-03-26 19:44 ` [RFC PATCH 20/24] KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry Yosry Ahmed
2025-03-26 19:44   ` [RFC PATCH 21/24] KVM: nSVM: Service local TLB flushes before nested transitions Yosry Ahmed
2025-03-26 19:44   ` [RFC PATCH 22/24] KVM: nSVM: Handle INVLPGA interception correctly Yosry Ahmed
2025-04-03 20:10     ` Maxim Levitsky
2025-06-24  1:08     ` Sean Christopherson
2025-03-26 19:44   ` [RFC PATCH 23/24] KVM: nSVM: Allocate a new ASID for nested guests Yosry Ahmed
2025-04-03 20:11     ` Maxim Levitsky
2025-04-22 10:01       ` Yosry Ahmed
2025-03-26 19:44   ` [RFC PATCH 24/24] KVM: nSVM: Stop bombing the TLB on nested transitions Yosry Ahmed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).