From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Jim Mattson <jmattson@google.com>,
	 Maxim Levitsky <mlevitsk@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	 Rik van Riel <riel@surriel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	x86@kernel.org,  kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 01/24] KVM: VMX: Generalize VPID allocation to be vendor-neutral
Date: Mon, 23 Jun 2025 09:44:51 -0700
Message-ID: <aFmEgxB7EWvEOixP@google.com>
In-Reply-To: <20250326193619.3714986-2-yosry.ahmed@linux.dev>

On Wed, Mar 26, 2025, Yosry Ahmed wrote:
> Generalize the VMX VPID allocation code and move it to common code
> in preparation for sharing with SVM. Create a generic struct
> kvm_tlb_tags, representing a factory for VPIDs (or ASIDs later), and use
> one for VPIDs.

I don't see any reason to create a factory; just have common KVM provide the
structure.


> Most of the functionality remains the same, with the following
> differences:
> - The enable_vpid checks are moved to the callers for allocate_vpid()
>   and free_vpid(), as they are specific to VMX.
> - The bitmap allocation is now dynamic (which will be required for SVM),
>   so it is initialized and cleaned up in vmx_hardware_{setup/unsetup}().
> - The range of valid TLB tags is expressed in terms of min/max instead
>   of the number of tags to support SVM use cases.
> 
> Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> ---
>  arch/x86/kvm/vmx/nested.c |  4 +--
>  arch/x86/kvm/vmx/vmx.c    | 38 +++++--------------------
>  arch/x86/kvm/vmx/vmx.h    |  4 +--
>  arch/x86/kvm/x86.c        | 58 +++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.h        | 13 +++++++++
>  5 files changed, 82 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index d06e50d9c0e79..b017bd2eb2382 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -343,7 +343,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
>  	vmx->nested.vmxon = false;
>  	vmx->nested.smm.vmxon = false;
>  	vmx->nested.vmxon_ptr = INVALID_GPA;
> -	free_vpid(vmx->nested.vpid02);
> +	kvm_tlb_tags_free(&vmx_vpids, vmx->nested.vpid02);
>  	vmx->nested.posted_intr_nv = -1;
>  	vmx->nested.current_vmptr = INVALID_GPA;
>  	if (enable_shadow_vmcs) {
> @@ -5333,7 +5333,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
>  		     HRTIMER_MODE_ABS_PINNED);
>  	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
>  
> -	vmx->nested.vpid02 = allocate_vpid();
> +	vmx->nested.vpid02 = enable_vpid ? kvm_tlb_tags_alloc(&vmx_vpids) : 0;

Since the tag allocator already needs to handle "no tag available", it should
also handle the "tagging not enabled" scenario.  That way this can simply be:

	vmx->nested.vpid02 = kvm_tlb_tags_alloc(&vmx_vpids);
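
E.g. a minimal sketch using the patch's API (assuming struct kvm_tlb_tags
carries its lock alongside the bitmap and min/max, and treating a never
initialized allocator as "tagging disabled"):

unsigned int kvm_tlb_tags_alloc(struct kvm_tlb_tags *tlb_tags)
{
	unsigned int tag;

	/* No bitmap means the allocator was never initialized. */
	if (!tlb_tags->bitmap)
		return 0;

	spin_lock(&tlb_tags->lock);
	tag = find_next_zero_bit(tlb_tags->bitmap, tlb_tags->max + 1,
				 tlb_tags->min);
	/* Tag 0 doubles as the "no tag available" return value. */
	if (tag > tlb_tags->max)
		tag = 0;
	else
		__set_bit(tag, tlb_tags->bitmap);
	spin_unlock(&tlb_tags->lock);

	return tag;
}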

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 69c20a68a3f01..182f18ebc62f3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13992,6 +13992,64 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
>  }
>  EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>  
> +int kvm_tlb_tags_init(struct kvm_tlb_tags *tlb_tags, unsigned int min,
> +		      unsigned int max)

I'd much prefer we don't create a "kvm_tlb_tags" namespace, and instead go with:

  kvm_init_tlb_tags()
  kvm_alloc_tlb_tag()
  kvm_free_tlb_tag()

Because kvm_tlb_tags_alloc() in particular reads like it allocates *multiple*
tags.

I also think it's probably worth a typedef for the tag, mostly to help with
understanding what's being passed around, e.g.

typedef unsigned int kvm_tlb_tag_t;

void kvm_init_tlb_tags(kvm_tlb_tag_t min, kvm_tlb_tag_t max);
kvm_tlb_tag_t kvm_alloc_tlb_tag(void);
void kvm_free_tlb_tag(kvm_tlb_tag_t tag);

> +{
> +	/*
> +	 * 0 is assumed to be the host's TLB tag and is returned on failed

Not assumed, *is*.

> +	 * allocations.
> +	 */
> +	if (WARN_ON_ONCE(min == 0))
> +		return -1;
> +
> +	/*
> +	 * Allocate enough bits to index the bitmap directly by the tag,
> +	 * potentially wasting a bit of memory.
> +	 */
> +	tlb_tags->bitmap = bitmap_zalloc(max + 1, GFP_KERNEL);

Rather than blindly allocate SVM's theoretical max of 4 *billion* tags, I think
we should statically reserve space for 65k tags, i.e. for the max possible VMX
VPID.
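
For scale, the arithmetic (assuming KVM_MAX_TLB_TAG == 0xffff as in the sketch
below):

/*
 * Static cost of reserving the full VPID space up front:
 *
 *	(0xffff + 1) bits / 8 = 8 KiB of .bss
 *
 * vs. a bitmap_zalloc() sized for SVM's architectural 32-bit ASID:
 *
 *	2^32 bits / 8 = 512 MiB
 */
static DECLARE_BITMAP(used, KVM_MAX_TLB_TAG + 1);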

As mentioned in a different reply, current AMD CPUs support 32k ASIDs, so in
practice it's not even a meaningful limit.  And I strongly suspect that pushing
past ~4k active vCPUs, let alone 32k vCPUs, will run into other bottlenecks long
before the number of ASIDs becomes problematic.
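
For the SVM side, the min/max would presumably come from CPUID. A hypothetical
sketch, with svm_init_asids() as an illustrative name (not code from this
series):

/*
 * CPUID 0x8000000A.EBX reports the number of ASIDs (NASID).  ASID 0
 * belongs to the host, so the valid guest range is 1..NASID-1; the
 * clamp to KVM_MAX_TLB_TAG happens inside kvm_init_tlb_tags().
 */
static void svm_init_asids(void)
{
	u32 nr_asids = cpuid_ebx(0x8000000A);

	kvm_init_tlb_tags(1, nr_asids - 1);
}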

That way KVM doesn't need to bail on failure, or, as is done for VMX, silently
disable VPID usage.

Untested, but this is what I'm thinking:

---
 arch/x86/kvm/mmu.h        |  6 ++++
 arch/x86/kvm/mmu/mmu.c    | 62 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/nested.c |  4 +--
 arch/x86/kvm/vmx/vmx.c    | 37 ++++--------------------
 arch/x86/kvm/vmx/vmx.h    |  2 --
 5 files changed, 76 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index b4b6860ab971..9e7343722530 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -78,6 +78,12 @@ static inline gfn_t kvm_mmu_max_gfn(void)
 
 u8 kvm_mmu_get_max_tdp_level(void);
 
+typedef unsigned int kvm_tlb_tag_t;
+
+void kvm_init_tlb_tags(kvm_tlb_tag_t min, kvm_tlb_tag_t max);
+kvm_tlb_tag_t kvm_alloc_tlb_tag(void);
+void kvm_free_tlb_tag(kvm_tlb_tag_t tag);
+
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask);
 void kvm_mmu_set_mmio_spte_value(struct kvm *kvm, u64 mmio_value);
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4e06e2e89a8f..e58d998ed10a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -121,6 +121,63 @@ static int max_tdp_level __read_mostly;
 
 #include <trace/events/kvm.h>
 
+#define KVM_MAX_TLB_TAG			0xffff
+
+struct kvm_tlb_tags {
+	spinlock_t	lock;
+	DECLARE_BITMAP(used, KVM_MAX_TLB_TAG + 1);
+	kvm_tlb_tag_t	min;
+	kvm_tlb_tag_t	max;
+};
+
+static struct kvm_tlb_tags kvm_tlb_tags;
+
+void kvm_init_tlb_tags(kvm_tlb_tag_t min, kvm_tlb_tag_t max)
+{
+	/*
+	 * 0 is the host's TLB tag for both VMX's VPID and SVM's ASID, and is
+	 * returned on failed allocations, e.g. if there are no more tags left.
+	 */
+	if (WARN_ON_ONCE(!min || max < min))
+		return;
+
+	kvm_tlb_tags.min = min;
+	kvm_tlb_tags.max = min(max, KVM_MAX_TLB_TAG);
+}
+EXPORT_SYMBOL_GPL(kvm_init_tlb_tags);
+
+kvm_tlb_tag_t kvm_alloc_tlb_tag(void)
+{
+	struct kvm_tlb_tags *tags = &kvm_tlb_tags;
+	kvm_tlb_tag_t tag;
+
+	if (!tags->min)
+		return 0;
+
+	guard(spinlock)(&tags->lock);
+
+	tag = find_next_zero_bit(tags->used, tags->max + 1, tags->min);
+	if (tag > tags->max)
+		return 0;
+
+	__set_bit(tag, tags->used);
+	return tag;
+}
+EXPORT_SYMBOL_GPL(kvm_alloc_tlb_tag);
+
+void kvm_free_tlb_tag(kvm_tlb_tag_t tag)
+{
+	struct kvm_tlb_tags *tags = &kvm_tlb_tags;
+
+	if (!tag || WARN_ON_ONCE(tag < tags->min || tag > tags->max))
+		return;
+
+	guard(spinlock)(&tags->lock);
+
+	__clear_bit(tag, tags->used);
+}
+EXPORT_SYMBOL_GPL(kvm_free_tlb_tag);
+
 /* make pte_list_desc fit well in cache lines */
 #define PTE_LIST_EXT 14
 
@@ -7426,6 +7483,11 @@ int kvm_mmu_vendor_module_init(void)
 
 	kvm_mmu_reset_all_pte_masks();
 
+	spin_lock_init(&kvm_tlb_tags.lock);
+	kvm_tlb_tags.min = 0;
+	kvm_tlb_tags.max = 0;
+	bitmap_zero(kvm_tlb_tags.used, KVM_MAX_TLB_TAG + 1);
+
 	pte_list_desc_cache = KMEM_CACHE(pte_list_desc, SLAB_ACCOUNT);
 	if (!pte_list_desc_cache)
 		goto out;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..7f02dbe196e3 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -344,7 +344,7 @@ static void free_nested(struct kvm_vcpu *vcpu)
 	vmx->nested.vmxon = false;
 	vmx->nested.smm.vmxon = false;
 	vmx->nested.vmxon_ptr = INVALID_GPA;
-	free_vpid(vmx->nested.vpid02);
+	kvm_free_tlb_tag(vmx->nested.vpid02);
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = INVALID_GPA;
 	if (enable_shadow_vmcs) {
@@ -5333,7 +5333,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 	hrtimer_setup(&vmx->nested.preemption_timer, vmx_preemption_timer_fn, CLOCK_MONOTONIC,
 		      HRTIMER_MODE_ABS_PINNED);
 
-	vmx->nested.vpid02 = allocate_vpid();
+	vmx->nested.vpid02 = kvm_alloc_tlb_tag();
 
 	vmx->nested.vmcs02_initialized = false;
 	vmx->nested.vmxon = true;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4953846cb30d..4f3d78e71461 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -501,8 +501,6 @@ DEFINE_PER_CPU(struct vmcs *, current_vmcs);
  */
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 
-static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
-static DEFINE_SPINLOCK(vmx_vpid_lock);
 
 struct vmcs_config vmcs_config __ro_after_init;
 struct vmx_capability vmx_capability __ro_after_init;
@@ -3970,31 +3969,6 @@ static void seg_setup(int seg)
 	vmcs_write32(sf->ar_bytes, ar);
 }
 
-int allocate_vpid(void)
-{
-	int vpid;
-
-	if (!enable_vpid)
-		return 0;
-	spin_lock(&vmx_vpid_lock);
-	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-	if (vpid < VMX_NR_VPIDS)
-		__set_bit(vpid, vmx_vpid_bitmap);
-	else
-		vpid = 0;
-	spin_unlock(&vmx_vpid_lock);
-	return vpid;
-}
-
-void free_vpid(int vpid)
-{
-	if (!enable_vpid || vpid == 0)
-		return;
-	spin_lock(&vmx_vpid_lock);
-	__clear_bit(vpid, vmx_vpid_bitmap);
-	spin_unlock(&vmx_vpid_lock);
-}
-
 static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
 {
 	/*
@@ -7480,7 +7454,7 @@ void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 
 	if (enable_pml)
 		vmx_destroy_pml_buffer(vmx);
-	free_vpid(vmx->vpid);
+	kvm_free_tlb_tag(vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
 	free_page((unsigned long)vmx->ve_info);
@@ -7499,7 +7473,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	err = -ENOMEM;
 
-	vmx->vpid = allocate_vpid();
+	vmx->vpid = kvm_alloc_tlb_tag();
 
 	/*
 	 * If PML is turned on, failure on enabling PML just results in failure
@@ -7602,7 +7576,7 @@ int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 free_pml:
 	vmx_destroy_pml_buffer(vmx);
 free_vpid:
-	free_vpid(vmx->vpid);
+	kvm_free_tlb_tag(vmx->vpid);
 	return err;
 }
 
@@ -8522,7 +8496,9 @@ __init int vmx_hardware_setup(void)
 	kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
 	kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
 
-	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
+	/* VPID 0 is reserved for the host, so min = 1. */
+	if (enable_vpid)
+		kvm_init_tlb_tags(1, VMX_NR_VPIDS - 1);
 
 	if (enable_ept)
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..5feec05de9b4 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -355,8 +355,6 @@ static __always_inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
 }
 
 void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu);
-int allocate_vpid(void);
-void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,

base-commit: ecff148f29dade8416abee4d492d2a7a6d7cd610
--
