* [PATCH v7 0/3] KVM: SVM: Flush cache only on CPUs running SEV guest
@ 2025-01-28 1:53 Zheyun Shen
2025-01-28 1:53 ` [PATCH v7 1/3] KVM: x86: Add a wbinvd helper Zheyun Shen
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Zheyun Shen @ 2025-01-28 1:53 UTC (permalink / raw)
To: thomas.lendacky, seanjc, pbonzini, tglx, kevinloughlin, mingo, bp
Cc: kvm, linux-kernel, Zheyun Shen
Previous versions pointed out the problem of wbinvd_on_all_cpus() in SEV
and tried to maintain a cpumask to solve it. This version includes
further code cleanup.
Although dirty_mask is not maintained perfectly and may lead to WBINVD on
physical CPUs that are not running an SEV guest, it is still better than
wbinvd_on_all_cpus(). Handling vCPU migration is left to future work.
---
v6 -> v7:
- Fixed the writing oversight in sev_vcpu_load().
v5 -> v6:
- Replaced sev_get_wbinvd_dirty_mask() with the helper function
to_kvm_sev_info().
v4 -> v5:
- rebase to tip @ 15e2f65f2ecf .
- Added a commit to remove unnecessary calls to wbinvd().
- Changed some comments.
v3 -> v4:
- Added a wbinvd helper and export it to SEV.
- Changed the struct cpumask in kvm_sev_info into cpumask*, which should
be dynamically allocated and freed.
- Changed the time of recording the CPUs from pre_sev_run() to vcpu_load().
- Removed code of clearing the mask.
v2 -> v3:
- Replaced get_cpu() with parameter cpu in pre_sev_run().
v1 -> v2:
- Added sev_do_wbinvd() to wrap two operations.
- Used cpumask_test_and_clear_cpu() to avoid concurrent problems.
---
Zheyun Shen (3):
KVM: x86: Add a wbinvd helper
KVM: SVM: Remove wbinvd in sev_vm_destroy()
KVM: SVM: Flush cache only on CPUs running SEV guest
arch/x86/kvm/svm/sev.c | 36 +++++++++++++++++++++++++++---------
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/svm/svm.h | 5 ++++-
arch/x86/kvm/x86.c | 9 +++++++--
arch/x86/kvm/x86.h | 1 +
5 files changed, 41 insertions(+), 12 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v7 1/3] KVM: x86: Add a wbinvd helper
2025-01-28 1:53 [PATCH v7 0/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
@ 2025-01-28 1:53 ` Zheyun Shen
2025-02-06 22:03 ` Tom Lendacky
2025-01-28 1:53 ` [PATCH v7 2/3] KVM: SVM: Remove wbinvd in sev_vm_destroy() Zheyun Shen
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Zheyun Shen @ 2025-01-28 1:53 UTC (permalink / raw)
To: thomas.lendacky, seanjc, pbonzini, tglx, kevinloughlin, mingo, bp
Cc: kvm, linux-kernel, Zheyun Shen
At the moment, open-coded calls to on_each_cpu_mask() are used when
emulating WBINVD. A subsequent patch needs the same behavior, and the
helper saves callers from preparing identical parameters.
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
---
arch/x86/kvm/x86.c | 9 +++++++--
arch/x86/kvm/x86.h | 1 +
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2e7134809..b635e0e5c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8231,8 +8231,7 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
int cpu = get_cpu();
cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
- on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask,
- wbinvd_ipi, NULL, 1);
+ wbinvd_on_many_cpus(vcpu->arch.wbinvd_dirty_mask);
put_cpu();
cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
} else
@@ -13964,6 +13963,12 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
}
EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
+void wbinvd_on_many_cpus(struct cpumask *mask)
+{
+ on_each_cpu_mask(mask, wbinvd_ipi, NULL, 1);
+}
+EXPORT_SYMBOL_GPL(wbinvd_on_many_cpus);
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index ec623d23d..8f715e14b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -611,5 +611,6 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
unsigned int port, void *data, unsigned int count,
int in);
+void wbinvd_on_many_cpus(struct cpumask *mask);
#endif
--
2.34.1
* [PATCH v7 2/3] KVM: SVM: Remove wbinvd in sev_vm_destroy()
2025-01-28 1:53 [PATCH v7 0/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
2025-01-28 1:53 ` [PATCH v7 1/3] KVM: x86: Add a wbinvd helper Zheyun Shen
@ 2025-01-28 1:53 ` Zheyun Shen
2025-02-06 22:04 ` Tom Lendacky
2025-01-28 1:53 ` [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
2025-02-26 1:37 ` [PATCH v7 0/3] " Sean Christopherson
3 siblings, 1 reply; 12+ messages in thread
From: Zheyun Shen @ 2025-01-28 1:53 UTC (permalink / raw)
To: thomas.lendacky, seanjc, pbonzini, tglx, kevinloughlin, mingo, bp
Cc: kvm, linux-kernel, Zheyun Shen
Before sev_vm_destroy() is called, kvm_arch_guest_memory_reclaimed()
has been called for SEV and SEV-ES and kvm_arch_gmem_invalidate()
has been called for SEV-SNP. These functions have already handled
flushing the memory. Therefore, this wbinvd_on_all_cpus() can
simply be dropped.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
---
arch/x86/kvm/svm/sev.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 943bd074a..1ce67de9d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2899,12 +2899,6 @@ void sev_vm_destroy(struct kvm *kvm)
return;
}
- /*
- * Ensure that all guest tagged cache entries are flushed before
- * releasing the pages back to the system for use. CLFLUSH will
- * not do this, so issue a WBINVD.
- */
- wbinvd_on_all_cpus();
/*
* if userspace was terminated before unregistering the memory regions
--
2.34.1
* [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest
2025-01-28 1:53 [PATCH v7 0/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
2025-01-28 1:53 ` [PATCH v7 1/3] KVM: x86: Add a wbinvd helper Zheyun Shen
2025-01-28 1:53 ` [PATCH v7 2/3] KVM: SVM: Remove wbinvd in sev_vm_destroy() Zheyun Shen
@ 2025-01-28 1:53 ` Zheyun Shen
2025-02-06 22:05 ` Tom Lendacky
2025-02-26 1:20 ` Sean Christopherson
2025-02-26 1:37 ` [PATCH v7 0/3] " Sean Christopherson
3 siblings, 2 replies; 12+ messages in thread
From: Zheyun Shen @ 2025-01-28 1:53 UTC (permalink / raw)
To: thomas.lendacky, seanjc, pbonzini, tglx, kevinloughlin, mingo, bp
Cc: kvm, linux-kernel, Zheyun Shen
AMD CPUs do not ensure cache coherency for encrypted memory, so each
memory page reclaimed from an SEV guest triggers a call to
wbinvd_on_all_cpus(), degrading the performance of other programs on
the host.

Typically, an AMD server may have 128 cores or more, while the SEV guest
might only utilize 8 of those cores. Meanwhile, the host can use qemu-affinity
to bind the 8 vCPUs to specific physical CPUs.

Therefore, recording the physical CPUs a vCPU runs on makes it possible
to avoid flushing the cache on all CPUs every time.
Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
---
arch/x86/kvm/svm/sev.c | 30 +++++++++++++++++++++++++++---
arch/x86/kvm/svm/svm.c | 2 ++
arch/x86/kvm/svm/svm.h | 5 ++++-
3 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1ce67de9d..4b80ecbe7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -252,6 +252,27 @@ static void sev_asid_free(struct kvm_sev_info *sev)
sev->misc_cg = NULL;
}
+void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+ /*
+ * To optimize cache flushes when memory is reclaimed from an SEV VM,
+ * track physical CPUs that enter the guest for SEV VMs and thus can
+ * have encrypted, dirty data in the cache, and flush caches only for
+ * CPUs that have entered the guest.
+ */
+ cpumask_set_cpu(cpu, to_kvm_sev_info(vcpu->kvm)->wbinvd_dirty_mask);
+}
+
+static void sev_do_wbinvd(struct kvm *kvm)
+{
+ /*
+ * TODO: Clear CPUs from the bitmap prior to flushing. Doing so
+ * requires serializing multiple calls and having CPUs mark themselves
+ * "dirty" if they are currently running a vCPU for the VM.
+ */
+ wbinvd_on_many_cpus(to_kvm_sev_info(kvm)->wbinvd_dirty_mask);
+}
+
static void sev_decommission(unsigned int handle)
{
struct sev_data_decommission decommission;
@@ -448,6 +469,8 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
ret = sev_platform_init(&init_args);
if (ret)
goto e_free;
+ if (!zalloc_cpumask_var(&sev->wbinvd_dirty_mask, GFP_KERNEL_ACCOUNT))
+ goto e_free;
/* This needs to happen after SEV/SNP firmware initialization. */
if (vm_type == KVM_X86_SNP_VM) {
@@ -2778,7 +2801,7 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
* releasing the pages back to the system for use. CLFLUSH will
* not do this, so issue a WBINVD.
*/
- wbinvd_on_all_cpus();
+ sev_do_wbinvd(kvm);
__unregister_enc_region_locked(kvm, region);
@@ -2926,6 +2949,7 @@ void sev_vm_destroy(struct kvm *kvm)
}
sev_asid_free(sev);
+ free_cpumask_var(sev->wbinvd_dirty_mask);
}
void __init sev_set_cpu_caps(void)
@@ -3129,7 +3153,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
return;
do_wbinvd:
- wbinvd_on_all_cpus();
+ sev_do_wbinvd(vcpu->kvm);
}
void sev_guest_memory_reclaimed(struct kvm *kvm)
@@ -3143,7 +3167,7 @@ void sev_guest_memory_reclaimed(struct kvm *kvm)
if (!sev_guest(kvm) || sev_snp_guest(kvm))
return;
- wbinvd_on_all_cpus();
+ sev_do_wbinvd(kvm);
}
void sev_free_vcpu(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd15cc635..f3b03b0d8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1565,6 +1565,8 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
if (kvm_vcpu_apicv_active(vcpu))
avic_vcpu_load(vcpu, cpu);
+ if (sev_guest(vcpu->kvm))
+ sev_vcpu_load(vcpu, cpu);
}
static void svm_vcpu_put(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 43fa6a16e..82ec80cf4 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -112,6 +112,8 @@ struct kvm_sev_info {
void *guest_req_buf; /* Bounce buffer for SNP Guest Request input */
void *guest_resp_buf; /* Bounce buffer for SNP Guest Request output */
struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */
+ /* CPUs invoked VMRUN call wbinvd after guest memory is reclaimed */
+ struct cpumask *wbinvd_dirty_mask;
};
struct kvm_svm {
@@ -763,6 +765,7 @@ void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
+void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
#else
static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
{
@@ -793,7 +796,7 @@ static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
{
return 0;
}
-
+static inline void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
#endif
/* vmenter.S */
--
2.34.1
* Re: [PATCH v7 1/3] KVM: x86: Add a wbinvd helper
2025-01-28 1:53 ` [PATCH v7 1/3] KVM: x86: Add a wbinvd helper Zheyun Shen
@ 2025-02-06 22:03 ` Tom Lendacky
2025-02-26 0:59 ` Sean Christopherson
0 siblings, 1 reply; 12+ messages in thread
From: Tom Lendacky @ 2025-02-06 22:03 UTC (permalink / raw)
To: Zheyun Shen, seanjc, pbonzini, tglx, kevinloughlin, mingo, bp
Cc: kvm, linux-kernel
On 1/27/25 19:53, Zheyun Shen wrote:
> At the moment, open-coded calls to on_each_cpu_mask() are used when
> emulating WBINVD. A subsequent patch needs the same behavior, and the
> helper saves callers from preparing identical parameters.
>
> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Not sure if this wouldn't be better living in the same file where
wbinvd_on_all_cpus() lives, so I'll leave it up to the maintainers.
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/kvm/x86.c | 9 +++++++--
> arch/x86/kvm/x86.h | 1 +
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2e7134809..b635e0e5c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8231,8 +8231,7 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
> int cpu = get_cpu();
>
> cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
> - on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask,
> - wbinvd_ipi, NULL, 1);
> + wbinvd_on_many_cpus(vcpu->arch.wbinvd_dirty_mask);
> put_cpu();
> cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
> } else
> @@ -13964,6 +13963,12 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> }
> EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
>
> +void wbinvd_on_many_cpus(struct cpumask *mask)
> +{
> + on_each_cpu_mask(mask, wbinvd_ipi, NULL, 1);
> +}
> +EXPORT_SYMBOL_GPL(wbinvd_on_many_cpus);
> +
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index ec623d23d..8f715e14b 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -611,5 +611,6 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t src, unsigned int bytes,
> int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
> unsigned int port, void *data, unsigned int count,
> int in);
> +void wbinvd_on_many_cpus(struct cpumask *mask);
>
> #endif
* Re: [PATCH v7 2/3] KVM: SVM: Remove wbinvd in sev_vm_destroy()
2025-01-28 1:53 ` [PATCH v7 2/3] KVM: SVM: Remove wbinvd in sev_vm_destroy() Zheyun Shen
@ 2025-02-06 22:04 ` Tom Lendacky
0 siblings, 0 replies; 12+ messages in thread
From: Tom Lendacky @ 2025-02-06 22:04 UTC (permalink / raw)
To: Zheyun Shen, seanjc, pbonzini, tglx, kevinloughlin, mingo, bp
Cc: kvm, linux-kernel
On 1/27/25 19:53, Zheyun Shen wrote:
> Before sev_vm_destroy() is called, kvm_arch_guest_memory_reclaimed()
> has been called for SEV and SEV-ES and kvm_arch_gmem_invalidate()
> has been called for SEV-SNP. These functions have already handled
> flushing the memory. Therefore, this wbinvd_on_all_cpus() can
> simply be dropped.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 943bd074a..1ce67de9d 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2899,12 +2899,6 @@ void sev_vm_destroy(struct kvm *kvm)
> return;
> }
>
> - /*
> - * Ensure that all guest tagged cache entries are flushed before
> - * releasing the pages back to the system for use. CLFLUSH will
> - * not do this, so issue a WBINVD.
> - */
> - wbinvd_on_all_cpus();
>
> /*
> * if userspace was terminated before unregistering the memory regions
* Re: [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest
2025-01-28 1:53 ` [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
@ 2025-02-06 22:05 ` Tom Lendacky
2025-02-26 1:20 ` Sean Christopherson
1 sibling, 0 replies; 12+ messages in thread
From: Tom Lendacky @ 2025-02-06 22:05 UTC (permalink / raw)
To: Zheyun Shen, seanjc, pbonzini, tglx, kevinloughlin, mingo, bp
Cc: kvm, linux-kernel
On 1/27/25 19:53, Zheyun Shen wrote:
> AMD CPUs do not ensure cache coherency for encrypted memory, so each
> memory page reclaimed from an SEV guest triggers a call to
> wbinvd_on_all_cpus(), degrading the performance of other programs on
> the host.
>
> Typically, an AMD server may have 128 cores or more, while the SEV guest
> might only utilize 8 of those cores. Meanwhile, the host can use qemu-affinity
> to bind the 8 vCPUs to specific physical CPUs.
>
> Therefore, recording the physical CPUs a vCPU runs on makes it possible
> to avoid flushing the cache on all CPUs every time.
>
> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/kvm/svm/sev.c | 30 +++++++++++++++++++++++++++---
> arch/x86/kvm/svm/svm.c | 2 ++
> arch/x86/kvm/svm/svm.h | 5 ++++-
> 3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1ce67de9d..4b80ecbe7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -252,6 +252,27 @@ static void sev_asid_free(struct kvm_sev_info *sev)
> sev->misc_cg = NULL;
> }
>
> +void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> +{
> + /*
> + * To optimize cache flushes when memory is reclaimed from an SEV VM,
> + * track physical CPUs that enter the guest for SEV VMs and thus can
> + * have encrypted, dirty data in the cache, and flush caches only for
> + * CPUs that have entered the guest.
> + */
> + cpumask_set_cpu(cpu, to_kvm_sev_info(vcpu->kvm)->wbinvd_dirty_mask);
> +}
> +
> +static void sev_do_wbinvd(struct kvm *kvm)
> +{
> + /*
> + * TODO: Clear CPUs from the bitmap prior to flushing. Doing so
> + * requires serializing multiple calls and having CPUs mark themselves
> + * "dirty" if they are currently running a vCPU for the VM.
> + */
> + wbinvd_on_many_cpus(to_kvm_sev_info(kvm)->wbinvd_dirty_mask);
> +}
> +
> static void sev_decommission(unsigned int handle)
> {
> struct sev_data_decommission decommission;
> @@ -448,6 +469,8 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
> ret = sev_platform_init(&init_args);
> if (ret)
> goto e_free;
> + if (!zalloc_cpumask_var(&sev->wbinvd_dirty_mask, GFP_KERNEL_ACCOUNT))
> + goto e_free;
>
> /* This needs to happen after SEV/SNP firmware initialization. */
> if (vm_type == KVM_X86_SNP_VM) {
> @@ -2778,7 +2801,7 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
> * releasing the pages back to the system for use. CLFLUSH will
> * not do this, so issue a WBINVD.
> */
> - wbinvd_on_all_cpus();
> + sev_do_wbinvd(kvm);
>
> __unregister_enc_region_locked(kvm, region);
>
> @@ -2926,6 +2949,7 @@ void sev_vm_destroy(struct kvm *kvm)
> }
>
> sev_asid_free(sev);
> + free_cpumask_var(sev->wbinvd_dirty_mask);
> }
>
> void __init sev_set_cpu_caps(void)
> @@ -3129,7 +3153,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
> return;
>
> do_wbinvd:
> - wbinvd_on_all_cpus();
> + sev_do_wbinvd(vcpu->kvm);
> }
>
> void sev_guest_memory_reclaimed(struct kvm *kvm)
> @@ -3143,7 +3167,7 @@ void sev_guest_memory_reclaimed(struct kvm *kvm)
> if (!sev_guest(kvm) || sev_snp_guest(kvm))
> return;
>
> - wbinvd_on_all_cpus();
> + sev_do_wbinvd(kvm);
> }
>
> void sev_free_vcpu(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index dd15cc635..f3b03b0d8 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1565,6 +1565,8 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> }
> if (kvm_vcpu_apicv_active(vcpu))
> avic_vcpu_load(vcpu, cpu);
> + if (sev_guest(vcpu->kvm))
> + sev_vcpu_load(vcpu, cpu);
> }
>
> static void svm_vcpu_put(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 43fa6a16e..82ec80cf4 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -112,6 +112,8 @@ struct kvm_sev_info {
> void *guest_req_buf; /* Bounce buffer for SNP Guest Request input */
> void *guest_resp_buf; /* Bounce buffer for SNP Guest Request output */
> struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */
> + /* CPUs invoked VMRUN call wbinvd after guest memory is reclaimed */
> + struct cpumask *wbinvd_dirty_mask;
> };
>
> struct kvm_svm {
> @@ -763,6 +765,7 @@ void sev_snp_init_protected_guest_state(struct kvm_vcpu *vcpu);
> int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order);
> void sev_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
> int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn);
> +void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
> #else
> static inline struct page *snp_safe_alloc_page_node(int node, gfp_t gfp)
> {
> @@ -793,7 +796,7 @@ static inline int sev_private_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn)
> {
> return 0;
> }
> -
> +static inline void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu) {}
> #endif
>
> /* vmenter.S */
* Re: [PATCH v7 1/3] KVM: x86: Add a wbinvd helper
2025-02-06 22:03 ` Tom Lendacky
@ 2025-02-26 0:59 ` Sean Christopherson
0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2025-02-26 0:59 UTC (permalink / raw)
To: Tom Lendacky
Cc: Zheyun Shen, pbonzini, tglx, kevinloughlin, mingo, bp, kvm,
linux-kernel
On Thu, Feb 06, 2025, Tom Lendacky wrote:
> On 1/27/25 19:53, Zheyun Shen wrote:
> > At the moment, open-coded calls to on_each_cpu_mask() are used when
> > emulating WBINVD. A subsequent patch needs the same behavior, and the
> > helper saves callers from preparing identical parameters.
> >
> > Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
>
> Not sure if this wouldn't be better living in the same files that
> wbinvd_on_all_cpus() lives, so I'll leave it up to the maintainers.
It definitely belongs in arch/x86/lib/cache-smp.c.
* Re: [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest
2025-01-28 1:53 ` [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
2025-02-06 22:05 ` Tom Lendacky
@ 2025-02-26 1:20 ` Sean Christopherson
2025-02-26 3:26 ` Zheyun Shen
1 sibling, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2025-02-26 1:20 UTC (permalink / raw)
To: Zheyun Shen
Cc: thomas.lendacky, pbonzini, tglx, kevinloughlin, mingo, bp, kvm,
linux-kernel
On Tue, Jan 28, 2025, Zheyun Shen wrote:
> AMD CPUs do not ensure cache coherency for encrypted memory, so each
> memory page reclaimed from an SEV guest triggers a call to
> wbinvd_on_all_cpus(), degrading the performance of other programs on
> the host.
>
> Typically, an AMD server may have 128 cores or more, while the SEV guest
> might only utilize 8 of those cores. Meanwhile, the host can use qemu-affinity
> to bind the 8 vCPUs to specific physical CPUs.
>
> Therefore, recording the physical CPUs a vCPU runs on makes it possible
> to avoid flushing the cache on all CPUs every time.
>
> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
> ---
> arch/x86/kvm/svm/sev.c | 30 +++++++++++++++++++++++++++---
> arch/x86/kvm/svm/svm.c | 2 ++
> arch/x86/kvm/svm/svm.h | 5 ++++-
> 3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1ce67de9d..4b80ecbe7 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -252,6 +252,27 @@ static void sev_asid_free(struct kvm_sev_info *sev)
> sev->misc_cg = NULL;
> }
>
> +void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
And now I'm very confused.
v1 and v2 marked the CPU dirty in pre_sev_run(), which AFAICT is exactly when a
CPU should be recorded as having dirtied memory. v3 fixed a bug with using
get_cpu(), but otherwise was unchanged. Tom even gave a Tested-by for v3.
Then v4 comes along, and without explanation, moved the code to vcpu_load().
Changed the time of recording the CPUs from pre_sev_run() to vcpu_load().
Why? If there's a good reason, then that absolutely, positively belongs in the
changelog and in the code as a comment. If there's no good reason, then...
Unless I hear otherwise, my plan is to move this back to pre_sev_run().
> +{
> + /*
> + * To optimize cache flushes when memory is reclaimed from an SEV VM,
> + * track physical CPUs that enter the guest for SEV VMs and thus can
> + * have encrypted, dirty data in the cache, and flush caches only for
> + * CPUs that have entered the guest.
> + */
> + cpumask_set_cpu(cpu, to_kvm_sev_info(vcpu->kvm)->wbinvd_dirty_mask);
> +}
> +
> +static void sev_do_wbinvd(struct kvm *kvm)
> +{
> + /*
> + * TODO: Clear CPUs from the bitmap prior to flushing. Doing so
> + * requires serializing multiple calls and having CPUs mark themselves
> + * "dirty" if they are currently running a vCPU for the VM.
> + */
A comment is definitely warranted, but I don't think we should mark it TODO. I'm
not convinced the benefits justify the complexity, and I don't want someone trying
to "fix" the code because it has a TODO.
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 43fa6a16e..82ec80cf4 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -112,6 +112,8 @@ struct kvm_sev_info {
> void *guest_req_buf; /* Bounce buffer for SNP Guest Request input */
> void *guest_resp_buf; /* Bounce buffer for SNP Guest Request output */
> struct mutex guest_req_mutex; /* Must acquire before using bounce buffers */
> + /* CPUs invoked VMRUN call wbinvd after guest memory is reclaimed */
> + struct cpumask *wbinvd_dirty_mask;
This needs to be cpumask_var_t, as the cpumask APIs expect the mask to be
statically allocated when CONFIG_CPUMASK_OFFSTACK=n. E.g. this will hit a NULL
pointer deref.
static __always_inline bool zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags)
{
cpumask_clear(*mask);
return true;
}
The wbinvd_dirty_mask name also turns out to be less than good. In part because
of the looming WBNOINVD change, but also because it kinda sorta collides with
wbinvd_dirty_mask in kvm_vcpu_arch, which gets really confusing when trying to
read the code.
I don't have any great ideas, the best I came up with was have_run_cpus.
* Re: [PATCH v7 0/3] KVM: SVM: Flush cache only on CPUs running SEV guest
2025-01-28 1:53 [PATCH v7 0/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
` (2 preceding siblings ...)
2025-01-28 1:53 ` [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
@ 2025-02-26 1:37 ` Sean Christopherson
3 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2025-02-26 1:37 UTC (permalink / raw)
To: Zheyun Shen
Cc: thomas.lendacky, pbonzini, tglx, kevinloughlin, mingo, bp, kvm,
linux-kernel
On Tue, Jan 28, 2025, Zheyun Shen wrote:
> Previous versions pointed out the problem of wbinvd_on_all_cpus() in SEV
> and tried to maintain a cpumask to solve it. This version includes
> further code cleanup.
>
> Although dirty_mask is not maintained perfectly and may lead to WBINVD on
> physical CPUs that are not running an SEV guest, it is still better than
> wbinvd_on_all_cpus(). Handling vCPU migration is left to future work.
I have a variety of comments, but no need to send a new version. I'm going to
post a combined version with the WBNOINVD series, hopefully tomorrow.
The only thing that needs your attention is the pre_sev_run() => sev_vcpu_load()
change between v3 and v4.
> ---
> v6 -> v7:
> - Fixed the writing oversight in sev_vcpu_load().
>
> v5 -> v6:
> - Replaced sev_get_wbinvd_dirty_mask() with the helper function
> to_kvm_sev_info().
>
> v4 -> v5:
> - rebase to tip @ 15e2f65f2ecf .
> - Added a commit to remove unnecessary calls to wbinvd().
> - Changed some comments.
>
> v3 -> v4:
> - Added a wbinvd helper and export it to SEV.
> - Changed the struct cpumask in kvm_sev_info into cpumask*, which should
> be dynamically allocated and freed.
> - Changed the time of recording the CPUs from pre_sev_run() to vcpu_load().
> - Removed code of clearing the mask.
>
> v2 -> v3:
> - Replaced get_cpu() with parameter cpu in pre_sev_run().
>
> v1 -> v2:
> - Added sev_do_wbinvd() to wrap two operations.
> - Used cpumask_test_and_clear_cpu() to avoid concurrent problems.
> ---
>
> Zheyun Shen (3):
> KVM: x86: Add a wbinvd helper
> KVM: SVM: Remove wbinvd in sev_vm_destroy()
> KVM: SVM: Flush cache only on CPUs running SEV guest
>
> arch/x86/kvm/svm/sev.c | 36 +++++++++++++++++++++++++++---------
> arch/x86/kvm/svm/svm.c | 2 ++
> arch/x86/kvm/svm/svm.h | 5 ++++-
> arch/x86/kvm/x86.c | 9 +++++++--
> arch/x86/kvm/x86.h | 1 +
> 5 files changed, 41 insertions(+), 12 deletions(-)
>
> --
> 2.34.1
>
* Re: [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest
2025-02-26 1:20 ` Sean Christopherson
@ 2025-02-26 3:26 ` Zheyun Shen
2025-02-26 23:58 ` Sean Christopherson
0 siblings, 1 reply; 12+ messages in thread
From: Zheyun Shen @ 2025-02-26 3:26 UTC (permalink / raw)
To: Sean Christopherson
Cc: Tom Lendacky, pbonzini, tglx, Kevin Loughlin, mingo, bp, kvm,
linux-kernel
I'm very sorry that the formatting of my previous email was messed up due
to an issue with the email client. I am sending a new email with the same
content.
> Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Jan 28, 2025, Zheyun Shen wrote:
>> AMD CPUs do not ensure cache coherency for encrypted memory, so each
>> memory page reclaimed from an SEV guest triggers a call to
>> wbinvd_on_all_cpus(), degrading the performance of other programs on
>> the host.
>>
>> Typically, an AMD server may have 128 cores or more, while the SEV guest
>> might only utilize 8 of those cores. Meanwhile, the host can use qemu-affinity
>> to bind the 8 vCPUs to specific physical CPUs.
>>
>> Therefore, recording the physical CPUs a vCPU runs on makes it possible
>> to avoid flushing the cache on all CPUs every time.
>>
>> Signed-off-by: Zheyun Shen <szy0127@sjtu.edu.cn>
>> ---
>> arch/x86/kvm/svm/sev.c | 30 +++++++++++++++++++++++++++---
>> arch/x86/kvm/svm/svm.c | 2 ++
>> arch/x86/kvm/svm/svm.h | 5 ++++-
>> 3 files changed, 33 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 1ce67de9d..4b80ecbe7 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -252,6 +252,27 @@ static void sev_asid_free(struct kvm_sev_info *sev)
>> sev->misc_cg = NULL;
>> }
>>
>> +void sev_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>
> And now I'm very confused.
>
> v1 and v2 marked the CPU dirty in pre_sev_run(), which AFAICT is exactly when a
> CPU should be recorded as having dirtied memory. v3 fixed a bug with using
> get_cpu(), but otherwise was unchanged. Tom even gave a Tested-by for v3.
>
> Then v4 comes along, and without explanation, moved the code to vcpu_load().
>
I apologize for not including sufficient information in the changelog, which led to your confusion.
> Changed the time of recording the CPUs from pre_sev_run() to vcpu_load().
>
> Why? If there's a good reason, then that absolutely, positively belongs in the
> changelog and in the code as a comment. If there's no good reason, then...
>
The reason I moved the timing of CPU recording from pre_sev_run() to
vcpu_load() is that vcpu_load() is always present in the call path of
kvm_arch_vcpu_ioctl_run(), and whenever a vCPU migration occurs, the
control flow reaches vcpu_load() again, which keeps the CPU recording
correct.

On the other hand, recording the CPU in pre_sev_run() would record the
CPU number every time before entering the guest. Without vCPU migration,
only the first record is effective; the subsequent records are redundant
and waste time, making each VM exit take longer (although the additional
time may be very short).
> Unless I hear otherwise, my plan is to move this back to pre_sev_run().
>
Another issue in v3 is that I incorrectly cleared the recorded mask after
each cache flush. The mask bits should instead be cleared and updated at
the time of vCPU migration, not after a cache flush.
* Re: [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest
2025-02-26 3:26 ` Zheyun Shen
@ 2025-02-26 23:58 ` Sean Christopherson
0 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2025-02-26 23:58 UTC (permalink / raw)
To: Zheyun Shen
Cc: Tom Lendacky, pbonzini, tglx, Kevin Loughlin, mingo, bp, kvm,
linux-kernel
On Wed, Feb 26, 2025, Zheyun Shen wrote:
> I'm very sorry that the formatting of my previous email was messed up due to
> an issue with the email client. I am sending a new email with the same
> content.
Something is still not quite right, as your mails aren't hitting lore.kernel.org,
i.e. are getting dropped by the lists.
> > Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Jan 28, 2025, Zheyun Shen wrote:
> > Changed the time of recording the CPUs from pre_sev_run() to vcpu_load().
> >
> > Why? If there's a good reason, then that absolutely, positively belongs in the
> > changelog and in the code as a comment. If there's no good reason, then...
> >
> The reason I moved the timing of CPU recording from pre_sev_run() to
> vcpu_load() is that vcpu_load() is always present in the call path of
> kvm_arch_vcpu_ioctl_run(). Moreover, whenever a vCPU migration occurs, the
> control flow reaches vcpu_load() again, which keeps the CPU recording
> correct. On the other hand, recording in pre_sev_run() would record the
> CPU number every time before entering the guest. Without vCPU migration,
> only the first record is effective; the subsequent records are redundant
> and thus waste time. This would make each VM entry take slightly longer
> (although the additional time may be very short).
So long as KVM performs a lockless test to see if the CPU is already set in the
mask, the cost should be minimal. This:
if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
Generates a bt+jb to guard the locked bts, i.e. *should* only add 1-2 uops to the
entry flow, once the CPU's predictors have warmed up enough to not prefetch the
RMW and bounce the cache line.
0x0000000000016603 <+67>: bt %r8,0xa428(%rdx)
0x000000000001660b <+75>: jb 0x16616 <pre_sev_run+86>
0x000000000001660d <+77>: lock bts %r8,0xa428(%rdx)
If it turns out that marking a CPU as having run that ASID becomes a hot spot,
then I'm definitely open to moving things around. But for a first go at this,
and especially without evidence that it's problematic, I want to go with the
approach that's most obviously correct.
> > Unless I hear otherwise, my plan is to move this back to pre_sev_run().
> >
>
> Another issue in the v3 version is that I incorrectly cleared the recorded
> mask after each cache flush. The mask bits should be cleared and updated
> at the time of vCPU migration rather than after a cache flush.
The bits can't be cleared at vCPU migration, they can only be cleared when a
flush is issued. A vCPU that did VMRUN with an ASID in the past could still
have dirty data cached for that ASID.
KVM could send an IPI to the previous CPU to do a cache flush on migration,
similar to what KVM already does on VMX to VMCLEAR the VMCS. But the math and
usage for WB(NO)INVD is wildly different. For VMCLEAR, the IPI approach is the
lazy approach; KVM *must* VMCLEAR the VMCS if the CPU changes, using an IPI to
defer VMCLEAR allows KVM to skip doing VMCLEAR whenever a vCPU is scheduled out.
For WB(NO)INVD, the IPI approach would be an eager approach. Deferring WB(NO)INVD
until it's necessary allows KVM to skip the WB(NO)INVD when a vCPU is migrated.
In short, no optimization is obviously better than another at this point. E.g.
if a VM is not hard pinned 1:1, but vCPUs are affined to certain cores, then
flushing on migration is a terrible idea because the total set of CPUs that need
to flush caches is more or less static.
Anyways, as above, I definitely want to go with the simplest implementation
(make the bitmap "sticky"), and then iterate, optimize, and add complexity if and
only if it's truly justified.
end of thread, other threads:[~2025-02-26 23:58 UTC | newest]
Thread overview: 12+ messages
2025-01-28 1:53 [PATCH v7 0/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
2025-01-28 1:53 ` [PATCH v7 1/3] KVM: x86: Add a wbinvd helper Zheyun Shen
2025-02-06 22:03 ` Tom Lendacky
2025-02-26 0:59 ` Sean Christopherson
2025-01-28 1:53 ` [PATCH v7 2/3] KVM: SVM: Remove wbinvd in sev_vm_destroy() Zheyun Shen
2025-02-06 22:04 ` Tom Lendacky
2025-01-28 1:53 ` [PATCH v7 3/3] KVM: SVM: Flush cache only on CPUs running SEV guest Zheyun Shen
2025-02-06 22:05 ` Tom Lendacky
2025-02-26 1:20 ` Sean Christopherson
2025-02-26 3:26 ` Zheyun Shen
2025-02-26 23:58 ` Sean Christopherson
2025-02-26 1:37 ` [PATCH v7 0/3] " Sean Christopherson