* [PATCH 0/2] KVM: SVM: fixes for SEV
@ 2025-08-11 20:30 Yury Norov
2025-08-11 20:30 ` [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches() Yury Norov
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Yury Norov @ 2025-08-11 20:30 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, x86, kvm,
linux-kernel
Cc: Yury Norov, Zheyun Shen
From: Yury Norov (NVIDIA) <yury.norov@gmail.com>
A couple of fixes for the recently merged sev_writeback_caches() and
pre_sev_run() changes.
Yury Norov (NVIDIA) (2):
KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run()
arch/x86/kvm/svm/sev.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
--
2.43.0
* [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
2025-08-11 20:30 [PATCH 0/2] KVM: SVM: fixes for SEV Yury Norov
@ 2025-08-11 20:30 ` Yury Norov
2025-08-11 20:50 ` Sean Christopherson
2025-08-11 20:30 ` [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run() Yury Norov
2025-08-19 23:11 ` [PATCH 0/2] KVM: SVM: fixes for SEV Sean Christopherson
2 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2025-08-11 20:30 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, x86, kvm,
linux-kernel
Cc: Yury Norov, Zheyun Shen
From: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Before calling wbnoinvd_on_cpus_mask(), the function checks the cpumask
for emptiness. It's useless, as the following wbnoinvd_on_cpus_mask()
ends up in smp_call_function_many_cond(), which handles an empty cpumask
correctly.
While there, move the function-wide comment to the top of the function.
Fixes: 6f38f8c57464 ("KVM: SVM: Flush cache only on CPUs running SEV guest")
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
---
arch/x86/kvm/svm/sev.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2fbdebf79fbb..49d7557de8bc 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -716,15 +716,12 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
}
}
+/*
+ * The caller is responsible for ensuring correctness if the mask
+ * can be modified, e.g. if a CPU could be doing VMRUN.
+ */
static void sev_writeback_caches(struct kvm *kvm)
{
- /*
- * Note, the caller is responsible for ensuring correctness if the mask
- * can be modified, e.g. if a CPU could be doing VMRUN.
- */
- if (cpumask_empty(to_kvm_sev_info(kvm)->have_run_cpus))
- return;
-
/*
* Ensure that all dirty guest tagged cache entries are written back
* before releasing the pages back to the system for use. CLFLUSH will
--
2.43.0
* [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run()
2025-08-11 20:30 [PATCH 0/2] KVM: SVM: fixes for SEV Yury Norov
2025-08-11 20:30 ` [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches() Yury Norov
@ 2025-08-11 20:30 ` Yury Norov
2025-08-11 20:45 ` Sean Christopherson
2025-08-19 23:11 ` [PATCH 0/2] KVM: SVM: fixes for SEV Sean Christopherson
2 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2025-08-11 20:30 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, x86, kvm,
linux-kernel
Cc: Yury Norov, Zheyun Shen
Testing the cpumask for a CPU being clear just before setting that exact
same CPU is useless because the end result is always the same: the CPU is
set.
While there, switch the CPU setter to a non-atomic version. Atomicity is
useless here because sev_writeback_caches() ends up in a plain
for_each_cpu() loop in smp_call_function_many_cond(), which is not
atomic by nature.
Fixes: 6f38f8c57464 ("KVM: SVM: Flush cache only on CPUs running SEV guest")
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
arch/x86/kvm/svm/sev.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 49d7557de8bc..8170674d39c1 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3498,8 +3498,7 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
* have encrypted, dirty data in the cache, and flush caches only for
* CPUs that have entered the guest.
*/
- if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
- cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
+ __cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
/* Assign the asid allocated with this SEV guest */
svm->asid = asid;
--
2.43.0
* Re: [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run()
2025-08-11 20:30 ` [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run() Yury Norov
@ 2025-08-11 20:45 ` Sean Christopherson
2025-08-11 21:28 ` Yury Norov
0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2025-08-11 20:45 UTC (permalink / raw)
To: Yury Norov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025, Yury Norov wrote:
> Testing cpumask for a CPU to be cleared just before setting the exact
> same CPU is useless because the end result is always the same: CPU is
> set.
No, it is not useless. Blindly writing to the variable will unnecessarily bounce
the cacheline, and this is a hot path.
> While there, switch CPU setter to a non-atomic version. Atomicity is
> useless here
No, atomicity isn't useless here either. Dropping atomicity could result in
CPU's bit being lost. I.e. the atomic accesses aren't for the benefit of
smp_call_function_many_cond(), the writes are atomic so that multiple vCPUs can
concurrently update the mask without needing additional protection.
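To make the race concrete, here's a rough sketch of the lost-update scenario
with a plain (non-atomic) read-modify-write of the word backing the mask; the
interleaving and names (word, old, BIT(A)/BIT(B)) are purely illustrative,
not tied to any particular code generation:

        /* bits for CPU A and CPU B live in the same unsigned long 'word' */
        CPU A: old = word;              /* neither bit set yet */
        CPU B: old = word;
        CPU A: word = old | BIT(A);     /* publishes bit A */
        CPU B: word = old | BIT(B);     /* overwrites the word, bit A is lost */

The atomic set_bit() behind cpumask_set_cpu() makes that read-modify-write
indivisible, so neither CPU's update can be dropped.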
> because sev_writeback_caches() ends up with a plain
> for_each_cpu() loop in smp_call_function_many_cond(), which is not
> atomic by nature.
That's fine. As noted in sev_writeback_caches(), if vCPU could be running, then
the caller is responsible for ensuring that all vCPUs flush caches before the
memory being reclaimed is fully freed. Those guarantees are provided by KVM's
MMU.
sev_writeback_caches() => smp_call_function_many_cond() could hit false positives,
i.e. trigger WBINVD on CPUs that couldn't possibly have accessed the memory being
reclaimed, but such false positives are functionally benign, and are "intended"
in the sense that we chose to prioritize simplicity over precision.
> Fixes: 6f38f8c57464 ("KVM: SVM: Flush cache only on CPUs running SEV guest")
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
> arch/x86/kvm/svm/sev.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 49d7557de8bc..8170674d39c1 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3498,8 +3498,7 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
> * have encrypted, dirty data in the cache, and flush caches only for
> * CPUs that have entered the guest.
> */
> - if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
> - cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
> + __cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
>
> /* Assign the asid allocated with this SEV guest */
> svm->asid = asid;
> --
> 2.43.0
>
* Re: [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
2025-08-11 20:30 ` [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches() Yury Norov
@ 2025-08-11 20:50 ` Sean Christopherson
2025-08-11 21:05 ` Yury Norov
0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2025-08-11 20:50 UTC (permalink / raw)
To: Yury Norov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025, Yury Norov wrote:
> From: Yury Norov (NVIDIA) <yury.norov@gmail.com>
>
> Before calling wbnoinvd_on_cpus_mask(), the function checks the cpumask
> for emptiness. It's useless, as the following wbnoinvd_on_cpus_mask()
> ends up with smp_call_function_many_cond(), which handles empty cpumask
> correctly.
I don't agree that it's useless. The early check avoids disabling/enabling
preemption (which is cheap, but still), and IMO it makes the KVM code more obviously
correct. E.g. it takes quite a bit of digging to understand that invoking
wbnoinvd_on_cpus_mask() with an empty mask is ok/fine.
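For reference, the digging roughly bottoms out at the usual for_each_cpu()
pattern: with an empty mask the loop body simply never runs, so the cross-call
is a no-op. A simplified sketch, not the actual smp_call_function_many_cond()
body, and send_ipi() is just a stand-in:

        for_each_cpu(cpu, mask) {
                /* never reached when 'mask' is empty */
                send_ipi(cpu);
        }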
I'm not completely opposed to this change, but I also don't see the point.
> While there, move function-wide comment on top of the function.
>
> Fixes: 6f38f8c57464 ("KVM: SVM: Flush cache only on CPUs running SEV guest")
> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
> ---
> arch/x86/kvm/svm/sev.c | 11 ++++-------
> 1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2fbdebf79fbb..49d7557de8bc 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -716,15 +716,12 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
> }
> }
>
> +/*
> + * The caller is responsible for ensuring correctness if the mask
> + * can be modified, e.g. if a CPU could be doing VMRUN.
> + */
> static void sev_writeback_caches(struct kvm *kvm)
> {
> - /*
> - * Note, the caller is responsible for ensuring correctness if the mask
> - * can be modified, e.g. if a CPU could be doing VMRUN.
> - */
> - if (cpumask_empty(to_kvm_sev_info(kvm)->have_run_cpus))
> - return;
> -
> /*
> * Ensure that all dirty guest tagged cache entries are written back
> * before releasing the pages back to the system for use. CLFLUSH will
> --
> 2.43.0
>
* Re: [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
2025-08-11 20:50 ` Sean Christopherson
@ 2025-08-11 21:05 ` Yury Norov
2025-08-11 21:21 ` Sean Christopherson
0 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2025-08-11 21:05 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025 at 01:50:15PM -0700, Sean Christopherson wrote:
> On Mon, Aug 11, 2025, Yury Norov wrote:
> > From: Yury Norov (NVIDIA) <yury.norov@gmail.com>
> >
> > Before calling wbnoinvd_on_cpus_mask(), the function checks the cpumask
> > for emptiness. It's useless, as the following wbnoinvd_on_cpus_mask()
> > ends up with smp_call_function_many_cond(), which handles empty cpumask
> > correctly.
>
> I don't agree that it's useless. The early check avoids disabling/enabling
> preemption (which is cheap, but still), and IMO it makes the KVM code more obviously
> correct. E.g. it takes quite a bit of digging to understand that invoking
> wbnoinvd_on_cpus_mask() with an empty mask is ok/fine.
>
> I'm not completely opposed to this change, but I also don't see the point.
So, there's a tradeoff between useless preemption cycling, which is
O(1), and cpumask_empty(), which is O(N).
I have no measurements to support one over the other, but the original
patch doesn't discuss it either. So, with no information on the
performance impact, I'd stick with the version that carries less code.
Agree?
* Re: [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
2025-08-11 21:05 ` Yury Norov
@ 2025-08-11 21:21 ` Sean Christopherson
2025-08-11 21:31 ` Yury Norov
0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2025-08-11 21:21 UTC (permalink / raw)
To: Yury Norov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025, Yury Norov wrote:
> On Mon, Aug 11, 2025 at 01:50:15PM -0700, Sean Christopherson wrote:
> > On Mon, Aug 11, 2025, Yury Norov wrote:
> > > From: Yury Norov (NVIDIA) <yury.norov@gmail.com>
> > >
> > > Before calling wbnoinvd_on_cpus_mask(), the function checks the cpumask
> > > for emptiness. It's useless, as the following wbnoinvd_on_cpus_mask()
> > > ends up with smp_call_function_many_cond(), which handles empty cpumask
> > > correctly.
> >
> > I don't agree that it's useless. The early check avoids disabling/enabling
> > preemption (which is cheap, but still), and IMO it makes the KVM code more obviously
> > correct. E.g. it takes quite a bit of digging to understand that invoking
> > wbnoinvd_on_cpus_mask() with an empty mask is ok/fine.
> >
> > I'm not completely opposed to this change, but I also don't see the point.
>
> So, there's a tradeoff between useless preemption cycling, which is
> O(1) and cpumask_empty(), which is O(N).
Oh, that argument I buy. I had it in my head that the check is going to be O(1)
in practice, because never running vCPU0 would be all kinds of bizarre. But the
mask tracks physical CPUs, not virtual CPUs. E.g. a 2-vCPU VM that's pinned to
the last 2 pCPUs in the system could indeed trigger several superfluous loads and
checks.
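To put a rough number on it, cpumask_empty() boils down to a find_first_bit()
scan over nr_cpu_ids bits; a sketch of the cost, assuming NR_CPUS=512 and
64-bit words (the numbers are illustrative):

        /* cpumask_empty(m) ~= find_first_bit(m->bits, nr_cpu_ids) >= nr_cpu_ids */
        /* With only pCPUs 510/511 ever set, seven all-zero words are read
         * before the final word terminates the scan. */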
> I have no measurements that can support one vs another. But the
> original patch doesn't discuss it anyhow, as well. So, with the
> lack of any information on performance impact, I'd stick with the
> version that brings less code.
>
> Agree?
Not sure I agree that less code is always better, but I do agree that dropping
the check makes sense. :-)
How about this? No need for a v2 (unless you disagree on the tweaks), I'll happily
fixup when applying.
--
From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 11 Aug 2025 16:30:39 -0400
Subject: [PATCH] KVM: SEV: don't check have_run_cpus in sev_writeback_caches()
Drop KVM's check on an empty cpumask when flushing caches when memory is
being reclaimed from an SEV VM, as smp_call_function_many_cond() naturally
(and correctly) handles an empty cpumask. This avoids an extra O(n)
lookup in the common case where at least one pCPU has entered the guest,
which could be noticeable in some setups, e.g. if a small VM is pinned to
the last few pCPUs in the system.
Fixes: 6f38f8c57464 ("KVM: SVM: Flush cache only on CPUs running SEV guest")
Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
[sean: rewrite changelog to capture performance angle]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/svm/sev.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2fbdebf79fbb..0635bd71c10e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -718,13 +718,6 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
static void sev_writeback_caches(struct kvm *kvm)
{
- /*
- * Note, the caller is responsible for ensuring correctness if the mask
- * can be modified, e.g. if a CPU could be doing VMRUN.
- */
- if (cpumask_empty(to_kvm_sev_info(kvm)->have_run_cpus))
- return;
-
/*
* Ensure that all dirty guest tagged cache entries are written back
* before releasing the pages back to the system for use. CLFLUSH will
@@ -739,6 +732,9 @@ static void sev_writeback_caches(struct kvm *kvm)
* serializing multiple calls and having responding CPUs (to the IPI)
* mark themselves as still running if they are running (or about to
* run) a vCPU for the VM.
+ *
+ * Note, the caller is responsible for ensuring correctness if the mask
+ * can be modified, e.g. if a CPU could be doing VMRUN.
*/
wbnoinvd_on_cpus_mask(to_kvm_sev_info(kvm)->have_run_cpus);
}
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
--
* Re: [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run()
2025-08-11 20:45 ` Sean Christopherson
@ 2025-08-11 21:28 ` Yury Norov
2025-08-11 22:04 ` Sean Christopherson
0 siblings, 1 reply; 12+ messages in thread
From: Yury Norov @ 2025-08-11 21:28 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025 at 01:45:46PM -0700, Sean Christopherson wrote:
> On Mon, Aug 11, 2025, Yury Norov wrote:
> > Testing cpumask for a CPU to be cleared just before setting the exact
> > same CPU is useless because the end result is always the same: CPU is
> > set.
>
> No, it is not useless. Blindly writing to the variable will unnecessarily bounce
> the cacheline, and this is a hot path.
How hot is that path? How bad is the cache contention? Is there any evidence
that the conditional cpumask_set_cpu() is worth the effort? The original patch
doesn't discuss that at all, and without a comment the code just looks
buggy.
> > While there, switch CPU setter to a non-atomic version. Atomicity is
> > useless here
>
> No, atomicity isn't useless here either. Dropping atomicity could result in
> CPU's bit being lost. I.e. the atomic accesses aren't for the benefit of
> smp_call_function_many_cond(), the writes are atomic so that multiple vCPUs can
> concurrently update the mask without needing additional protection.
OK, I see. Something heavy hit my head before I decided to drop
atomicity there.
> > because sev_writeback_caches() ends up with a plain
> > for_each_cpu() loop in smp_call_function_many_cond(), which is not
> > atomic by nature.
>
> That's fine. As noted in sev_writeback_caches(), if vCPU could be running, then
> the caller is responsible for ensuring that all vCPUs flush caches before the
> memory being reclaimed is fully freed. Those guarantees are provided by KVM's
> MMU.
>
> sev_writeback_caches() => smp_call_function_many_cond() could hit false positives,
> i.e. trigger WBINVD on CPUs that couldn't possibly have accessed the memory being
> reclaimed, but such false positives are functionally benign, and are "intended"
> in the sense that we chose to prioritize simplicity over precision.
So, I don't object to dropping the patch, but it would be really nice to
have this
        if (!cpumask_test_cpu())
                cpumask_set_cpu()
pattern explained, and better yet supported with performance numbers.
Thanks,
Yury
* Re: [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
2025-08-11 21:21 ` Sean Christopherson
@ 2025-08-11 21:31 ` Yury Norov
0 siblings, 0 replies; 12+ messages in thread
From: Yury Norov @ 2025-08-11 21:31 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025 at 02:21:44PM -0700, Sean Christopherson wrote:
> On Mon, Aug 11, 2025, Yury Norov wrote:
> > On Mon, Aug 11, 2025 at 01:50:15PM -0700, Sean Christopherson wrote:
> > > On Mon, Aug 11, 2025, Yury Norov wrote:
> > > > From: Yury Norov (NVIDIA) <yury.norov@gmail.com>
> > > >
> > > > Before calling wbnoinvd_on_cpus_mask(), the function checks the cpumask
> > > > for emptiness. It's useless, as the following wbnoinvd_on_cpus_mask()
> > > > ends up with smp_call_function_many_cond(), which handles empty cpumask
> > > > correctly.
> > >
> > > I don't agree that it's useless. The early check avoids disabling/enabling
> > > preemption (which is cheap, but still), and IMO it makes the KVM code more obviously
> > > correct. E.g. it takes quite a bit of digging to understand that invoking
> > > wbnoinvd_on_cpus_mask() with an empty mask is ok/fine.
> > >
> > > I'm not completely opposed to this change, but I also don't see the point.
> >
> > So, there's a tradeoff between useless preemption cycling, which is
> > O(1) and cpumask_empty(), which is O(N).
>
> Oh, that argument I buy. I had it in my head that the check is going to be O(1)
> in practice, because never running vCPU0 would be all kinds of bizarre. But the
> mask tracks physical CPUs, not virtual CPUs. E.g. a 2-vCPU VM that's pinned to
> the last 2 pCPUs in the system could indeed trigger several superfluous loads and
> checks.
>
> > I have no measurements that can support one vs another. But the
> > original patch doesn't discuss it anyhow, as well. So, with the
> > lack of any information on performance impact, I'd stick with the
> > version that brings less code.
> >
> > Agree?
>
> Not sure I agree that less code is always better, but I do agree that dropping
> the check makes sense. :-)
>
> How about this? No need for a v2 (unless you disagree on the tweaks), I'll happily
> fixup when applying.
Sure, deal! Thanks.
> --
> From: Yury Norov <yury.norov@gmail.com>
> Date: Mon, 11 Aug 2025 16:30:39 -0400
> Subject: [PATCH] KVM: SEV: don't check have_run_cpus in sev_writeback_caches()
>
> Drop KVM's check on an empty cpumask when flushing caches when memory is
> being reclaimed from an SEV VM, as smp_call_function_many_cond() naturally
> (and correctly) handles and empty cpumask. This avoids an extra O(n)
> lookup in the common case where at least one pCPU has enterred the guest,
> which could be noticeable in some setups, e.g. if a small VM is pinned to
> the last few pCPUs in the system.
>
> Fixes: 6f38f8c57464 ("KVM: SVM: Flush cache only on CPUs running SEV guest")
> Signed-off-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
> [sean: rewrite changelog to capture performance angle]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/svm/sev.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2fbdebf79fbb..0635bd71c10e 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -718,13 +718,6 @@ static void sev_clflush_pages(struct page *pages[], unsigned long npages)
>
> static void sev_writeback_caches(struct kvm *kvm)
> {
> - /*
> - * Note, the caller is responsible for ensuring correctness if the mask
> - * can be modified, e.g. if a CPU could be doing VMRUN.
> - */
> - if (cpumask_empty(to_kvm_sev_info(kvm)->have_run_cpus))
> - return;
> -
> /*
> * Ensure that all dirty guest tagged cache entries are written back
> * before releasing the pages back to the system for use. CLFLUSH will
> @@ -739,6 +732,9 @@ static void sev_writeback_caches(struct kvm *kvm)
> * serializing multiple calls and having responding CPUs (to the IPI)
> * mark themselves as still running if they are running (or about to
> * run) a vCPU for the VM.
> + *
> + * Note, the caller is responsible for ensuring correctness if the mask
> + * can be modified, e.g. if a CPU could be doing VMRUN.
> */
> wbnoinvd_on_cpus_mask(to_kvm_sev_info(kvm)->have_run_cpus);
> }
>
> base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
> --
* Re: [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run()
2025-08-11 21:28 ` Yury Norov
@ 2025-08-11 22:04 ` Sean Christopherson
2025-08-14 0:42 ` Yury Norov
0 siblings, 1 reply; 12+ messages in thread
From: Sean Christopherson @ 2025-08-11 22:04 UTC (permalink / raw)
To: Yury Norov
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025, Yury Norov wrote:
> On Mon, Aug 11, 2025 at 01:45:46PM -0700, Sean Christopherson wrote:
> > On Mon, Aug 11, 2025, Yury Norov wrote:
> > > Testing cpumask for a CPU to be cleared just before setting the exact
> > > same CPU is useless because the end result is always the same: CPU is
> > > set.
> >
> > No, it is not useless. Blindly writing to the variable will unnecessarily bounce
> > the cacheline, and this is a hot path.
>
> How hot is that path?
Very, it gets hit on every VM-Exit => VM-Entry. For context, putting a single
printk anywhere in KVM's exit=>entry path can completely prevent forward progress
in the guest (for some workloads/patterns).
> How bad the cache contention is?
I would expect it to be "fatally" bad for some workloads and setups. Not literally
fatal, but bad enough that it would require an urgent fix.
> Is there any evidence that conditional cpumask_set_cpu() worth the effort?
I don't have evidence for this specific code flow, but there is plenty of evidence
that shows that generating atomic accesses, especially across sockets, can have a
significant negative impact on performance.
I didn't ask for performance numbers for optimizing setting the mask because (a)
I know the VM-Entry path can be extremely hot, (b) I know that dueling atomics
can be hugely problematic, and (c) I don't see the separate test + set logic as
being at all notable in terms of effort.
> The original patch doesn't discuss that at all, and without any comment the
> code looks just buggy.
FWIW, there was discussion in a previous version of the series, but no hard
numbers on the perf impact.
https://lore.kernel.org/all/Z75se_OZQvaeQE-4@google.com
>
> > > While there, switch CPU setter to a non-atomic version. Atomicity is
> > > useless here
> >
> > No, atomicity isn't useless here either. Dropping atomicity could result in
> > CPU's bit being lost. I.e. the atomic accesses aren't for the benefit of
> > smp_call_function_many_cond(), the writes are atomic so that multiple vCPUs can
> > concurrently update the mask without needing additional protection.
>
> OK, I see. Something heavy hit my head before I decided to drop
> atomicity there.
>
> > > because sev_writeback_caches() ends up with a plain
> > > for_each_cpu() loop in smp_call_function_many_cond(), which is not
> > > atomic by nature.
> >
> > That's fine. As noted in sev_writeback_caches(), if vCPU could be running, then
> > the caller is responsible for ensuring that all vCPUs flush caches before the
> > memory being reclaimed is fully freed. Those guarantees are provided by KVM's
> > MMU.
> >
> > sev_writeback_caches() => smp_call_function_many_cond() could hit false positives,
> > i.e. trigger WBINVD on CPUs that couldn't possibly have accessed the memory being
> > reclaimed, but such false positives are functionally benign, and are "intended"
> > in the sense that we chose to prioritize simplicity over precision.
>
> So, I don't object to drop the patch, but it would be really nice to
> have this
> if (!cpumask_test_cpu())
> cpumask_set_cpu()
>
> pattern explained, and even better supported with performance numbers.
I can definitely add a comment, and I might try to gather numbers out of curiosity,
but as above, I just don't see this as something that needs to be investigated with
any urgency.
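E.g. something along these lines, purely a sketch of a possible comment on the
existing test-then-set, not necessarily what will land:

        /*
         * Check the mask before setting it to avoid dirtying the cacheline,
         * and an atomic RMW, on every VM-Entry; the bit only needs to be set
         * the first time this pCPU runs a vCPU of this VM.
         */
        if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
                cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);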
* Re: [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run()
2025-08-11 22:04 ` Sean Christopherson
@ 2025-08-14 0:42 ` Yury Norov
0 siblings, 0 replies; 12+ messages in thread
From: Yury Norov @ 2025-08-14 0:42 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, H. Peter Anvin, x86, kvm, linux-kernel, Zheyun Shen
On Mon, Aug 11, 2025 at 03:04:50PM -0700, Sean Christopherson wrote:
> On Mon, Aug 11, 2025, Yury Norov wrote:
> > On Mon, Aug 11, 2025 at 01:45:46PM -0700, Sean Christopherson wrote:
> > > On Mon, Aug 11, 2025, Yury Norov wrote:
> > > > Testing cpumask for a CPU to be cleared just before setting the exact
> > > > same CPU is useless because the end result is always the same: CPU is
> > > > set.
> > >
> > > No, it is not useless. Blindly writing to the variable will unnecessarily bounce
> > > the cacheline, and this is a hot path.
> >
> > How hot is that path?
>
> Very, it gets hit on every VM-Exit => VM-Entry. For context, putting a single
> printk anywhere in KVM's exit=>entry path can completely prevent forward progress
> in the guest (for some workloads/patterns).
>
> > How bad the cache contention is?
>
> I would expect it to be "fatally" bad for some workloads and setups. Not literally
> fatal, but bad enough that it would require an urgent fix.
>
> > Is there any evidence that conditional cpumask_set_cpu() worth the effort?
>
> I don't have evidence for this specific code flow, but there is plenty of evidence
> that shows that generating atomic accesses, especially across sockets, can have a
> significant negative impact on performance.
>
> I didn't ask for performance numbers for optimizing setting the mask because (a)
> I know the VM-Entry path can be extremely hot, (b) I know that dueling atomics
> can be hugely problematic, and (c) I don't see the separate test + set logic as
> being at all notable in terms of effort.
>
> > The original patch doesn't discuss that at all, and without any comment the
> > code looks just buggy.
>
> FWIW, there was discussion in a previous version of the series, but no hard
> numbers on the perf impact.
>
> https://lore.kernel.org/all/Z75se_OZQvaeQE-4@google.com
OK, I see. So, as I said, I don't insist on moving this patch forward.
Let's drop it if you think the cache contention is critical.
I probably need to think in the opposite direction: if some code paths
thrash caches by concurrently writing a whole cache line just to set a
single bit, and people try to minimize that with a conditional
set/clear_bit(), we should make that pattern as efficient as we can and
part of the API.
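Something like the below, as a purely hypothetical sketch of such a helper
(the name, placement and semantics are made up; it is not an existing API):

        /*
         * Hypothetical helper: set a bit only if it isn't already set, to
         * avoid dirtying the cacheline on hot paths that set the same bit
         * over and over.
         */
        static __always_inline void cpumask_set_cpu_lazy(unsigned int cpu,
                                                         struct cpumask *dstp)
        {
                if (!cpumask_test_cpu(cpu, dstp))
                        cpumask_set_cpu(cpu, dstp);
        }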
Thanks for the discussion, Sean!
* Re: [PATCH 0/2] KVM: SVM: fixes for SEV
2025-08-11 20:30 [PATCH 0/2] KVM: SVM: fixes for SEV Yury Norov
2025-08-11 20:30 ` [PATCH 1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches() Yury Norov
2025-08-11 20:30 ` [PATCH 2/2] KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run() Yury Norov
@ 2025-08-19 23:11 ` Sean Christopherson
2 siblings, 0 replies; 12+ messages in thread
From: Sean Christopherson @ 2025-08-19 23:11 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, H. Peter Anvin, x86, kvm,
linux-kernel, Yury Norov
Cc: Zheyun Shen
On Mon, 11 Aug 2025 16:30:38 -0400, Yury Norov wrote:
> From: Yury Norov (NVIDIA) <yury.norov@gmail.com>
>
> A couple fixes for recently merged sev_writeback_caches() and
> pre_sev_run() changes.
>
> Yury Norov (NVIDIA) (2):
> KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
> KVM: SVM: drop useless cpumask_test_cpu() in pre_sev_run()
>
> [...]
Applied patch 1 to kvm-x86 fixes, with the reworded changelog to call out that
it's a performance fix. Thanks!
[1/2] KVM: SVM: don't check have_run_cpus in sev_writeback_caches()
https://github.com/kvm-x86/linux/commit/923fcb3dbc02
--
https://github.com/kvm-x86/linux/tree/next