* [PATCH] KVM: selftests: Add a requirement for disabling numa balancing
@ 2024-01-17 6:44 Tao Su
2024-01-17 15:12 ` Sean Christopherson
0 siblings, 1 reply; 5+ messages in thread
From: Tao Su @ 2024-01-17 6:44 UTC (permalink / raw)
To: kvm; +Cc: seanjc, pbonzini, shuah, yi1.lai, tao1.su
In dirty_log_page_splitting_test, vm_get_stat(vm, "pages_4k") can
gradually drop to 0 after a VM-exit. The reason is that the host
triggers NUMA balancing, which unmaps the related SPTEs. As a result,
the number of pages currently mapped in the EPT (kvm->stat.pages) is
not equal to the number of pages touched by the guest, which causes
stats_populated.pages_4k and stats_repopulated.pages_4k in this test
to differ, resulting in failure.
The call trace of the SPTE unmap triggered by NUMA balancing:
handle_changed_spte+0x64b/0x830 [kvm]
tdp_mmu_zap_leafs+0x159/0x290 [kvm]
kvm_tdp_mmu_unmap_gfn_range+0x7b/0xa0 [kvm]
kvm_unmap_gfn_range+0x10f/0x130 [kvm]
? _raw_spin_unlock+0x1d/0x40
? hugetlb_follow_page_mask+0x1ba/0x400
? preempt_count_add+0x86/0xd0
kvm_mmu_notifier_invalidate_range_start+0x14d/0x380 [kvm]
__mmu_notifier_invalidate_range_start+0x89/0x1f0
change_protection+0xce1/0x1490
? __pfx_tlb_is_not_lazy+0x10/0x10
change_prot_numa+0x5d/0xb0
? kmalloc_trace+0x2e/0xa0
task_numa_work+0x364/0x550
task_work_run+0x62/0xa0
xfer_to_guest_mode_handle_work+0xc3/0xd0
kvm_arch_vcpu_ioctl_run+0xe8e/0x1b90 [kvm]
kvm_vcpu_ioctl+0x282/0x710 [kvm]
dirty_log_page_splitting_test assumes that kvm->stat.pages and the
number of pages touched by the guest are the same, but that assumption
no longer holds if NUMA balancing is enabled. Add a requirement that
numa_balancing be disabled to avoid confusion due to test failures.
Actually, any page migration (including NUMA balancing) will trigger
this issue, e.g. when running this script:
./x86_64/dirty_log_page_splitting_test &
PID=$!
sleep 1
migratepages $PID 0 1
It is unusual to create the above test environment intentionally, but
NUMA balancing initiated by the kernel is very likely to be triggered,
at least in dirty_log_page_splitting_test.
Reported-by: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
---
.../kvm/x86_64/dirty_log_page_splitting_test.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
index 634c6bfcd572..f2c796111d83 100644
--- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
+++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
@@ -212,10 +212,21 @@ static void help(char *name)
int main(int argc, char *argv[])
{
+ FILE *f;
int opt;
+ int ret, numa_balancing;
TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
+ f = fopen("/proc/sys/kernel/numa_balancing", "r");
+ if (f) {
+ ret = fscanf(f, "%d", &numa_balancing);
+ TEST_ASSERT(ret == 1, "Error reading numa_balancing");
+ TEST_ASSERT(!numa_balancing, "please run "
+ "'echo 0 > /proc/sys/kernel/numa_balancing'");
+ fclose(f);
+ f = NULL;
+ }
while ((opt = getopt(argc, argv, "b:hs:")) != -1) {
switch (opt) {
base-commit: 052d534373b7ed33712a63d5e17b2b6cdbce84fd
--
2.34.1
* Re: [PATCH] KVM: selftests: Add a requirement for disabling numa balancing
2024-01-17 6:44 [PATCH] KVM: selftests: Add a requirement for disabling numa balancing Tao Su
@ 2024-01-17 15:12 ` Sean Christopherson
2024-01-18 5:43 ` Tao Su
0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2024-01-17 15:12 UTC (permalink / raw)
To: Tao Su; +Cc: kvm, pbonzini, shuah, yi1.lai
On Wed, Jan 17, 2024, Tao Su wrote:
> In dirty_log_page_splitting_test, vm_get_stat(vm, "pages_4k") can
> gradually drop to 0 after a VM-exit. The reason is that the host
> triggers NUMA balancing, which unmaps the related SPTEs. As a result,
> the number of pages currently mapped in the EPT (kvm->stat.pages) is
> not equal to the number of pages touched by the guest, which causes
> stats_populated.pages_4k and stats_repopulated.pages_4k in this test
> to differ, resulting in failure.
...
> dirty_log_page_splitting_test assumes that kvm->stat.pages and the
> number of pages touched by the guest are the same, but that assumption
> no longer holds if NUMA balancing is enabled. Add a requirement that
> numa_balancing be disabled to avoid confusion due to test failures.
>
> Actually, any page migration (including NUMA balancing) will trigger
> this issue, e.g. when running this script:
> ./x86_64/dirty_log_page_splitting_test &
> PID=$!
> sleep 1
> migratepages $PID 0 1
> It is unusual to create the above test environment intentionally, but
> NUMA balancing initiated by the kernel is very likely to be triggered,
> at least in dirty_log_page_splitting_test.
>
> Reported-by: Yi Lai <yi1.lai@intel.com>
> Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> Tested-by: Yi Lai <yi1.lai@intel.com>
> ---
> .../kvm/x86_64/dirty_log_page_splitting_test.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> index 634c6bfcd572..f2c796111d83 100644
> --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> @@ -212,10 +212,21 @@ static void help(char *name)
>
> int main(int argc, char *argv[])
> {
> + FILE *f;
> int opt;
> + int ret, numa_balancing;
>
> TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
> TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> + f = fopen("/proc/sys/kernel/numa_balancing", "r");
> + if (f) {
> + ret = fscanf(f, "%d", &numa_balancing);
> + TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> + TEST_ASSERT(!numa_balancing, "please run "
> + "'echo 0 > /proc/sys/kernel/numa_balancing'");
If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT(). The
test hasn't failed, rather it has detected an incompatible setup.
Something isn't right though. The test defaults to HugeTLB, and the invocation
in the changelog doesn't override the backing source. That suggests that NUMA
auto-balancing is zapping HugeTLB VMAs, which AFAIK shouldn't happen, e.g. this
code in task_numa_work() should cause such VMAs to be skipped:
if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
continue;
}
And the test already warns the user if they opt to use something other than
HugeTLB.
if (!is_backing_src_hugetlb(backing_src)) {
pr_info("This test will only work reliably with HugeTLB memory. "
"It can work with THP, but that is best effort.\n");
}
If the test is defaulting to something other than HugeTLB, then we should fix
that in the test. If the kernel is doing NUMA balancing on HugeTLB VMAs, then
we should fix that in the kernel.
* Re: [PATCH] KVM: selftests: Add a requirement for disabling numa balancing
2024-01-17 15:12 ` Sean Christopherson
@ 2024-01-18 5:43 ` Tao Su
2024-01-18 17:33 ` Sean Christopherson
0 siblings, 1 reply; 5+ messages in thread
From: Tao Su @ 2024-01-18 5:43 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, pbonzini, shuah, yi1.lai
On Wed, Jan 17, 2024 at 07:12:08AM -0800, Sean Christopherson wrote:
[...]
> > diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > index 634c6bfcd572..f2c796111d83 100644
> > --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > @@ -212,10 +212,21 @@ static void help(char *name)
> >
> > int main(int argc, char *argv[])
> > {
> > + FILE *f;
> > int opt;
> > + int ret, numa_balancing;
> >
> > TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
> > TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> > + f = fopen("/proc/sys/kernel/numa_balancing", "r");
> > + if (f) {
> > + ret = fscanf(f, "%d", &numa_balancing);
> > + TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> > + TEST_ASSERT(!numa_balancing, "please run "
> > + "'echo 0 > /proc/sys/kernel/numa_balancing'");
>
> If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT(). The
> test hasn't failed, rather it has detected an incompatible setup.
Yes, previously I wanted to print a more user-friendly prompt, but TEST_REQUIRE()
can’t customize the output…
>
> Something isn't right though. The test defaults to HugeTLB, and the invocation
> in the changelog doesn't override the backing source. That suggests that NUMA
> auto-balancing is zapping HugeTLB VMAs, which AFAIK shouldn't happen, e.g. this
> code in task_numa_work() should cause such VMAs to be skipped:
>
> if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
> continue;
> }
>
> And the test already warns the user if they opt to use something other than
> HugeTLB.
>
> if (!is_backing_src_hugetlb(backing_src)) {
> pr_info("This test will only work reliably with HugeTLB memory. "
> "It can work with THP, but that is best effort.\n");
> }
>
> If the test is defaulting to something other than HugeTLB, then we should fix
> that in the test. If the kernel is doing NUMA balancing on HugeTLB VMAs, then
> we should fix that in the kernel.
From my observation, HugeTLB VMAs are not affected by NUMA
auto-balancing, but the backing source for the test code and per-vCPU
stacks is not HugeTLB, e.g. __vm_create() invokes
vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
So some of those pages may be migrated.
In dirty_log_page_splitting_test, if we sleep for 3 seconds before
enabling dirty logging:
--- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
+++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
@@ -124,6 +124,7 @@ static void run_test(enum vm_guest_mode mode, void *unused)
memstress_start_vcpu_threads(VCPUS, vcpu_worker);
run_vcpu_iteration(vm);
+ sleep(3);
get_page_stats(vm, &stats_populated, "populating memory");
/* Enable dirty logging */
I got these logs:
# ./x86_64/dirty_log_page_splitting_test
Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
__vm_create: mode='PA-bits:ANY, VA-bits:48, 4K pages' pages='2710'
Guest physical address width detected: 52
guest physical test memory: [0xfffff7fe00000, 0xfffffffe00000)
Added VCPU 0 with test mem gpa [fffff7fe00000, fffffffe00000)
Added VCPU 1 with test mem gpa [fffff7fe00000, fffffffe00000)
Page stats after populating memory: 4K: 0 2M: 1024 1G: 0 huge: 1024
Page stats after enabling dirty logging: 4K: 524288 2M: 0 1G: 0 huge: 0
Page stats after dirtying memory: 4K: 525334 2M: 0 1G: 0 huge: 0
Page stats after dirtying memory: 4K: 525334 2M: 0 1G: 0 huge: 0
Page stats after disabling dirty logging: 4K: 1046 2M: 0 1G: 0 huge: 0
Page stats after repopulating memory: 4K: 1046 2M: 1024 1G: 0 huge: 1024
==== Test Assertion Failure ====
x86_64/dirty_log_page_splitting_test.c:196: stats_populated.pages_4k == stats_repopulated.pages_4k
pid=2660413 tid=2660413 errno=0 - Success
1 0x0000000000402d4b: run_test at dirty_log_page_splitting_test.c:196 (discriminator 1)
2 0x0000000000403724: for_each_guest_mode at guest_modes.c:100
3 0x00000000004024ef: main at dirty_log_page_splitting_test.c:246
4 0x00007fd72c229d8f: ?? ??:0
5 0x00007fd72c229e3f: ?? ??:0
6 0x00000000004025b4: _start at ??:?
0 != 0x416 (stats_populated.pages_4k != stats_repopulated.pages_4k)
Thanks,
Tao
* Re: [PATCH] KVM: selftests: Add a requirement for disabling numa balancing
2024-01-18 5:43 ` Tao Su
@ 2024-01-18 17:33 ` Sean Christopherson
2024-01-19 6:05 ` Tao Su
0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2024-01-18 17:33 UTC (permalink / raw)
To: Tao Su; +Cc: kvm, pbonzini, shuah, yi1.lai, David Matlack
+David
On Thu, Jan 18, 2024, Tao Su wrote:
> On Wed, Jan 17, 2024 at 07:12:08AM -0800, Sean Christopherson wrote:
>
> [...]
>
> > > diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > index 634c6bfcd572..f2c796111d83 100644
> > > --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > @@ -212,10 +212,21 @@ static void help(char *name)
> > >
> > > int main(int argc, char *argv[])
> > > {
> > > + FILE *f;
> > > int opt;
> > > + int ret, numa_balancing;
> > >
> > > TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
> > > TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> > > + f = fopen("/proc/sys/kernel/numa_balancing", "r");
> > > + if (f) {
> > > + ret = fscanf(f, "%d", &numa_balancing);
> > > + TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> > > + TEST_ASSERT(!numa_balancing, "please run "
> > > + "'echo 0 > /proc/sys/kernel/numa_balancing'");
> >
> > If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT(). The
> > test hasn't failed, rather it has detected an incompatible setup.
>
> Yes, previously I wanted to print a more user-friendly prompt, but TEST_REQUIRE()
> can’t customize the output…
__TEST_REQUIRE()
> > Something isn't right though. The test defaults to HugeTLB, and the invocation
> > in the changelog doesn't override the backing source. That suggests that NUMA
> > auto-balancing is zapping HugeTLB VMAs, which AFAIK shouldn't happen, e.g. this
> > code in task_numa_work() should cause such VMAs to be skipped:
> >
> > if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> > is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> > trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
> > continue;
> > }
> >
> > And the test already warns the user if they opt to use something other than
> > HugeTLB.
> >
> > if (!is_backing_src_hugetlb(backing_src)) {
> > pr_info("This test will only work reliably with HugeTLB memory. "
> > "It can work with THP, but that is best effort.\n");
> > }
> >
> > If the test is defaulting to something other than HugeTLB, then we should fix
> > that in the test. If the kernel is doing NUMA balancing on HugeTLB VMAs, then
> > we should fix that in the kernel.
>
> From my observation, HugeTLB VMAs are not affected by NUMA
> auto-balancing, but the backing source for the test code and per-vCPU
> stacks is not HugeTLB, e.g. __vm_create() invokes
>
> vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
>
> So some of those pages may be migrated.
Ah, hmm. Requiring NUMA balancing be disabled isn't going to fix the underlying
issue, it's just guarding against one of the more likely culprits. The best fix
is likely to have the test precisely validate _only_ the test data pages. E.g.
if we double down on requiring HugeTLB, then the test should be able to assert
that it has at least N hugepages when dirty logging is disabled, and at least M
4KiB pages when dirty logging is enabled.
* Re: [PATCH] KVM: selftests: Add a requirement for disabling numa balancing
2024-01-18 17:33 ` Sean Christopherson
@ 2024-01-19 6:05 ` Tao Su
0 siblings, 0 replies; 5+ messages in thread
From: Tao Su @ 2024-01-19 6:05 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, pbonzini, shuah, yi1.lai, David Matlack
On Thu, Jan 18, 2024 at 09:33:29AM -0800, Sean Christopherson wrote:
> +David
>
> On Thu, Jan 18, 2024, Tao Su wrote:
> > On Wed, Jan 17, 2024 at 07:12:08AM -0800, Sean Christopherson wrote:
> >
> > [...]
> >
> > > > diff --git a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > > index 634c6bfcd572..f2c796111d83 100644
> > > > --- a/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > > +++ b/tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
> > > > @@ -212,10 +212,21 @@ static void help(char *name)
> > > >
> > > > int main(int argc, char *argv[])
> > > > {
> > > > + FILE *f;
> > > > int opt;
> > > > + int ret, numa_balancing;
> > > >
> > > > TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
> > > > TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
> > > > + f = fopen("/proc/sys/kernel/numa_balancing", "r");
> > > > + if (f) {
> > > > + ret = fscanf(f, "%d", &numa_balancing);
> > > > + TEST_ASSERT(ret == 1, "Error reading numa_balancing");
> > > > + TEST_ASSERT(!numa_balancing, "please run "
> > > > + "'echo 0 > /proc/sys/kernel/numa_balancing'");
> > >
> > > If we go this route, this should be a TEST_REQUIRE(), not a TEST_ASSERT(). The
> > > test hasn't failed, rather it has detected an incompatible setup.
> >
> > Yes, previously I wanted to print a more user-friendly prompt, but TEST_REQUIRE()
> > can’t customize the output…
>
> __TEST_REQUIRE()
Got it.
>
> > > Something isn't right though. The test defaults to HugeTLB, and the invocation
> > > in the changelog doesn't override the backing source. That suggests that NUMA
> > > auto-balancing is zapping HugeTLB VMAs, which AFAIK shouldn't happen, e.g. this
> > > code in task_numa_work() should cause such VMAs to be skipped:
> > >
> > > if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
> > > is_vm_hugetlb_page(vma) || (vma->vm_flags & VM_MIXEDMAP)) {
> > > trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_UNSUITABLE);
> > > continue;
> > > }
> > >
> > > And the test already warns the user if they opt to use something other than
> > > HugeTLB.
> > >
> > > if (!is_backing_src_hugetlb(backing_src)) {
> > > pr_info("This test will only work reliably with HugeTLB memory. "
> > > "It can work with THP, but that is best effort.\n");
> > > }
> > >
> > > If the test is defaulting to something other than HugeTLB, then we should fix
> > > that in the test. If the kernel is doing NUMA balancing on HugeTLB VMAs, then
> > > we should fix that in the kernel.
> >
> > From my observation, HugeTLB VMAs are not affected by NUMA
> > auto-balancing, but the backing source for the test code and per-vCPU
> > stacks is not HugeTLB, e.g. __vm_create() invokes
> >
> > vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, 0, 0, nr_pages, 0);
> >
> > So some of those pages may be migrated.
>
> Ah, hmm. Requiring NUMA balancing be disabled isn't going to fix the underlying
> issue, it's just guarding against one of the more likely culprits. The best fix
> is likely to have the test precisely validate _only_ the test data pages. E.g.
> if we double down on requiring HugeTLB, then the test should be able to assert
> that it has at least N hugepages when dirty logging is disabled, and at least M
> 4KiB pages when dirty logging is enabled.
I see, I will update the ASSERT conditions, thanks for your guidance.
Thanks,
Tao