From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Henry Huang <henry.hj@antgroup.com>,
linux-mm@kvack.org
Subject: Re: access_tracking_perf_test kvm selftest doesn't work when Multi-Gen LRU is in use
Date: Tue, 21 May 2024 16:29:54 -0700
Message-ID: <Zk0uckIeAsb5ex4i@google.com>
In-Reply-To: <7a46456d6750ea682ba321ad09541fa81677b81a.camel@redhat.com>
On Wed, May 15, 2024, Maxim Levitsky wrote:
> Small note on why we started seeing this failure on RHEL 9 and only on some machines:
>
> - RHEL9 has MGLRU enabled, RHEL8 doesn't.

For a stopgap in KVM selftests, or possibly even a long-term solution in case the
decision is that page_idle will simply have different behavior for MGLRU, couldn't
we tweak the test to not assert if MGLRU is enabled?

E.g. refactor get_module_param_integer() and/or get_module_param() to add
get_sysfs_value_integer() or so, and then do this?

diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index 3c7defd34f56..1e759df36098 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -123,6 +123,11 @@ static void mark_page_idle(int page_idle_fd, uint64_t pfn)
 		    "Set page_idle bits for PFN 0x%" PRIx64, pfn);
 }
 
+static bool is_lru_gen_enabled(void)
+{
+	return !!get_sysfs_value_integer("/sys/kernel/mm/lru_gen/enabled");
+}
+
 static void mark_vcpu_memory_idle(struct kvm_vm *vm,
 				  struct memstress_vcpu_args *vcpu_args)
 {
@@ -185,7 +190,8 @@ static void mark_vcpu_memory_idle(struct kvm_vm *vm,
 	 */
 	if (still_idle >= pages / 10) {
 #ifdef __x86_64__
-		TEST_ASSERT(this_cpu_has(X86_FEATURE_HYPERVISOR),
+		TEST_ASSERT(this_cpu_has(X86_FEATURE_HYPERVISOR) ||
+			    is_lru_gen_enabled(),
 			    "vCPU%d: Too many pages still idle (%lu out of %lu)",
 			    vcpu_idx, still_idle, pages);
 #endif

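For reference, a completely untested sketch of what the get_sysfs_value_integer()
helper might look like as standalone C. The helper doesn't exist in the selftests
today; the name is just the one floated above, and parse_sysfs_integer() plus the
"return 0 if the file is missing or unparsable" convention are made up for
illustration. One detail worth handling: lru_gen/enabled is reported in hex
(e.g. 0x0007), so the parse uses base 0 rather than base 10.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text of a sysfs attribute as one integer.
 * Base 0 lets strtol() accept both decimal and "0x"-prefixed hex.
 * Returns 0 on success, -1 if the text is not a single integer.
 */
static int parse_sysfs_integer(const char *text, long *out)
{
	char *end;
	long val;

	assert(text && out);

	errno = 0;
	val = strtol(text, &end, 0);
	if (errno || end == text)
		return -1;

	/* Allow trailing whitespace (sysfs appends a newline), nothing else. */
	while (*end == '\n' || *end == ' ' || *end == '\t')
		end++;
	if (*end != '\0')
		return -1;

	*out = val;
	return 0;
}

/*
 * Hypothetical stand-in for get_sysfs_value_integer().  Assumption: a
 * missing, unreadable or unparsable file is reported as 0, so that a
 * kernel without MGLRU simply looks like "disabled".
 */
static long get_sysfs_value_integer(const char *path)
{
	char buf[64];
	long val = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return 0;
	if (!fgets(buf, sizeof(buf), f) || parse_sysfs_integer(buf, &val))
		val = 0;
	fclose(f);
	return val;
}
```

With that, is_lru_gen_enabled() in the diff above reduces to a non-zero check on
the returned value.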
> - machine needs to have more than one NUMA node because NUMA balancing
> (enabled by default) tries apparently to write protect the primary PTEs
> of (all?) processes every few seconds, and that causes KVM to flush the secondary PTEs:
> (at least with new tdp mmu)
>
> access_tracking-3448 [091] ....1.. 1380.244666: handle_changed_spte <-tdp_mmu_set_spte
> access_tracking-3448 [091] ....1.. 1380.244667: <stack trace>
> => cdc_driver_init
> => handle_changed_spte
> => tdp_mmu_set_spte
> => tdp_mmu_zap_leafs
> => kvm_tdp_mmu_unmap_gfn_range
> => kvm_unmap_gfn_range
> => kvm_mmu_notifier_invalidate_range_start
> => __mmu_notifier_invalidate_range_start
> => change_p4d_range
> => change_protection
> => change_prot_numa
> => task_numa_work
> => task_work_run
> => exit_to_user_mode_prepare
> => syscall_exit_to_user_mode
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe
>
> It's a separate question, if the NUMA balancing should do this, or if NUMA
> balancing should be enabled by default,

FWIW, IMO, enabling NUMA balancing on a system whose primary purpose is to run VMs
is a bad idea.  NUMA balancing operates under the assumption that a !PRESENT #PF is
relatively cheap.  When secondary MMUs are involved, that is simply not the case,
e.g. to honor the mmu_notifier event, KVM zaps _and_ does a remote TLB flush.  Even
if we reworked KVM and/or the mmu_notifiers so that KVM didn't need to do such a
heavy operation, the cost of a page fault VM-Exit is significantly higher than the
cost of a host #PF.

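If the test wanted to at least detect this situation, it could read the
kernel.numa_balancing sysctl. Again an untested sketch: the function names are
made up, only the /proc/sys/kernel/numa_balancing path is real. 0 means
disabled; treating every non-zero value as "enabled" is my assumption about
what the test would care about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: interpret the text of kernel.numa_balancing.
 * Assumption: any non-zero mode counts as "enabled"; text that is not
 * a number is treated as "disabled".
 */
static bool numa_balancing_text_enabled(const char *text)
{
	char *end;
	long val;

	assert(text);

	val = strtol(text, &end, 10);
	return end != text && val != 0;
}

/* Hypothetical helper: a missing or unreadable sysctl counts as disabled. */
static bool is_numa_balancing_enabled(void)
{
	char buf[32];
	bool enabled = false;
	FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");

	if (!f)
		return false;
	if (fgets(buf, sizeof(buf), f))
		enabled = numa_balancing_text_enabled(buf);
	fclose(f);
	return enabled;
}
```

Whether the test should skip, warn, or relax the assert when this returns true
is a separate decision.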
> because there are other reasons that can force KVM to invalidate the
> secondary mappings and trigger this issue.
Ya.