Re: [PATCH v9 00/11] KVM: x86/mmu: Age sptes locklessly

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: James Houghton <jthoughton@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 David Matlack <dmatlack@google.com>,
	David Rientjes <rientjes@google.com>,
	Marc Zyngier <maz@kernel.org>,
	 Oliver Upton <oliver.upton@linux.dev>,
	Wei Xu <weixugc@google.com>, Yu Zhao <yuzhao@google.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 00/11] KVM: x86/mmu: Age sptes locklessly
Date: Tue, 25 Feb 2025 16:50:56 -0800	[thread overview]
Message-ID: <Z75lcJOEFfBMATAf@google.com> (raw)
In-Reply-To: <07788b85473e24627131ffe1a8d1d01856dd9cb5.camel@redhat.com>

On Tue, Feb 25, 2025, Maxim Levitsky wrote:
> On Tue, 2025-02-18 at 17:13 -0800, Sean Christopherson wrote:
> > My understanding is that the behavior is deliberate.  Per Yu[1], page_idle/bitmap
> > effectively isn't supported by MGLRU.
> > 
> > [1] https://lore.kernel.org/all/CAOUHufZeADNp_y=Ng+acmMMgnTR=ZGFZ7z-m6O47O=CmJauWjw@mail.gmail.com
> 
> Hi,
> 
> Reading this mail makes me think that the page idle interface isn't really
> used anymore.

I'm sure it's still used in production somewhere.  And even if it's being phased
out in favor of MGLRU, it's still super useful for testing purposes, because it
gives userspace much more direct control over aging.

> Maybe we should redo the access_tracking_perf_test to only use the MGLRU
> specific interfaces/mode, and remove its classical page_idle mode altogher?

I don't want to take a hard dependency on MGLRU (unless page_idle gets fully
deprecated/removed by the kernel), and I also don't think page_idle is the main
problem with the test.
   
> The point I am trying to get across is that currently
> access_tracking_perf_test main purpose is to test that page_idle works with
> secondary paging and the fact is that it doesn't work well due to more that
> one reason:

The primary purpose of the test is to measure performance.  Asserting that 90%+
pages were dirtied is a sanity check, not an outright goal.

> The mere fact that we don't flush TLB already necessitated hacks like the 90%
> check, which for example doesn't work nested so another hack was needed, to
> skip the check completely when hypervisor is detected, etc, etc.

100% agreed here.

> And now as of 6.13, we don't propagate accessed bit when KVM zaps the SPTE at
> all, which can happen at least in theory due to other reasons than NUMA balancing.
> 
> Tomorrow there will be something else that will cause KVM to zap the SPTEs,
> and the test will fail again, and again...
> 
> What do you think?

What if we make the assertion user controllable?  I.e. let the user opt-out (or
off-by-default and opt-in) via command line?  We did something similar for the
rseq test, because the test would run far fewer iterations than expected if the
vCPU task was migrated to CPU(s) in deep sleep states.

	TEST_ASSERT(skip_sanity_check || i > (NR_TASK_MIGRATIONS / 2),
		    "Only performed %d KVM_RUNs, task stalled too much?\n\n"
		    "  Try disabling deep sleep states to reduce CPU wakeup latency,\n"
		    "  e.g. via cpuidle.off=1 or setting /dev/cpu_dma_latency to '0',\n"
		    "  or run with -u to disable this sanity check.", i);

This is quite similar, because as you say, it's impractical for the test to account
for every possible environmental quirk.

> > Aha!  I wonder if in the failing case, the vCPU gets migrated to a pCPU on a
> > different node, and that causes NUMA balancing to go crazy and zap pretty much
> > all of guest memory.  If that's what's happening, then a better solution for the
> > NUMA balancing issue would be to affine the vCPU to a single NUMA node (or hard
> > pin it to a single pCPU?).
> 
> Nope. I pinned main thread to  CPU 0 and VM thread to  CPU 1 and the problem
> persists.  On 6.13, the only way to make the test consistently work is to
> disable NUMA balancing.

Well that's odd.  While I'm quite curious as to what's happening, my stance is
that enabling NUMA balancing with KVM is a terrible idea, so my vote is to sweep
it under the rug and let the user disable the sanity check.

next prev parent reply	other threads:[~2025-02-26  0:50 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-04  0:40 [PATCH v9 00/11] KVM: x86/mmu: Age sptes locklessly James Houghton
2025-02-04  0:40 ` [PATCH v9 01/11] KVM: Rename kvm_handle_hva_range() James Houghton
2025-02-04  0:40 ` [PATCH v9 02/11] KVM: Add lockless memslot walk to KVM James Houghton
2025-02-14 15:26   ` Sean Christopherson
2025-02-14 19:27     ` James Houghton
2025-02-04  0:40 ` [PATCH v9 03/11] KVM: x86/mmu: Factor out spte atomic bit clearing routine James Houghton
2025-02-04  0:40 ` [PATCH v9 04/11] KVM: x86/mmu: Relax locking for kvm_test_age_gfn() and kvm_age_gfn() James Houghton
2025-02-12 22:07   ` Sean Christopherson
2025-02-13  0:25     ` James Houghton
2025-02-04  0:40 ` [PATCH v9 05/11] KVM: x86/mmu: Rename spte_has_volatile_bits() to spte_needs_atomic_write() James Houghton
2025-02-12 22:09   ` Sean Christopherson
2025-02-13  0:26     ` James Houghton
2025-02-04  0:40 ` [PATCH v9 06/11] KVM: x86/mmu: Skip shadow MMU test_young if TDP MMU reports page as young James Houghton
2025-02-04  0:40 ` [PATCH v9 07/11] KVM: x86/mmu: Only check gfn age in shadow MMU if indirect_shadow_pages > 0 James Houghton
2025-02-04  0:40 ` [PATCH v9 08/11] KVM: x86/mmu: Refactor low level rmap helpers to prep for walking w/o mmu_lock James Houghton
2025-02-04  0:40 ` [PATCH v9 09/11] KVM: x86/mmu: Add infrastructure to allow walking rmaps outside of mmu_lock James Houghton
2025-02-04  0:40 ` [PATCH v9 10/11] KVM: x86/mmu: Add support for lockless walks of rmap SPTEs James Houghton
2025-02-04  0:40 ` [PATCH v9 11/11] KVM: x86/mmu: Support rmap walks without holding mmu_lock when aging gfns James Houghton
2025-02-15  0:50 ` [PATCH v9 00/11] KVM: x86/mmu: Age sptes locklessly Sean Christopherson
2025-02-18 19:29 ` Maxim Levitsky
2025-02-19  1:13   ` Sean Christopherson
2025-02-19 18:56     ` James Houghton
2025-02-25 22:00     ` Maxim Levitsky
2025-02-26  0:50       ` Sean Christopherson [this message]
2025-02-26 18:39         ` Maxim Levitsky
2025-02-27  0:51           ` Sean Christopherson
2025-02-27  1:54             ` Maxim Levitsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z75lcJOEFfBMATAf@google.com \
    --to=seanjc@google.com \
    --cc=axelrasmussen@google.com \
    --cc=dmatlack@google.com \
    --cc=jthoughton@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=rientjes@google.com \
    --cc=weixugc@google.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox