Re: [PATCH v5 4/9] mm: Add test_clear_young_fast_only MMU notifier

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: Yu Zhao <yuzhao@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Ankit Agrawal <ankita@nvidia.com>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	 David Matlack <dmatlack@google.com>,
	David Rientjes <rientjes@google.com>,
	 James Morse <james.morse@arm.com>,
	Jonathan Corbet <corbet@lwn.net>, Marc Zyngier <maz@kernel.org>,
	 Oliver Upton <oliver.upton@linux.dev>,
	Raghavendra Rao Ananta <rananta@google.com>,
	 Ryan Roberts <ryan.roberts@arm.com>,
	Shaoqin Huang <shahuang@redhat.com>,
	 Suzuki K Poulose <suzuki.poulose@arm.com>,
	Wei Xu <weixugc@google.com>,  Will Deacon <will@kernel.org>,
	Zenghui Yu <yuzenghui@huawei.com>,
	kvmarm@lists.linux.dev,  kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,  linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v5 4/9] mm: Add test_clear_young_fast_only MMU notifier
Date: Tue, 11 Jun 2024 17:34:24 -0700	[thread overview]
Message-ID: <ZmjtEBH42u7NUWRc@google.com> (raw)
In-Reply-To: <CADrL8HU_FKHTz_6d=xhVLZFDQ_zQo-zdB2rqdpa2CKusa1uo+A@mail.gmail.com>

On Tue, Jun 11, 2024, James Houghton wrote:
> On Tue, Jun 11, 2024 at 12:42 PM Sean Christopherson <seanjc@google.com> wrote:
> > --
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 7b77ad6cf833..07872ae00fa6 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -384,7 +384,8 @@ int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
> >
> >  int __mmu_notifier_clear_young(struct mm_struct *mm,
> >                                unsigned long start,
> > -                              unsigned long end)
> > +                              unsigned long end,
> > +                              bool fast_only)
> >  {
> >         struct mmu_notifier *subscription;
> >         int young = 0, id;
> > @@ -393,9 +394,12 @@ int __mmu_notifier_clear_young(struct mm_struct *mm,
> >         hlist_for_each_entry_rcu(subscription,
> >                                  &mm->notifier_subscriptions->list, hlist,
> >                                  srcu_read_lock_held(&srcu)) {
> > -               if (subscription->ops->clear_young)
> > -                       young |= subscription->ops->clear_young(subscription,
> > -                                                               mm, start, end);
> > +               if (!subscription->ops->clear_young ||
> > +                   fast_only && !subscription->ops->has_fast_aging)
> > +                       continue;
> > +
> > +               young |= subscription->ops->clear_young(subscription,
> > +                                                       mm, start, end);
> 
> KVM changing has_fast_aging dynamically would be slow, wouldn't it?

No, it could/would be done quite quickly.  But, I'm not suggesting has_fast_aging
be dynamic, i.e. it's not an "all aging is guaranteed to be fast", it's a "this
MMU _can_ do fast aging".  It's a bit fuzzy/weird mostly because KVM can essentially
have multiple secondary MMUs wired up to the same mmu_notifier.

> I feel like it's simpler to just pass in fast_only into `clear_young` itself
> (and this is how I interpreted what you wrote above anyway).

Eh, maybe?  A "has_fast_aging" flag is more robust in the sense that it requires
secondary MMUs to opt-in, i.e. all secondary MMUs will be considered "slow" by
default.
 
It's somewhat of a moot point because KVM is the only secondary MMU that implements
.clear_young() and .test_young() (which I keep forgetting), and that seems unlikely
to change.

A flag would also avoid an indirect call and thus a RETPOLINE when CONFIG_RETPOLINE=y,
i.e. would be a minor optimization when KVM doesn't suppport fast aging.  But that's
probably a pretty unlikely combination, so it's probably not a valid argument.

So, I guess I don't have a strong opinion?

> > Double ugh.  Peeking ahead at the "failure" code, NAK to adding
> > kvm_arch_young_notifier_likely_fast for all the same reasons I objected to
> > kvm_arch_has_test_clear_young() in v1.  Please stop trying to do anything like
> > that, I will NAK each every attempt to have core mm/ code call directly into KVM.
> 
> Sorry to make you repeat yourself; I'll leave it out of v6. I don't
> like it either, but I wasn't sure how important it was to avoid
> calling into unnecessary notifiers if the TDP MMU were completely
> disabled.

If it's important, e.g. for performance, then the mmu_notifier should have a flag
so that the behavior doesn't assume a KVM backend.   Hence my has_fast_aging
suggestion.

> > Anyways, back to this code, before we spin another version, we need to agree on
> > exactly what behavior we want out of secondary MMUs.  Because to me, the behavior
> > proposed in this version doesn't make any sense.
> >
> > Signalling failure because KVM _might_ have relevant aging information in SPTEs
> > that require taking kvm->mmu_lock is a terrible tradeoff.  And for the test_young
> > case, it's flat out wrong, e.g. if a page is marked Accessed in the TDP MMU, then
> > KVM should return "young", not "failed".
> 
> Sorry for this oversight. What about something like:
> 
> 1. test (and maybe clear) A bits on TDP MMU
> 2. If accessed && !should_clear: return (fast)
> 3. if (fast_only): return (fast)
> 4. If !(must check shadow MMU): return (fast)
> 5. test (and maybe clear) A bits in shadow MMU
> 6. return (slow)

I don't understand where the "must check shadow MMU" in #4 comes from.  I also
don't think it's necessary; see below.
 
> Some of this reordering (and maybe a change from
> kvm_shadow_root_allocated() to checking indirect_shadow_pages or
> something else) can be done in its own patch.
>
> > So rather than failing the fast aging, I think what we want is to know if an
> > mmu_notifier found a young SPTE during a fast lookup.  E.g. something like this
> > in KVM, where using kvm_has_shadow_mmu_sptes() instead of kvm_memslots_have_rmaps()
> > is an optional optimization to avoid taking mmu_lock for write in paths where a
> > (very rare) false negative is acceptable.
> >
> >   static bool kvm_has_shadow_mmu_sptes(struct kvm *kvm)
> >   {
> >         return !tdp_mmu_enabled || READ_ONCE(kvm->arch.indirect_shadow_pages);
> >   }
> >
> >   static int __kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range,
> >                          bool fast_only)
> >   {
> >         int young = 0;
> >
> >         if (!fast_only && kvm_has_shadow_mmu_sptes(kvm)) {
> >                 write_lock(&kvm->mmu_lock);
> >                 young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> >                 write_unlock(&kvm->mmu_lock);
> >         }
> >
> >         if (tdp_mmu_enabled && kvm_tdp_mmu_age_gfn_range(kvm, range))
> >                 young = 1 | MMU_NOTIFY_WAS_FAST;
> 
> I don't think this line is quite right. We might set
> MMU_NOTIFY_WAS_FAST even when we took the mmu_lock. I understand what
> you mean though, thanks.

The name sucks, but I believe the logic is correct.  As posted here in v5, the
MGRLU code wants to age both fast _and_ slow MMUs.  AIUI, the intent is to always
get aging information, but only look around at other PTEs if it can be done fast.

	if (should_walk_secondary_mmu()) {
		notifier_result =
			mmu_notifier_test_clear_young_fast_only(
					vma->vm_mm, addr, addr + PAGE_SIZE,
					/*clear=*/true);
	}

	if (notifier_result & MMU_NOTIFIER_FAST_FAILED)
		secondary_young = mmu_notifier_clear_young(vma->vm_mm, addr,
							   addr + PAGE_SIZE);
	else {
		secondary_young = notifier_result & MMU_NOTIFIER_FAST_YOUNG;
		notifier_was_fast = true;
	}

The change, relative to v5, that I am proposing is that MGLRU looks around if
the page was young in _a_ "fast" secondary MMU, whereas v5 looks around if and
only if _all_ secondary MMUs are fast.

In other words, if a fast MMU had a young SPTE, look around _that_ MMU, via the
fast_only flag.

next prev parent reply	other threads:[~2024-06-12  0:34 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-11  0:21 [PATCH v5 0/9] mm: multi-gen LRU: Walk secondary MMU page tables while aging James Houghton
2024-06-11  0:21 ` James Houghton
2024-06-11  0:21 ` [PATCH v5 1/9] KVM: Add lockless memslot walk to KVM James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-11  0:21 ` [PATCH v5 2/9] KVM: x86: Relax locking for kvm_test_age_gfn and kvm_age_gfn James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-11  0:21 ` [PATCH v5 3/9] KVM: arm64: " James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-11  5:57   ` Oliver Upton
2024-06-11  5:57     ` Oliver Upton
2024-06-11 16:52     ` James Houghton
2024-06-11 16:52       ` James Houghton
2024-06-11  0:21 ` [PATCH v5 4/9] mm: Add test_clear_young_fast_only MMU notifier James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-11  5:33   ` Yu Zhao
2024-06-11  5:33     ` Yu Zhao
2024-06-11 16:49     ` James Houghton
2024-06-11 16:49       ` James Houghton
2024-06-11 18:54       ` Oliver Upton
2024-06-11 19:49         ` Sean Christopherson
2024-06-13  6:52           ` Oliver Upton
2024-06-14  0:48             ` James Houghton
2024-06-11 19:42       ` Sean Christopherson
2024-06-11 23:04         ` James Houghton
2024-06-12  0:34           ` Sean Christopherson [this message]
2024-06-14  0:45             ` James Houghton
2024-06-14 16:12               ` Sean Christopherson
2024-06-14 18:23                 ` James Houghton
2024-06-14 23:17                   ` Sean Christopherson
2024-06-17 16:50                     ` James Houghton
2024-06-17 18:37                       ` Sean Christopherson
2024-06-28 23:38                         ` James Houghton
2024-07-08 16:50                           ` James Houghton
2024-07-09 17:49                             ` Sean Christopherson
2024-07-10 23:10                               ` James Houghton
2024-07-12 15:06                                 ` Sean Christopherson
2024-07-15 23:15                                   ` James Houghton
2024-06-11 20:39       ` Yu Zhao
2024-06-11  0:21 ` [PATCH v5 5/9] KVM: Add kvm_fast_age_gfn and kvm_fast_test_age_gfn James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-11  0:21 ` [PATCH v5 6/9] KVM: x86: Move tdp_mmu_enabled and shadow_accessed_mask James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-11  0:21 ` [PATCH v5 7/9] KVM: x86: Implement kvm_fast_test_age_gfn and kvm_fast_age_gfn James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-11  0:21 ` [PATCH v5 8/9] mm: multi-gen LRU: Have secondary MMUs participate in aging James Houghton
2024-06-11  0:21   ` James Houghton
2024-06-12 16:02   ` Sean Christopherson
2024-06-12 16:59     ` Yu Zhao
2024-06-12 17:23       ` Sean Christopherson
2024-06-13  6:49         ` Oliver Upton
2024-07-05 18:35   ` Yu Zhao
2024-07-08 17:30     ` James Houghton
2024-07-08 23:41       ` Yu Zhao
2024-07-22 20:45         ` James Houghton
2024-07-22 21:23           ` Yu Zhao
2024-06-11  0:21 ` [PATCH v5 9/9] KVM: selftests: Add multi-gen LRU aging to access_tracking_perf_test James Houghton
2024-06-11  0:21   ` James Houghton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZmjtEBH42u7NUWRc@google.com \
    --to=seanjc@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=ankita@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=corbet@lwn.net \
    --cc=dmatlack@google.com \
    --cc=james.morse@arm.com \
    --cc=jthoughton@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=pbonzini@redhat.com \
    --cc=rananta@google.com \
    --cc=rientjes@google.com \
    --cc=ryan.roberts@arm.com \
    --cc=shahuang@redhat.com \
    --cc=suzuki.poulose@arm.com \
    --cc=weixugc@google.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.