All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Michael Larabel <michael@michaellarabel.com>,
	kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
	 linux-mm@google.com
Subject: Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
Date: Thu, 23 Feb 2023 12:28:58 -0800	[thread overview]
Message-ID: <Y/fMimvChfhhbCid@google.com> (raw)
In-Reply-To: <CAOUHufYWktz4SNjL_o_2oZNcJLXserwCot-Prv4UEG9uzn57rg@mail.gmail.com>

On Thu, Feb 23, 2023, Yu Zhao wrote:
> On Thu, Feb 23, 2023 at 12:58 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Feb 23, 2023, Yu Zhao wrote:
> > > On Thu, Feb 23, 2023 at 12:11 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, Feb 23, 2023, Yu Zhao wrote:
> > > > > > As alluded to in patch 1, unless batching the walks even if KVM does _not_ support
> > > > > > a lockless walk is somehow _worse_ than using the existing mmu_notifier_clear_flush_young(),
> > > > > > I think batching the calls should be conditional only on LRU_GEN_SPTE_WALK.  Or
> > > > > > if we want to avoid batching when there are no mmu_notifier listeners, probe
> > > > > > mmu_notifiers.  But don't call into KVM directly.
> > > > >
> > > > > I'm not sure I fully understand. Let's present the problem on the MM
> > > > > side: assuming KVM supports lockless walks, batching can still be
> > > > > worse (very unlikely), because GFNs can exhibit no memory locality at
> > > > > all. So this option allows userspace to disable batching.
> > > >
> > > > I'm asking the opposite.  Is there a scenario where batching+lock is worse than
> > > > !batching+lock?  If not, then don't make batching depend on lockless walks.
> > >
> > > Yes, absolutely. batching+lock means we take/release mmu_lock for
> > > every single PTE in the entire VA space -- each small batch contains
> > > 64 PTEs but the entire batch is the whole KVM.
> >
> > Who is "we"?
> 
> Oops -- shouldn't have used "we".
> 
> > I don't see anything in the kernel that triggers walking the whole
> > VMA, e.g. lru_gen_look_around() limits the walk to a single PMD.  I feel like I'm
> > missing something...
> 
> walk_mm() -> walk_pud_range() -> walk_pmd_range() -> walk_pte_range()
> -> test_spte_young() -> mmu_notifier_test_clear_young().
> 
> MGLRU takes two passes: during the first pass, it sweeps entire VA
> space on each MM (per MM/KVM); during the second pass, it uses the rmap on each
> folio (per folio).

Ah.  IIUC, userspace can use LRU_GEN_SPTE_WALK to control whether or not to walk
secondary MMUs, and the kernel further restricts LRU_GEN_SPTE_WALK to secondary
MMUs that implement a lockless walk.  And if the answer is "no", secondary MMUs
are simply not consulted.

If that's correct, then the proper way to handle this is by extending mmu_notifier_ops
to query (a) if there's at least one register listeners that implements
test_clear_young() and (b) if all registered listeners that implement test_clear_young()
support lockless walks.  That avoids direct dependencies on KVM, and avoids making
assumptions that may not always hold true, e.g. that KVM is the only mmu_notifier
user that supports the young APIs.

P.S. all of this info absolutely belongs in documentation and/or changelogs.

WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <seanjc@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: linux-mm@google.com, kvm@vger.kernel.org,
	Jonathan Corbet <corbet@lwn.net>,
	Michael Larabel <michael@michaellarabel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	kvmarm@lists.linux.dev, Paolo Bonzini <pbonzini@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
Date: Thu, 23 Feb 2023 12:28:58 -0800	[thread overview]
Message-ID: <Y/fMimvChfhhbCid@google.com> (raw)
In-Reply-To: <CAOUHufYWktz4SNjL_o_2oZNcJLXserwCot-Prv4UEG9uzn57rg@mail.gmail.com>

On Thu, Feb 23, 2023, Yu Zhao wrote:
> On Thu, Feb 23, 2023 at 12:58 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Feb 23, 2023, Yu Zhao wrote:
> > > On Thu, Feb 23, 2023 at 12:11 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, Feb 23, 2023, Yu Zhao wrote:
> > > > > > As alluded to in patch 1, unless batching the walks even if KVM does _not_ support
> > > > > > a lockless walk is somehow _worse_ than using the existing mmu_notifier_clear_flush_young(),
> > > > > > I think batching the calls should be conditional only on LRU_GEN_SPTE_WALK.  Or
> > > > > > if we want to avoid batching when there are no mmu_notifier listeners, probe
> > > > > > mmu_notifiers.  But don't call into KVM directly.
> > > > >
> > > > > I'm not sure I fully understand. Let's present the problem on the MM
> > > > > side: assuming KVM supports lockless walks, batching can still be
> > > > > worse (very unlikely), because GFNs can exhibit no memory locality at
> > > > > all. So this option allows userspace to disable batching.
> > > >
> > > > I'm asking the opposite.  Is there a scenario where batching+lock is worse than
> > > > !batching+lock?  If not, then don't make batching depend on lockless walks.
> > >
> > > Yes, absolutely. batching+lock means we take/release mmu_lock for
> > > every single PTE in the entire VA space -- each small batch contains
> > > 64 PTEs but the entire batch is the whole KVM.
> >
> > Who is "we"?
> 
> Oops -- shouldn't have used "we".
> 
> > I don't see anything in the kernel that triggers walking the whole
> > VMA, e.g. lru_gen_look_around() limits the walk to a single PMD.  I feel like I'm
> > missing something...
> 
> walk_mm() -> walk_pud_range() -> walk_pmd_range() -> walk_pte_range()
> -> test_spte_young() -> mmu_notifier_test_clear_young().
> 
> MGLRU takes two passes: during the first pass, it sweeps entire VA
> space on each MM (per MM/KVM); during the second pass, it uses the rmap on each
> folio (per folio).

Ah.  IIUC, userspace can use LRU_GEN_SPTE_WALK to control whether or not to walk
secondary MMUs, and the kernel further restricts LRU_GEN_SPTE_WALK to secondary
MMUs that implement a lockless walk.  And if the answer is "no", secondary MMUs
are simply not consulted.

If that's correct, then the proper way to handle this is by extending mmu_notifier_ops
to query (a) if there's at least one register listeners that implements
test_clear_young() and (b) if all registered listeners that implement test_clear_young()
support lockless walks.  That avoids direct dependencies on KVM, and avoids making
assumptions that may not always hold true, e.g. that KVM is the only mmu_notifier
user that supports the young APIs.

P.S. all of this info absolutely belongs in documentation and/or changelogs.

WARNING: multiple messages have this Message-ID (diff)
From: Sean Christopherson <seanjc@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	 Michael Larabel <michael@michaellarabel.com>,
	kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	 linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
	 linux-mm@google.com
Subject: Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
Date: Thu, 23 Feb 2023 12:28:58 -0800	[thread overview]
Message-ID: <Y/fMimvChfhhbCid@google.com> (raw)
In-Reply-To: <CAOUHufYWktz4SNjL_o_2oZNcJLXserwCot-Prv4UEG9uzn57rg@mail.gmail.com>

On Thu, Feb 23, 2023, Yu Zhao wrote:
> On Thu, Feb 23, 2023 at 12:58 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Feb 23, 2023, Yu Zhao wrote:
> > > On Thu, Feb 23, 2023 at 12:11 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Thu, Feb 23, 2023, Yu Zhao wrote:
> > > > > > As alluded to in patch 1, unless batching the walks even if KVM does _not_ support
> > > > > > a lockless walk is somehow _worse_ than using the existing mmu_notifier_clear_flush_young(),
> > > > > > I think batching the calls should be conditional only on LRU_GEN_SPTE_WALK.  Or
> > > > > > if we want to avoid batching when there are no mmu_notifier listeners, probe
> > > > > > mmu_notifiers.  But don't call into KVM directly.
> > > > >
> > > > > I'm not sure I fully understand. Let's present the problem on the MM
> > > > > side: assuming KVM supports lockless walks, batching can still be
> > > > > worse (very unlikely), because GFNs can exhibit no memory locality at
> > > > > all. So this option allows userspace to disable batching.
> > > >
> > > > I'm asking the opposite.  Is there a scenario where batching+lock is worse than
> > > > !batching+lock?  If not, then don't make batching depend on lockless walks.
> > >
> > > Yes, absolutely. batching+lock means we take/release mmu_lock for
> > > every single PTE in the entire VA space -- each small batch contains
> > > 64 PTEs but the entire batch is the whole KVM.
> >
> > Who is "we"?
> 
> Oops -- shouldn't have used "we".
> 
> > I don't see anything in the kernel that triggers walking the whole
> > VMA, e.g. lru_gen_look_around() limits the walk to a single PMD.  I feel like I'm
> > missing something...
> 
> walk_mm() -> walk_pud_range() -> walk_pmd_range() -> walk_pte_range()
> -> test_spte_young() -> mmu_notifier_test_clear_young().
> 
> MGLRU takes two passes: during the first pass, it sweeps entire VA
> space on each MM (per MM/KVM); during the second pass, it uses the rmap on each
> folio (per folio).

Ah.  IIUC, userspace can use LRU_GEN_SPTE_WALK to control whether or not to walk
secondary MMUs, and the kernel further restricts LRU_GEN_SPTE_WALK to secondary
MMUs that implement a lockless walk.  And if the answer is "no", secondary MMUs
are simply not consulted.

If that's correct, then the proper way to handle this is by extending mmu_notifier_ops
to query (a) if there's at least one register listeners that implements
test_clear_young() and (b) if all registered listeners that implement test_clear_young()
support lockless walks.  That avoids direct dependencies on KVM, and avoids making
assumptions that may not always hold true, e.g. that KVM is the only mmu_notifier
user that supports the young APIs.

P.S. all of this info absolutely belongs in documentation and/or changelogs.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2023-02-23 20:29 UTC|newest]

Thread overview: 117+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-17  4:12 [PATCH mm-unstable v1 0/5] mm/kvm: lockless accessed bit harvest Yu Zhao
2023-02-17  4:12 ` Yu Zhao
2023-02-17  4:12 ` Yu Zhao
2023-02-17  4:12 ` [PATCH mm-unstable v1 1/5] mm/kvm: add mmu_notifier_test_clear_young() Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-23 17:13   ` Sean Christopherson
2023-02-23 17:13     ` Sean Christopherson
2023-02-23 17:13     ` Sean Christopherson
2023-02-23 17:40     ` Yu Zhao
2023-02-23 17:40       ` Yu Zhao
2023-02-23 17:40       ` Yu Zhao
2023-02-23 21:12       ` Sean Christopherson
2023-02-23 21:12         ` Sean Christopherson
2023-02-23 21:12         ` Sean Christopherson
2023-02-23 17:34   ` Sean Christopherson
2023-02-23 17:34     ` Sean Christopherson
2023-02-23 17:34     ` Sean Christopherson
2023-02-17  4:12 ` [PATCH mm-unstable v1 2/5] kvm/x86: add kvm_arch_test_clear_young() Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:19   ` Yu Zhao
2023-02-17  4:19     ` Yu Zhao
2023-02-17  4:19     ` Yu Zhao
2023-02-17 16:27   ` Sean Christopherson
2023-02-17 16:27     ` Sean Christopherson
2023-02-17 16:27     ` Sean Christopherson
2023-02-23  5:58     ` Yu Zhao
2023-02-23  5:58       ` Yu Zhao
2023-02-23  5:58       ` Yu Zhao
2023-02-23 17:09       ` Sean Christopherson
2023-02-23 17:09         ` Sean Christopherson
2023-02-23 17:09         ` Sean Christopherson
2023-02-23 17:27         ` Yu Zhao
2023-02-23 17:27           ` Yu Zhao
2023-02-23 17:27           ` Yu Zhao
2023-02-23 18:23           ` Sean Christopherson
2023-02-23 18:23             ` Sean Christopherson
2023-02-23 18:23             ` Sean Christopherson
2023-02-23 18:34             ` Yu Zhao
2023-02-23 18:34               ` Yu Zhao
2023-02-23 18:34               ` Yu Zhao
2023-02-23 18:47               ` Sean Christopherson
2023-02-23 18:47                 ` Sean Christopherson
2023-02-23 18:47                 ` Sean Christopherson
2023-02-23 19:02                 ` Yu Zhao
2023-02-23 19:02                   ` Yu Zhao
2023-02-23 19:02                   ` Yu Zhao
2023-02-23 19:21                   ` Sean Christopherson
2023-02-23 19:21                     ` Sean Christopherson
2023-02-23 19:21                     ` Sean Christopherson
2023-02-23 19:25                     ` Yu Zhao
2023-02-23 19:25                       ` Yu Zhao
2023-02-23 19:25                       ` Yu Zhao
2023-02-17  4:12 ` [PATCH mm-unstable v1 3/5] kvm/arm64: " Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:21   ` Yu Zhao
2023-02-17  4:21     ` Yu Zhao
2023-02-17  4:21     ` Yu Zhao
2023-02-17  9:00     ` Marc Zyngier
2023-02-17  9:00       ` Marc Zyngier
2023-02-17  9:00       ` Marc Zyngier
2023-02-23  3:58       ` Yu Zhao
2023-02-23  3:58         ` Yu Zhao
2023-02-23  3:58         ` Yu Zhao
2023-02-23  9:03         ` Marc Zyngier
2023-02-23  9:03           ` Marc Zyngier
2023-02-23  9:03           ` Marc Zyngier
2023-02-23  9:18           ` Yu Zhao
2023-02-23  9:18             ` Yu Zhao
2023-02-23  9:18             ` Yu Zhao
2023-02-17  9:09   ` Oliver Upton
2023-02-17  9:09     ` Oliver Upton
2023-02-17  9:09     ` Oliver Upton
2023-02-17 16:00     ` Sean Christopherson
2023-02-17 16:00       ` Sean Christopherson
2023-02-17 16:00       ` Sean Christopherson
2023-02-23  5:25       ` Yu Zhao
2023-02-23  5:25         ` Yu Zhao
2023-02-23  5:25         ` Yu Zhao
2023-02-23  4:43     ` Yu Zhao
2023-02-23  4:43       ` Yu Zhao
2023-02-23  4:43       ` Yu Zhao
2023-02-17  4:12 ` [PATCH mm-unstable v1 4/5] kvm/powerpc: " Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:24   ` Yu Zhao
2023-02-17  4:24     ` Yu Zhao
2023-02-17  4:24     ` Yu Zhao
2023-02-17  4:12 ` [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young() Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-17  4:12   ` Yu Zhao
2023-02-23 17:43   ` Sean Christopherson
2023-02-23 17:43     ` Sean Christopherson
2023-02-23 17:43     ` Sean Christopherson
2023-02-23 18:08     ` Yu Zhao
2023-02-23 18:08       ` Yu Zhao
2023-02-23 18:08       ` Yu Zhao
2023-02-23 19:11       ` Sean Christopherson
2023-02-23 19:11         ` Sean Christopherson
2023-02-23 19:11         ` Sean Christopherson
2023-02-23 19:36         ` Yu Zhao
2023-02-23 19:36           ` Yu Zhao
2023-02-23 19:36           ` Yu Zhao
2023-02-23 19:58           ` Sean Christopherson
2023-02-23 19:58             ` Sean Christopherson
2023-02-23 19:58             ` Sean Christopherson
2023-02-23 20:09             ` Yu Zhao
2023-02-23 20:09               ` Yu Zhao
2023-02-23 20:09               ` Yu Zhao
2023-02-23 20:28               ` Sean Christopherson [this message]
2023-02-23 20:28                 ` Sean Christopherson
2023-02-23 20:28                 ` Sean Christopherson
2023-02-23 20:48                 ` Yu Zhao
2023-02-23 20:48                   ` Yu Zhao
2023-02-23 20:48                   ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y/fMimvChfhhbCid@google.com \
    --to=seanjc@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=hannes@cmpxchg.org \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@google.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=michael@michaellarabel.com \
    --cc=pbonzini@redhat.com \
    --cc=x86@kernel.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.