[PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging

From: Sean Christopherson <seanjc@google.com>
To: kvm-riscv@lists.infradead.org
Subject: [PATCH v4 2/7] mm: multi-gen LRU: Have secondary MMUs participate in aging
Date: Mon, 3 Jun 2024 16:03:05 -0700
Message-ID: <Zl5LqcusZ88QOGQY@google.com>
In-Reply-To: <CADrL8HW44Hx_Ejx_6+FVKt1V17PdgT6rw+sNtKzumqc9UCVDfA@mail.gmail.com>

On Mon, Jun 03, 2024, James Houghton wrote:
> On Thu, May 30, 2024 at 11:06 PM Yu Zhao <yuzhao@google.com> wrote:
> > What I don't think is acceptable is simplifying those optimizations
> > out without documenting your justifications (I would even call it a
> > design change, rather than simplification, from v3 to v4).
> 
> I'll put back something similar to what you had before (like a
> test_clear_young() with a "fast" parameter instead of "bitmap"). I
> like the idea of having a new mmu notifier, like
> fast_test_clear_young(), while leaving test_young() and clear_young()
> unchanged (where "fast" means "prioritize speed over accuracy").

Those two statements are contradicting each other, aren't they?  Anyways, I vote
for a "fast only" variant, e.g. test_clear_young_fast_only() or so.  gup() has
already established that terminology in mm/, so hopefully it would be familiar
to readers.  We could pass a param, but then the MGLRU code would likely end up
doing a bunch of useless indirect calls into secondary MMUs, whereas a dedicated
hook allows implementations to nullify the pointer if the API isn't supported
for whatever reason.
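
To make the dedicated-hook argument concrete, here is a minimal userspace sketch
in C.  It is not kernel code: struct notifier_ops, age_fast_only(),
demo_fast_only() and nr_indirect_calls are hypothetical names made up for
illustration, and only the test_clear_young_fast_only() name comes from the
suggestion above.  It only shows that a NULL hook lets the caller skip the
indirect call entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; NOT the kernel's struct mmu_notifier_ops. */
struct notifier_ops {
	/* Accurate aging; allowed to be slow. */
	bool (*test_clear_young)(unsigned long start, unsigned long end);
	/* Optional hook: left NULL when the secondary MMU has no fast path. */
	bool (*test_clear_young_fast_only)(unsigned long start, unsigned long end);
};

/* Counts indirect calls actually made, purely for illustration. */
static int nr_indirect_calls;

/* Toy implementation: pretend every non-empty range was young. */
static bool demo_fast_only(unsigned long start, unsigned long end)
{
	nr_indirect_calls++;
	return start < end;
}

/* Caller side: a missing hook costs one pointer check, not an indirect call. */
static bool age_fast_only(const struct notifier_ops *ops,
			  unsigned long start, unsigned long end)
{
	assert(ops != NULL);

	if (!ops->test_clear_young_fast_only)
		return false;	/* unsupported: report "not young" */

	return ops->test_clear_young_fast_only(start, end);
}
```

With a "fast" parameter on a single shared hook, the caller would instead have to
make the call just to learn that the implementation can't do anything useful.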

And pulling in Oliver's comments about locking, I think it's important that the
mmu_notifier API express its requirement that the operation be "fast", not that
it be lockless.  E.g. if a secondary MMU can guarantee that a lock will be
contended only in rare, slow cases, then taking a lock is a-ok.  Or a secondary
MMU could do try-lock and bail if the lock is contended.
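
A rough userspace sketch of the try-lock-and-bail idea, in C11.  Everything here
is an assumption for illustration: the atomic_flag is only a stand-in for a real
lock such as mmu_lock, and demo_try_lock(), demo_unlock(), page_young and
test_clear_young_trylock() are hypothetical names, not KVM or mmu_notifier APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a real lock; set means "held". */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Toy "accessed" state for a single page. */
static bool page_young = true;

/* Returns true if the lock was acquired, false if it was already held. */
static bool demo_try_lock(void)
{
	return !atomic_flag_test_and_set_explicit(&demo_lock,
						  memory_order_acquire);
}

static void demo_unlock(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/*
 * "Fast only" aging: never wait for the lock.  If it is contended, set
 * *bailed and report "not young", trading accuracy for speed.
 */
static bool test_clear_young_trylock(bool *bailed)
{
	bool young;

	assert(bailed != NULL);

	if (!demo_try_lock()) {
		*bailed = true;
		return false;
	}

	*bailed = false;
	young = page_young;
	page_young = false;	/* clear the young state while holding the lock */
	demo_unlock();

	return young;
}
```

The caller never blocks here; a bail just means this pass learned nothing about
the page, which is the speed-over-accuracy trade being discussed.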

That way KVM can honor the intent of the API with an implementation that works
best for KVM _and_ for MGLRU.  I'm sure there will be future adjustments and fixes,
but that's just more motivation for using something like "fast only" instead of
"lockless".

> > > I made this logic change as part of removing batching.
> > >
> > > I'd really appreciate guidance on what the correct thing to do is.
> > >
> > > In my mind, what would work great is: by default, do aging exactly
> > > when KVM can do it locklessly, and then have a Kconfig to always have
> > > MGLRU to do aging with KVM if a user really cares about proactive
> > > reclaim (when the feature bit is set). The selftest can check the
> > > Kconfig + feature bit to know for sure if aging will be done.
> >
> > I still don't see how that Kconfig helps. Or why the new static branch
> > isn't enough?
> 
> Without a special Kconfig, the feature bit just tells us that aging
> with KVM is possible, not that it will necessarily be done. For the
> self-test, it'd be good to know exactly when aging is being done or
> not, so having a Kconfig like LRU_GEN_ALWAYS_WALK_SECONDARY_MMU would
> help make the self-test set the right expectations for aging.
> 
> The Kconfig would also allow a user to know that, no matter what,
> we're going to get correct age data for VMs, even if, say, we're using
> the shadow MMU.

Heh, unless KVM flushes, you won't get "correct" age data.

> This is somewhat important for me/Google Cloud. Is that reasonable? Maybe
> there's a better solution.

Hmm, no?  There's no reason to use a Kconfig, e.g. if we _really_ want to prioritize
accuracy over speed, then a KVM (x86?) module param to have KVM walk nested TDP
page tables would give us what we want.

But before we do that, I think we need to perform due diligence (or provide data)
showing that having KVM take mmu_lock for write in the "fast only" API provides
better total behavior.  I.e. that the additional accuracy is indeed worth the cost.
