Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()


From: Sean Christopherson <seanjc@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	Michael Larabel <michael@michaellarabel.com>,
	kvmarm@lists.linux.dev,  kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
	linux-mm@google.com
Subject: Re: [PATCH mm-unstable v1 5/5] mm: multi-gen LRU: use mmu_notifier_test_clear_young()
Date: Thu, 23 Feb 2023 09:43:31 -0800
Message-ID: <Y/elw7CTvVWt0Js6@google.com>
In-Reply-To: <20230217041230.2417228-6-yuzhao@google.com>

On Thu, Feb 16, 2023, Yu Zhao wrote:
> An existing selftest can quickly demonstrate the effectiveness of this
> patch. On a generic workstation equipped with 128 CPUs and 256GB DRAM:

Not my area of maintenance, but a non-existent changelog (for all intents and
purposes) for a change of this size and complexity is not acceptable.

>   $ sudo max_guest_memory_test -c 64 -m 250 -s 250
> 
>   MGLRU      run2
>   ---------------
>   Before    ~600s
>   After      ~50s
>   Off       ~250s
> 
>   kswapd (MGLRU before)
>     100.00%  balance_pgdat
>       100.00%  shrink_node
>         100.00%  shrink_one
>           99.97%  try_to_shrink_lruvec
>             99.06%  evict_folios
>               97.41%  shrink_folio_list
>                 31.33%  folio_referenced
>                   31.06%  rmap_walk_file
>                     30.89%  folio_referenced_one
>                       20.83%  __mmu_notifier_clear_flush_young
>                         20.54%  kvm_mmu_notifier_clear_flush_young
>   =>                      19.34%  _raw_write_lock
> 
>   kswapd (MGLRU after)
>     100.00%  balance_pgdat
>       100.00%  shrink_node
>         100.00%  shrink_one
>           99.97%  try_to_shrink_lruvec
>             99.51%  evict_folios
>               71.70%  shrink_folio_list
>                 7.08%  folio_referenced
>                   6.78%  rmap_walk_file
>                     6.72%  folio_referenced_one
>                       5.60%  lru_gen_look_around
>   =>                    1.53%  __mmu_notifier_test_clear_young

Do you happen to know how much of the improvement is due to batching, and how
much is due to using a lockless walk?

> @@ -5699,6 +5797,9 @@ static ssize_t show_enabled(struct kobject *kobj, struct kobj_attribute *attr, c
>  	if (arch_has_hw_nonleaf_pmd_young() && get_cap(LRU_GEN_NONLEAF_YOUNG))
>  		caps |= BIT(LRU_GEN_NONLEAF_YOUNG);
>  
> +	if (kvm_arch_has_test_clear_young() && get_cap(LRU_GEN_SPTE_WALK))
> +		caps |= BIT(LRU_GEN_SPTE_WALK);

As alluded to in patch 1, unless batching the walks even if KVM does _not_ support
a lockless walk is somehow _worse_ than using the existing mmu_notifier_clear_flush_young(),
I think batching the calls should be conditional only on LRU_GEN_SPTE_WALK.  Or
if we want to avoid batching when there are no mmu_notifier listeners, probe
mmu_notifiers.  But don't call into KVM directly.
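
To make the two options concrete, here is a minimal userspace sketch, not
kernel code.  The enum values, struct notifier_list and both should_batch*()
helpers are hypothetical stand-ins invented for illustration; only the names
get_cap() and LRU_GEN_SPTE_WALK come from the quoted diff, and their real
definitions differ.  In the kernel, the "probe mmu_notifiers" variant would
presumably build on something like mm_has_notifiers() rather than a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability indices; the real enum lives in the MGLRU code. */
enum {
	LRU_GEN_CORE,
	LRU_GEN_MM_WALK,
	LRU_GEN_NONLEAF_YOUNG,
	LRU_GEN_SPTE_WALK,
};

#define BIT(n) (1UL << (n))

/* Hypothetical model of the set of mmu_notifier listeners attached to an mm. */
struct notifier_list {
	size_t nr_listeners;
};

/* Simplified stand-in for get_cap(): test one bit in a caller-supplied mask. */
static bool get_cap(unsigned long enabled, int cap)
{
	return (enabled & BIT(cap)) != 0;
}

/* Option 1: batching depends only on the MGLRU capability bit. */
static bool should_batch(unsigned long enabled)
{
	return get_cap(enabled, LRU_GEN_SPTE_WALK);
}

/*
 * Option 2: additionally skip batching when nothing is listening.  The check
 * goes through the generic notifier state, never through a KVM-specific hook.
 */
static bool should_batch_probed(unsigned long enabled,
				const struct notifier_list *nl)
{
	return get_cap(enabled, LRU_GEN_SPTE_WALK) &&
	       nl != NULL && nl->nr_listeners > 0;
}
```

Either way, the MGLRU side never needs to know whether the listener is KVM or
whether KVM's walk is lockless; that stays behind the mmu_notifier API.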
