From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Kairui Song <kasong@tencent.com>,
	Kalesh Singh <kaleshsingh@google.com>
Subject: Re: [PATCH mm-unstable v2 6/6] mm/mglru: rework workingset protection
Date: Fri, 6 Dec 2024 21:44:15 -0700
Message-ID: <Z1PSn79GPcCxeI_g@google.com>
In-Reply-To: <20241206003126.1338283-7-yuzhao@google.com>

On Thu, Dec 05, 2024 at 05:31:26PM -0700, Yu Zhao wrote:
> With the aging feedback no longer considering the distribution of
> folios in each generation, rework workingset protection to better
> distribute folios across MAX_NR_GENS. This is achieved by reusing
> PG_workingset and PG_referenced/LRU_REFS_FLAGS in a slightly different
> way.
> 
> For folios accessed multiple times through file descriptors, make
> lru_gen_inc_refs() set additional bits of LRU_REFS_WIDTH in
> folio->flags after PG_referenced, then PG_workingset after
> LRU_REFS_WIDTH. After all its bits are set, i.e.,
> LRU_REFS_FLAGS|BIT(PG_workingset), a folio is lazily promoted into the
> second oldest generation in the eviction path. And when
> folio_inc_gen() does that, it clears LRU_REFS_FLAGS so that
> lru_gen_inc_refs() can start over. For this case, LRU_REFS_MASK is
> only valid when PG_referenced is set.
> 
> For folios accessed multiple times through page tables,
> folio_update_gen() from a page table walk or lru_gen_set_refs() from a
> rmap walk sets PG_referenced after the accessed bit is cleared for the
> first time. Thereafter, those two paths set PG_workingset and promote
> folios to the youngest generation. Like folio_inc_gen(), when
> folio_update_gen() does that, it also clears PG_referenced. For this
> case, LRU_REFS_MASK is not used.
> 
> For both of the cases, after PG_workingset is set on a folio, it
> remains until this folio is either reclaimed, or "deactivated" by
> lru_gen_clear_refs(). It can be set again if lru_gen_test_recent()
> returns true upon a refault.
> 
> When adding folios to the LRU lists, lru_gen_distance() distributes
> them as follows:
> +---------------------------------+---------------------------------+
> |    Accessed thru page tables    | Accessed thru file descriptors  |
> +---------------------------------+---------------------------------+
> | PG_active (set while isolated)  |                                 |
> +----------------+----------------+----------------+----------------+
> | PG_workingset  | PG_referenced  | PG_workingset  | LRU_REFS_FLAGS |
> +---------------------------------+---------------------------------+
> |<--------- MIN_NR_GENS --------->|                                 |
> |<-------------------------- MAX_NR_GENS -------------------------->|
> 
> After this patch, some typical client and server workloads showed
> improvements under heavy memory pressure. For example, Python TPC-C,
> which was used to benchmark a different approach [1] to better detect
> refault distances, showed a significant decrease in total refaults:
>                             Before      After      Change
>   Time (seconds)            10801       10801      0%
>   Executed (transactions)   41472       43663      +5%
>   workingset_nodes          109070      120244     +10%
>   workingset_refault_anon   5019627     7281831    +45%
>   workingset_refault_file   1294678786  554855564  -57%
>   workingset_refault_total  1299698413  562137395  -57%
> 
> [1] https://lore.kernel.org/20230920190244.16839-1-ryncsn@gmail.com/
> 
> Reported-by: Kairui Song <kasong@tencent.com>
> Closes: https://lore.kernel.org/CAOUHufahuWcKf5f1Sg3emnqX+cODuR=2TQo7T4Gr-QYLujn4RA@mail.gmail.com/
> Signed-off-by: Yu Zhao <yuzhao@google.com>
> Tested-by: Kalesh Singh <kaleshsingh@google.com>
> ---
>  include/linux/mm_inline.h |  94 +++++++++++++------------
>  include/linux/mmzone.h    |  82 +++++++++++++---------
>  mm/swap.c                 |  23 +++---
>  mm/vmscan.c               | 142 +++++++++++++++++++++++---------------
>  mm/workingset.c           |  29 ++++----
>  5 files changed, 209 insertions(+), 161 deletions(-)
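
As a reading aid for the file-descriptor case quoted above, here is a small
user-space toy model of the bit progression. It is not the kernel code:
struct toy_folio, the toy_* helpers and TOY_REFS_WIDTH = 2 are all made up for
illustration, and the real flag layout and locking are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for illustration only; not the kernel's actual LRU_REFS_WIDTH. */
#define TOY_REFS_WIDTH 2
#define TOY_REFS_MAX   ((1u << TOY_REFS_WIDTH) - 1)

/* Hypothetical stand-in for the relevant bits of folio->flags. */
struct toy_folio {
	bool referenced;    /* models PG_referenced */
	unsigned int refs;  /* models the LRU_REFS_WIDTH bits */
	bool workingset;    /* models PG_workingset */
};

/* One access through a file descriptor, following the quoted description. */
static void toy_inc_refs(struct toy_folio *f)
{
	assert(f);

	if (!f->referenced) {
		/* First access: the counter is only valid once this bit is set. */
		f->referenced = true;
		f->refs = 0;
		return;
	}

	if (f->refs < TOY_REFS_MAX) {
		/* Later accesses fill the extra reference bits. */
		f->refs++;
		return;
	}

	/* Counter saturated: the next access sets the workingset bit. */
	f->workingset = true;
}

/* True once all bits are set, i.e. the folio qualifies for lazy promotion. */
static bool toy_fully_referenced(const struct toy_folio *f)
{
	return f->referenced && f->refs == TOY_REFS_MAX && f->workingset;
}

/* Models the promotion step: the reference bits are cleared so counting
 * can start over, while the workingset bit stays set. */
static void toy_promote(struct toy_folio *f)
{
	f->referenced = false;
	f->refs = 0;
}
```

With the assumed width of 2, a toy folio needs five calls to toy_inc_refs()
before toy_fully_referenced() returns true; toy_promote() then resets the
counter but leaves the workingset bit in place.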

Some outlier results from LULESH (Livermore Unstructured Lagrangian
Explicit Shock Hydrodynamics) [1] caught my eye. The following fix
made the benchmark a lot happier (128GB DRAM + Optane swap):
                            Before    After    Change
  Average (z/s)             6894      7574     +10%
  Deviation (10 samples)    12.96%    1.76%    -86%

[1] https://asc.llnl.gov/codes/proxy-apps/lulesh

Andrew, can you please fold it in? Thanks!

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 90bbc2b3be8b..5e03a61c894f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -916,8 +916,7 @@ static enum folio_references folio_check_references(struct folio *folio,
 		if (!referenced_ptes)
 			return FOLIOREF_RECLAIM;
 
-		lru_gen_set_refs(folio);
-		return FOLIOREF_ACTIVATE;
+		return lru_gen_set_refs(folio) ? FOLIOREF_ACTIVATE : FOLIOREF_KEEP;
 	}
 
 	referenced_folio = folio_test_clear_referenced(folio);
@@ -4173,11 +4172,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 			old_gen = folio_update_gen(folio, new_gen);
 			if (old_gen >= 0 && old_gen != new_gen)
 				update_batch_size(walk, folio, old_gen, new_gen);
-
-			continue;
-		}
-
-		if (lru_gen_set_refs(folio)) {
+		} else if (lru_gen_set_refs(folio)) {
 			old_gen = folio_lru_gen(folio);
 			if (old_gen >= 0 && old_gen != new_gen)
 				folio_activate(folio);
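
For readers following along, the first hunk can be illustrated with a
user-space toy model. This is only a sketch: struct pt_folio, pt_set_refs()
and the TOY_* values are hypothetical stand-ins, and the two-step behavior of
pt_set_refs() is inferred from the commit message (PG_referenced on the first
reference, PG_workingset thereafter), not copied from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FOLIOREF_RECLAIM/KEEP/ACTIVATE. */
enum toy_ref { TOY_RECLAIM, TOY_KEEP, TOY_ACTIVATE };

/* Hypothetical stand-in for the two flag bits involved. */
struct pt_folio {
	bool referenced;  /* models PG_referenced */
	bool workingset;  /* models PG_workingset */
};

/* Returns false on the first reference and true on later ones.
 * Assumption: a folio already in the working set counts as "seen before",
 * and the referenced bit is cleared once the workingset bit is set. */
static bool pt_set_refs(struct pt_folio *f)
{
	assert(f);

	if (!f->referenced && !f->workingset) {
		f->referenced = true;
		return false;
	}

	f->referenced = false;
	f->workingset = true;
	return true;
}

/* Before the fix: the return value is ignored, so every referenced
 * folio is activated, even on its first reference. */
static enum toy_ref check_before_fix(struct pt_folio *f, int referenced_ptes)
{
	if (!referenced_ptes)
		return TOY_RECLAIM;

	pt_set_refs(f);
	return TOY_ACTIVATE;
}

/* After the fix: a first reference only keeps the folio where it is;
 * activation requires a repeated reference. */
static enum toy_ref check_after_fix(struct pt_folio *f, int referenced_ptes)
{
	if (!referenced_ptes)
		return TOY_RECLAIM;

	return pt_set_refs(f) ? TOY_ACTIVATE : TOY_KEEP;
}
```

In this model a fresh folio is activated on its first reference before the
fix, but is kept on the first reference and activated only on the second one
after the fix.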


Thread overview: 9+ messages
2024-12-06  0:31 [PATCH mm-unstable v2 0/6] mm/mglru: performance optimizations Yu Zhao
2024-12-06  0:31 ` [PATCH mm-unstable v2 1/6] mm/mglru: clean up workingset Yu Zhao
2024-12-06  0:31 ` [PATCH mm-unstable v2 2/6] mm/mglru: optimize deactivation Yu Zhao
2024-12-06  0:31 ` [PATCH mm-unstable v2 3/6] mm/mglru: rework aging feedback Yu Zhao
2024-12-06  0:31 ` [PATCH mm-unstable v2 4/6] mm/mglru: rework type selection Yu Zhao
2024-12-06  0:31 ` [PATCH mm-unstable v2 5/6] mm/mglru: rework refault detection Yu Zhao
2024-12-06  0:31 ` [PATCH mm-unstable v2 6/6] mm/mglru: rework workingset protection Yu Zhao
2024-12-07  4:44   ` Yu Zhao [this message]
2024-12-07 19:09     ` Yu Zhao
