* [RFC PATCH v1 0/7] Support arch-specific page aging mechanism
@ 2023-04-02 10:42 Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 1/7] mm: Move some code around so that next patch is simpler Aneesh Kumar K.V
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

Architectures like powerpc provide a page access count mechanism that can be
used to better identify hot and cold pages in the system. POWER10 supports a
32-bit page access count which is incremented on page access and decremented
by time-based decay. The count is driven by physical address filtering and
hence should count accesses made via page tables (mmap) as well as via
read/write syscalls.
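
For reference, the per-page state exposed through the HCA engine in the
powerpc patch later in this series (patch 5) can be modelled roughly as
below. This is only an illustrative sketch for kernel context: the field
names mirror the hca_entry structure from that patch, while the decay
behaviour described in the comments follows this cover letter rather than
any hardware documentation.

/*
 * Rough model of the POWER10 HCA per-page state (see patch 5).
 * Hardware increments 'count' on each access to the page; periodic
 * decay folds accesses into 'prev_count' and bumps 'age'.
 */
struct hca_entry {
	unsigned long count;		/* accesses since the last decay */
	unsigned long prev_count;	/* historical (decayed) accesses */
	uint8_t age;			/* decay intervals seen so far */
};

/* combined hotness score used by the series: history + recent accesses */
static inline unsigned long hotness_score(struct hca_entry *entry)
{
	return entry->prev_count + entry->count;
}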

This patch series updates multi-gen LRU to use this page access count instead
of the page table reference bit to classify a page into a generation. Pages
are classified into a generation during the sorting phase of reclaim.
Currently the sorting phase uses the generation stored in page flags; with
this change we can avoid using page flags for storing the generation, which
frees the 3 page-flag bits used to store it. Since the page access counting
mechanism can also count accesses via read/write, we can look at avoiding the
tier index in page flags as well, which would free the 2 page-flag bits used
for REFS (this is not done in this series).
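
The core of that change can be summarised by the helper below. This is an
illustrative sketch only: folio_generation() is a hypothetical name, and the
real logic is open-coded in sort_folio() (patch 3) on top of the
arch_supports_page_access_count()/arch_get_lru_gen_seq() hooks.

static int folio_generation(struct lruvec *lruvec, struct folio *folio)
{
	/* derive the generation from the arch access count when available */
	if (arch_supports_page_access_count())
		return lru_gen_from_seq(arch_get_lru_gen_seq(lruvec, folio));

	/* otherwise fall back to the generation encoded in folio->flags */
	return folio_lru_gen(folio);
}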

I also added a patch that makes the change below:
@@ -5243,7 +5243,8 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
-	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
+	reclaimed = shrink_folio_list(&list, pgdat, sc,
+				      &stat, arch_supports_page_access_count());
 	sc->nr_reclaimed += reclaimed;

Performance did improve, but it also resulted in a large increase in
workingset_refault_anon. I think this is because it takes a minimal number of
accesses before pages are classified into the younger generation, and page
refaults can be high during that window.

Patch 2 results in some improvement on powerpc because it removes the
additional code that is not used for page classification.

memcached:
patch details                 Total Ops/sec:
mglru                         160821
PATCH 2                       164572

mongodb:
Patch details                 Throughput(Ops/sec)
mglru                         92987
PATCH 2                       93740

Enabling the architecture-supported page access count does impact workload
performance, since updating the access count involves some memory access
overhead. Another challenge with the page access count is determining the
relative hotness between pages. I tried two methods, density-based clustering
and k-means clustering, to classify pages into LRU generations based on
sampled hotness. Doing more work during page classification results in
increased contention on lru_lock and hence hurts performance.
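
The series instead uses a much simpler range-based mapping (see
hca_map_lru_seq() in patch 5): the sampled [min_hotness, max_hotness]
interval is split into one bucket per generation and the bucket index selects
the target sequence number. A condensed sketch of that mapping is below;
hotness_to_seq() is a hypothetical name and the boundary handling is
simplified compared to the patch.

static unsigned long hotness_to_seq(struct lru_gen_struct *lrugen,
				    unsigned long hotness, int type, int nr_gens)
{
	/* inclusive hotness range covered by the current generations */
	unsigned long range = lrugen->max_hotness - lrugen->min_hotness + 1;
	unsigned long bucket = DIV_ROUND_UP(range, nr_gens);

	if (hotness >= lrugen->max_hotness)
		return lrugen->max_seq;
	if (hotness <= lrugen->min_hotness)
		return lrugen->min_seq[type];

	/* hotter pages map to younger (higher) sequence numbers */
	return lrugen->min_seq[type] + (hotness - lrugen->min_hotness) / bucket;
}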


memcached:
patch details                       Total Ops/sec:
arch page access count              161940
avoid folio_check_references        171631 (but refault count increased from 2606765 -> 7793482)

mongodb:
Patch details                      Throughput(Ops/sec)
arch page access count             92533
avoid folio_check_references       91105 (refault: 828951 -> 4592539)

The patch series shows that using the page access count does not result in
any regression and keeps the code simpler w.r.t. the different feedback loops
used during multi-gen LRU reclaim. It also saves some bits in page->flags. It
was also observed that the overhead of counting page accesses is not that
high and can be mitigated by further tuning of the page generation
classification logic. This also enables us to start looking at using the page
access count in other parts of the Linux kernel, such as page promotion. I
haven't been able to measure the impact on page promotion yet due to hardware
availability.


Aneesh Kumar K.V (7):
  mm: Move some code around so that next patch is simpler
  mm: Don't build multi-gen LRU page table walk code on architecture not
    supported
  mm: multi-gen LRU: avoid using generation stored in page flags for
    generation
  mm: multi-gen LRU: support different page aging mechanism
  powerpc/mm: Add page access count support
  powerpc/mm: Clear page access count on allocation
  mm: multi-gen LRU: Shrink folio list without checking for page table
    reference

 arch/Kconfig                          |   3 +
 arch/arm64/Kconfig                    |   1 +
 arch/powerpc/Kconfig                  |  10 +
 arch/powerpc/include/asm/hca.h        |  49 ++++
 arch/powerpc/include/asm/page.h       |   5 +
 arch/powerpc/include/asm/page_aging.h |  35 +++
 arch/powerpc/mm/Makefile              |   1 +
 arch/powerpc/mm/hca.c                 | 288 ++++++++++++++++++++
 arch/x86/Kconfig                      |   1 +
 include/linux/memcontrol.h            |   2 +-
 include/linux/mm_inline.h             |  47 +---
 include/linux/mm_types.h              |   8 +-
 include/linux/mmzone.h                |  15 +-
 include/linux/page_aging.h            |  43 +++
 include/linux/swap.h                  |   2 +-
 kernel/fork.c                         |   2 +-
 mm/Kconfig                            |   4 +
 mm/memcontrol.c                       |   2 +-
 mm/rmap.c                             |   4 +-
 mm/vmscan.c                           | 372 ++++++++++++++++++++++----
 20 files changed, 780 insertions(+), 114 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hca.h
 create mode 100644 arch/powerpc/include/asm/page_aging.h
 create mode 100644 arch/powerpc/mm/hca.c
 create mode 100644 include/linux/page_aging.h

-- 
2.39.2



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH v1 1/7] mm: Move some code around so that next patch is simpler
  2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
@ 2023-04-02 10:42 ` Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 2/7] mm: Don't build multi-gen LRU page table walk code on architecture not supported Aneesh Kumar K.V
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

Move lru_gen_add_folio() to a .c file. A later patch will support an
arch-specific mapping of the page access count to a generation and will use
it when adding a folio to the lruvec. This move enables that.

No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 include/linux/mm_inline.h |  47 +-------------
 mm/vmscan.c               | 127 ++++++++++++++++++++++++++------------
 2 files changed, 88 insertions(+), 86 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index ff3f3f23f649..4dc2ab95d612 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -217,52 +217,7 @@ static inline void lru_gen_update_size(struct lruvec *lruvec, struct folio *foli
 	VM_WARN_ON_ONCE(lru_gen_is_active(lruvec, old_gen) && !lru_gen_is_active(lruvec, new_gen));
 }
 
-static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
-{
-	unsigned long seq;
-	unsigned long flags;
-	int gen = folio_lru_gen(folio);
-	int type = folio_is_file_lru(folio);
-	int zone = folio_zonenum(folio);
-	struct lru_gen_struct *lrugen = &lruvec->lrugen;
-
-	VM_WARN_ON_ONCE_FOLIO(gen != -1, folio);
-
-	if (folio_test_unevictable(folio) || !lrugen->enabled)
-		return false;
-	/*
-	 * There are three common cases for this page:
-	 * 1. If it's hot, e.g., freshly faulted in or previously hot and
-	 *    migrated, add it to the youngest generation.
-	 * 2. If it's cold but can't be evicted immediately, i.e., an anon page
-	 *    not in swapcache or a dirty page pending writeback, add it to the
-	 *    second oldest generation.
-	 * 3. Everything else (clean, cold) is added to the oldest generation.
-	 */
-	if (folio_test_active(folio))
-		seq = lrugen->max_seq;
-	else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
-		 (folio_test_reclaim(folio) &&
-		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
-		seq = lrugen->min_seq[type] + 1;
-	else
-		seq = lrugen->min_seq[type];
-
-	gen = lru_gen_from_seq(seq);
-	flags = (gen + 1UL) << LRU_GEN_PGOFF;
-	/* see the comment on MIN_NR_GENS about PG_active */
-	set_mask_bits(&folio->flags, LRU_GEN_MASK | BIT(PG_active), flags);
-
-	lru_gen_update_size(lruvec, folio, -1, gen);
-	/* for folio_rotate_reclaimable() */
-	if (reclaiming)
-		list_add_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
-	else
-		list_add(&folio->lru, &lrugen->lists[gen][type][zone]);
-
-	return true;
-}
-
+bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming);
 static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
 {
 	unsigned long flags;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5b7b8d4f5297..f47d80ae77ef 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3737,6 +3737,47 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclai
 	return new_gen;
 }
 
+static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long pfn = pte_pfn(pte);
+
+	VM_WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end);
+
+	if (!pte_present(pte) || is_zero_pfn(pfn))
+		return -1;
+
+	if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte)))
+		return -1;
+
+	if (WARN_ON_ONCE(!pfn_valid(pfn)))
+		return -1;
+
+	return pfn;
+}
+
+static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
+				   struct pglist_data *pgdat, bool can_swap)
+{
+	struct folio *folio;
+
+	/* try to avoid unnecessary memory loads */
+	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
+		return NULL;
+
+	folio = pfn_folio(pfn);
+	if (folio_nid(folio) != pgdat->node_id)
+		return NULL;
+
+	if (folio_memcg_rcu(folio) != memcg)
+		return NULL;
+
+	/* file VMAs can contain anon pages from COW */
+	if (!folio_is_file_lru(folio) && !can_swap)
+		return NULL;
+
+	return folio;
+}
+
 static void update_batch_size(struct lru_gen_mm_walk *walk, struct folio *folio,
 			      int old_gen, int new_gen)
 {
@@ -3843,23 +3884,6 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk
 	return false;
 }
 
-static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr)
-{
-	unsigned long pfn = pte_pfn(pte);
-
-	VM_WARN_ON_ONCE(addr < vma->vm_start || addr >= vma->vm_end);
-
-	if (!pte_present(pte) || is_zero_pfn(pfn))
-		return -1;
-
-	if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte)))
-		return -1;
-
-	if (WARN_ON_ONCE(!pfn_valid(pfn)))
-		return -1;
-
-	return pfn;
-}
 
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG)
 static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr)
@@ -3881,29 +3905,6 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned
 }
 #endif
 
-static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
-				   struct pglist_data *pgdat, bool can_swap)
-{
-	struct folio *folio;
-
-	/* try to avoid unnecessary memory loads */
-	if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat))
-		return NULL;
-
-	folio = pfn_folio(pfn);
-	if (folio_nid(folio) != pgdat->node_id)
-		return NULL;
-
-	if (folio_memcg_rcu(folio) != memcg)
-		return NULL;
-
-	/* file VMAs can contain anon pages from COW */
-	if (!folio_is_file_lru(folio) && !can_swap)
-		return NULL;
-
-	return folio;
-}
-
 static bool suitable_to_scan(int total, int young)
 {
 	int n = clamp_t(int, cache_line_size() / sizeof(pte_t), 2, 8);
@@ -5252,6 +5253,52 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 	blk_finish_plug(&plug);
 }
 
+bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
+{
+	unsigned long seq;
+	unsigned long flags;
+	int gen = folio_lru_gen(folio);
+	int type = folio_is_file_lru(folio);
+	int zone = folio_zonenum(folio);
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+
+	VM_WARN_ON_ONCE_FOLIO(gen != -1, folio);
+
+	if (folio_test_unevictable(folio) || !lrugen->enabled)
+		return false;
+	/*
+	 * There are three common cases for this page:
+	 * 1. If it's hot, e.g., freshly faulted in or previously hot and
+	 *    migrated, add it to the youngest generation.
+	 * 2. If it's cold but can't be evicted immediately, i.e., an anon page
+	 *    not in swapcache or a dirty page pending writeback, add it to the
+	 *    second oldest generation.
+	 * 3. Everything else (clean, cold) is added to the oldest generation.
+	 */
+	if (folio_test_active(folio))
+		seq = lrugen->max_seq;
+	else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
+		 (folio_test_reclaim(folio) &&
+		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
+		seq = lrugen->min_seq[type] + 1;
+	else
+		seq = lrugen->min_seq[type];
+
+	gen = lru_gen_from_seq(seq);
+	flags = (gen + 1UL) << LRU_GEN_PGOFF;
+	/* see the comment on MIN_NR_GENS about PG_active */
+	set_mask_bits(&folio->flags, LRU_GEN_MASK | BIT(PG_active), flags);
+
+	lru_gen_update_size(lruvec, folio, -1, gen);
+	/* for folio_rotate_reclaimable() */
+	if (reclaiming)
+		list_add_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
+	else
+		list_add(&folio->lru, &lrugen->lists[gen][type][zone]);
+
+	return true;
+}
+
 /******************************************************************************
  *                          state change
  ******************************************************************************/
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH v1 2/7] mm: Don't build multi-gen LRU page table walk code on architecture not supported
  2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 1/7] mm: Move some code around so that next patch is simpler Aneesh Kumar K.V
@ 2023-04-02 10:42 ` Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 3/7] mm: multi-gen LRU: avoid using generation stored in page flags for generation Aneesh Kumar K.V
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

Not all architectures support hardware atomic updates of access bits. On such
architectures we don't use the page table walk to classify pages into
generations. Add a kernel config option and avoid building the page table
walk code on such architectures.

The lru_gen_look_around() code is duplicated because lru_gen_mm_walk is not
always available.

This patch results in some improvement on powerpc because it removes the
additional code that is not used for page classification.

memcached:
patch details                 Total Ops/sec:
mglru                         160821
PATCH 2                       164572

mongodb:
Patch details                 Throughput(Ops/sec)
mglru                         92987
PATCH 2                       93740

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/Kconfig               |   3 +
 arch/arm64/Kconfig         |   1 +
 arch/x86/Kconfig           |   1 +
 include/linux/memcontrol.h |   2 +-
 include/linux/mm_types.h   |   8 +-
 include/linux/mmzone.h     |  10 +-
 include/linux/swap.h       |   2 +-
 kernel/fork.c              |   2 +-
 mm/memcontrol.c            |   2 +-
 mm/vmscan.c                | 221 ++++++++++++++++++++++++++++++++++---
 10 files changed, 230 insertions(+), 22 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 12e3ddabac9d..61fc138bb91a 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1426,6 +1426,9 @@ config DYNAMIC_SIGFRAME
 config HAVE_ARCH_NODE_DEV_GROUP
 	bool
 
+config LRU_TASK_PAGE_AGING
+	bool
+
 config ARCH_HAS_NONLEAF_PMD_YOUNG
 	bool
 	help
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 27b2592698b0..b783b339ef59 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -219,6 +219,7 @@ config ARM64
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
 	select KASAN_VMALLOC if KASAN
+	select LRU_TASK_PAGE_AGING
 	select MODULES_USE_ELF_RELA
 	select NEED_DMA_MAP_STATE
 	select NEED_SG_DMA_LENGTH
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..805d3f6a1a58 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -274,6 +274,7 @@ config X86
 	select HAVE_GENERIC_VDSO
 	select HOTPLUG_SMT			if SMP
 	select IRQ_FORCED_THREADING
+	select LRU_TASK_PAGE_AGING
 	select NEED_PER_CPU_EMBED_FIRST_CHUNK
 	select NEED_PER_CPU_PAGE_FIRST_CHUNK
 	select NEED_SG_DMA_LENGTH
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7b5925a033e..6b48a30a0dae 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -320,7 +320,7 @@ struct mem_cgroup {
 	struct deferred_split deferred_split_queue;
 #endif
 
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	/* per-memcg mm_struct list */
 	struct lru_gen_mm_list mm_list;
 #endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index af8119776ab1..7bca8987a86b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -796,7 +796,7 @@ struct mm_struct {
 		 */
 		unsigned long ksm_rmap_items;
 #endif
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 		struct {
 			/* this mm_struct is on lru_gen_mm_list */
 			struct list_head list;
@@ -811,7 +811,7 @@ struct mm_struct {
 			struct mem_cgroup *memcg;
 #endif
 		} lru_gen;
-#endif /* CONFIG_LRU_GEN */
+#endif /* CONFIG_LRU_TASK_PAGE_AGING */
 	} __randomize_layout;
 
 	/*
@@ -839,7 +839,7 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
 	return (struct cpumask *)&mm->cpu_bitmap;
 }
 
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 
 struct lru_gen_mm_list {
 	/* mm_struct list for page table walkers */
@@ -873,7 +873,7 @@ static inline void lru_gen_use_mm(struct mm_struct *mm)
 	WRITE_ONCE(mm->lru_gen.bitmap, -1);
 }
 
-#else /* !CONFIG_LRU_GEN */
+#else /* !CONFIG_LRU_TASK_PAGE_AGING */
 
 static inline void lru_gen_add_mm(struct mm_struct *mm)
 {
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..0bcc5d88239a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -428,6 +428,7 @@ struct lru_gen_struct {
 	bool enabled;
 };
 
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 enum {
 	MM_LEAF_TOTAL,		/* total leaf entries */
 	MM_LEAF_OLD,		/* old leaf entries */
@@ -474,6 +475,7 @@ struct lru_gen_mm_walk {
 	bool can_swap;
 	bool force_scan;
 };
+#endif
 
 void lru_gen_init_lruvec(struct lruvec *lruvec);
 void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
@@ -525,8 +527,14 @@ struct lruvec {
 #ifdef CONFIG_LRU_GEN
 	/* evictable pages divided into generations */
 	struct lru_gen_struct		lrugen;
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	/* to concurrently iterate lru_gen_mm_list */
 	struct lru_gen_mm_state		mm_state;
+#else
+	/* for concurrent update of max_seq without holding lru_lock */
+	struct wait_queue_head seq_update_wait;
+	bool seq_update_progress;
+#endif
 #endif
 #ifdef CONFIG_MEMCG
 	struct pglist_data *pgdat;
@@ -1240,7 +1248,7 @@ typedef struct pglist_data {
 
 	unsigned long		flags;
 
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	/* kswap mm walk data */
 	struct lru_gen_mm_walk	mm_walk;
 #endif
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0ceed49516ad..d79976635c42 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -154,7 +154,7 @@ union swap_header {
  */
 struct reclaim_state {
 	unsigned long reclaimed_slab;
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	/* per-thread mm walk data */
 	struct lru_gen_mm_walk *mm_walk;
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 038b898dad52..804517394f55 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2708,7 +2708,7 @@ pid_t kernel_clone(struct kernel_clone_args *args)
 		get_task_struct(p);
 	}
 
-	if (IS_ENABLED(CONFIG_LRU_GEN) && !(clone_flags & CLONE_VM)) {
+	if (IS_ENABLED(CONFIG_LRU_TASK_PAGE_AGING) && !(clone_flags & CLONE_VM)) {
 		/* lock the task to synchronize with memcg migration */
 		task_lock(p);
 		lru_gen_add_mm(p->mm);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ca843cc3aa6..1302f00bd5e7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6305,7 +6305,7 @@ static void mem_cgroup_move_task(void)
 }
 #endif
 
-#ifdef CONFIG_LRU_GEN
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 static void mem_cgroup_attach(struct cgroup_taskset *tset)
 {
 	struct task_struct *task;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f47d80ae77ef..f92b689af2a5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3225,6 +3225,7 @@ static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
 	       get_nr_gens(lruvec, LRU_GEN_ANON) <= MAX_NR_GENS;
 }
 
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 /******************************************************************************
  *                          mm_struct list
  ******************************************************************************/
@@ -3586,6 +3587,7 @@ static bool iterate_mm_list_nowalk(struct lruvec *lruvec, unsigned long max_seq)
 
 	return success;
 }
+#endif
 
 /******************************************************************************
  *                          refault feedback loop
@@ -3778,6 +3780,7 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg,
 	return folio;
 }
 
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 static void update_batch_size(struct lru_gen_mm_walk *walk, struct folio *folio,
 			      int old_gen, int new_gen)
 {
@@ -4235,7 +4238,7 @@ static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_
 	} while (err == -EAGAIN);
 }
 
-static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat)
+static void *set_mm_walk(struct pglist_data *pgdat)
 {
 	struct lru_gen_mm_walk *walk = current->reclaim_state->mm_walk;
 
@@ -4266,6 +4269,18 @@ static void clear_mm_walk(void)
 	if (!current_is_kswapd())
 		kfree(walk);
 }
+#else
+
+static inline void *set_mm_walk(struct pglist_data *pgdat)
+{
+	return NULL;
+}
+
+static inline void clear_mm_walk(void)
+{
+}
+
+#endif
 
 static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 {
@@ -4399,11 +4414,14 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
 	/* make sure preceding modifications appear */
 	smp_store_release(&lrugen->max_seq, lrugen->max_seq + 1);
 
+#ifndef CONFIG_LRU_TASK_PAGE_AGING
+	lruvec->seq_update_progress = false;
+#endif
 	spin_unlock_irq(&lruvec->lru_lock);
 }
-
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
-			       struct scan_control *sc, bool can_swap, bool force_scan)
+			       int scan_priority, bool can_swap, bool force_scan)
 {
 	bool success;
 	struct lru_gen_mm_walk *walk;
@@ -4429,7 +4447,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 		goto done;
 	}
 
-	walk = set_mm_walk(NULL);
+	walk = (struct lru_gen_mm_walk *)set_mm_walk(NULL);
 	if (!walk) {
 		success = iterate_mm_list_nowalk(lruvec, max_seq);
 		goto done;
@@ -4449,7 +4467,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 	} while (mm);
 done:
 	if (!success) {
-		if (sc->priority <= DEF_PRIORITY - 2)
+		if (scan_priority <= DEF_PRIORITY - 2)
 			wait_event_killable(lruvec->mm_state.wait,
 					    max_seq < READ_ONCE(lrugen->max_seq));
 
@@ -4465,6 +4483,61 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 
 	return true;
 }
+#else
+
+/*
+ * inc_max_seq() can drop the lru_lock in between, so use seq_update_progress
+ * together with a waitqueue to handle concurrent callers.
+ */
+bool __try_to_inc_max_seq(struct lruvec *lruvec,
+			  unsigned long max_seq, int scan_priority,
+			  bool can_swap, bool force_scan)
+{
+	bool success = false;
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+
+	VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
+
+	/* see the comment in iterate_mm_list() */
+	if (lruvec->seq_update_progress)
+		success = false;
+	else {
+		spin_lock_irq(&lruvec->lru_lock);
+
+		if (max_seq != lrugen->max_seq)
+			goto done;
+
+		if (lruvec->seq_update_progress)
+			goto done;
+
+		success = true;
+		lruvec->seq_update_progress = true;
+done:
+		spin_unlock_irq(&lruvec->lru_lock);
+	}
+	if (!success) {
+		if (scan_priority <= DEF_PRIORITY - 2)
+			wait_event_killable(lruvec->seq_update_wait,
+					    max_seq < READ_ONCE(lrugen->max_seq));
+
+		return max_seq < READ_ONCE(lrugen->max_seq);
+	}
+
+	VM_WARN_ON_ONCE(max_seq != READ_ONCE(lrugen->max_seq));
+	inc_max_seq(lruvec, can_swap, force_scan);
+	/* either this sees any waiters or they will see updated max_seq */
+	if (wq_has_sleeper(&lruvec->seq_update_wait))
+		wake_up_all(&lruvec->seq_update_wait);
+
+	return success;
+}
+
+static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
+			       int scan_priority, bool can_swap, bool force_scan)
+{
+	return __try_to_inc_max_seq(lruvec, max_seq, scan_priority, can_swap, force_scan);
+}
+#endif
 
 static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq, unsigned long *min_seq,
 			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
@@ -4554,8 +4627,7 @@ static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, unsigned
 	}
 
 	if (need_aging)
-		try_to_inc_max_seq(lruvec, max_seq, sc, swappiness, false);
-
+		try_to_inc_max_seq(lruvec, max_seq, sc->priority, swappiness, false);
 	return true;
 }
 
@@ -4617,6 +4689,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
 	}
 }
 
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 /*
  * This function exploits spatial locality when shrink_folio_list() walks the
  * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages. If
@@ -4744,6 +4817,115 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 
 	mem_cgroup_unlock_pages();
 }
+#else
+/*
+ * This function exploits spatial locality when shrink_folio_list() walks the
+ * rmap. It scans the adjacent PTEs of a young PTE and promotes hot pages.
+ */
+void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
+{
+	int i;
+	pte_t *pte;
+	unsigned long start;
+	unsigned long end;
+	unsigned long addr;
+	unsigned long bitmap[BITS_TO_LONGS(MIN_LRU_BATCH)] = {};
+	struct folio *folio = pfn_folio(pvmw->pfn);
+	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct pglist_data *pgdat = folio_pgdat(folio);
+	struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+	DEFINE_MAX_SEQ(lruvec);
+	int old_gen, new_gen = lru_gen_from_seq(max_seq);
+
+	lockdep_assert_held(pvmw->ptl);
+	VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio);
+
+	if (spin_is_contended(pvmw->ptl))
+		return;
+
+	start = max(pvmw->address & PMD_MASK, pvmw->vma->vm_start);
+	end = min(pvmw->address | ~PMD_MASK, pvmw->vma->vm_end - 1) + 1;
+
+	if (end - start > MIN_LRU_BATCH * PAGE_SIZE) {
+		if (pvmw->address - start < MIN_LRU_BATCH * PAGE_SIZE / 2)
+			end = start + MIN_LRU_BATCH * PAGE_SIZE;
+		else if (end - pvmw->address < MIN_LRU_BATCH * PAGE_SIZE / 2)
+			start = end - MIN_LRU_BATCH * PAGE_SIZE;
+		else {
+			start = pvmw->address - MIN_LRU_BATCH * PAGE_SIZE / 2;
+			end = pvmw->address + MIN_LRU_BATCH * PAGE_SIZE / 2;
+		}
+	}
+
+	pte = pvmw->pte - (pvmw->address - start) / PAGE_SIZE;
+
+	rcu_read_lock();
+	arch_enter_lazy_mmu_mode();
+
+	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
+		unsigned long pfn;
+
+		pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
+		if (pfn == -1)
+			continue;
+
+		if (!pte_young(pte[i]))
+			continue;
+
+		folio = get_pfn_folio(pfn, memcg, pgdat, true);
+		if (!folio)
+			continue;
+
+		if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
+			VM_WARN_ON_ONCE(true);
+
+		if (pte_dirty(pte[i]) && !folio_test_dirty(folio) &&
+		    !(folio_test_anon(folio) && folio_test_swapbacked(folio) &&
+		      !folio_test_swapcache(folio)))
+			folio_mark_dirty(folio);
+
+		old_gen = folio_lru_gen(folio);
+		if (old_gen < 0)
+			folio_set_referenced(folio);
+		else if (old_gen != new_gen)
+			__set_bit(i, bitmap);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	rcu_read_unlock();
+
+	if (bitmap_weight(bitmap, MIN_LRU_BATCH) < PAGEVEC_SIZE) {
+		for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
+			folio = pfn_folio(pte_pfn(pte[i]));
+			folio_activate(folio);
+		}
+		return;
+	}
+
+	/* folio_update_gen() requires stable folio_memcg() */
+	if (!mem_cgroup_trylock_pages(memcg))
+		return;
+
+	spin_lock_irq(&lruvec->lru_lock);
+	new_gen = lru_gen_from_seq(lruvec->lrugen.max_seq);
+
+	for_each_set_bit(i, bitmap, MIN_LRU_BATCH) {
+		folio = pfn_folio(pte_pfn(pte[i]));
+		if (folio_memcg_rcu(folio) != memcg)
+			continue;
+
+		old_gen = folio_update_gen(folio, new_gen);
+		if (old_gen < 0 || old_gen == new_gen)
+			continue;
+
+		lru_gen_update_size(lruvec, folio, old_gen, new_gen);
+	}
+
+	spin_unlock_irq(&lruvec->lru_lock);
+
+	mem_cgroup_unlock_pages();
+}
+#endif
 
 /******************************************************************************
  *                          the eviction
@@ -5026,7 +5208,9 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	struct folio *next;
 	enum vm_event_item item;
 	struct reclaim_stat stat;
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	struct lru_gen_mm_walk *walk;
+#endif
 	bool skip_retry = false;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -5081,9 +5265,11 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	move_folios_to_lru(lruvec, &list);
 
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	walk = current->reclaim_state->mm_walk;
 	if (walk && walk->batched)
 		reset_batch_size(lruvec, walk);
+#endif
 
 	item = PGSTEAL_KSWAPD + reclaimer_offset();
 	if (!cgroup_reclaim(sc))
@@ -5140,8 +5326,9 @@ static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *
 	if (current_is_kswapd())
 		return 0;
 
-	if (try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false))
+	if (try_to_inc_max_seq(lruvec, max_seq, sc->priority, can_swap, false))
 		return nr_to_scan;
+
 done:
 	return min_seq[!can_swap] + MIN_NR_GENS <= max_seq ? nr_to_scan : 0;
 }
@@ -5610,6 +5797,7 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
 		seq_putc(m, '\n');
 	}
 
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	seq_puts(m, "                      ");
 	for (i = 0; i < NR_MM_STATS; i++) {
 		const char *s = "      ";
@@ -5626,6 +5814,7 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
 		seq_printf(m, " %10lu%c", n, s[i]);
 	}
 	seq_putc(m, '\n');
+#endif
 }
 
 /* see Documentation/admin-guide/mm/multigen_lru.rst for details */
@@ -5707,7 +5896,7 @@ static int run_aging(struct lruvec *lruvec, unsigned long seq, struct scan_contr
 	if (!force_scan && min_seq[!can_swap] + MAX_NR_GENS - 1 <= max_seq)
 		return -ERANGE;
 
-	try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, force_scan);
+	try_to_inc_max_seq(lruvec, max_seq, sc->priority, can_swap, force_scan);
 
 	return 0;
 }
@@ -5898,21 +6087,26 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 
 	for_each_gen_type_zone(gen, type, zone)
 		INIT_LIST_HEAD(&lrugen->lists[gen][type][zone]);
-
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	lruvec->mm_state.seq = MIN_NR_GENS;
 	init_waitqueue_head(&lruvec->mm_state.wait);
+#else
+	lruvec->seq_update_progress = false;
+	init_waitqueue_head(&lruvec->seq_update_wait);
+#endif
 }
 
 #ifdef CONFIG_MEMCG
 void lru_gen_init_memcg(struct mem_cgroup *memcg)
 {
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
 	INIT_LIST_HEAD(&memcg->mm_list.fifo);
 	spin_lock_init(&memcg->mm_list.lock);
+#endif
 }
 
 void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 {
-	int i;
 	int nid;
 
 	for_each_node(nid) {
@@ -5920,11 +6114,12 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 
 		VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
 					   sizeof(lruvec->lrugen.nr_pages)));
-
-		for (i = 0; i < NR_BLOOM_FILTERS; i++) {
+#ifdef CONFIG_LRU_TASK_PAGE_AGING
+		for (int i = 0; i < NR_BLOOM_FILTERS; i++) {
 			bitmap_free(lruvec->mm_state.filters[i]);
 			lruvec->mm_state.filters[i] = NULL;
 		}
+#endif
 	}
 }
 #endif
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH v1 3/7] mm: multi-gen LRU: avoid using generation stored in page flags for generation
  2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 1/7] mm: Move some code around so that next patch is simpler Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 2/7] mm: Don't build multi-gen LRU page table walk code on architecture not supported Aneesh Kumar K.V
@ 2023-04-02 10:42 ` Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 4/7] mm: multi-gen LRU: support different page aging mechanism Aneesh Kumar K.V
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

Some architectures provide different methods for determining the page access
count. In such cases we may not really need page flags for tracking the
generation; it can instead be derived directly from the arch-supported access
count values. Hence avoid using page flags to store the generation in that
case.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 include/linux/page_aging.h | 26 ++++++++++++++++++++++++++
 mm/rmap.c                  |  4 +++-
 mm/vmscan.c                | 34 +++++++++++++++++++++++++---------
 3 files changed, 54 insertions(+), 10 deletions(-)
 create mode 100644 include/linux/page_aging.h

diff --git a/include/linux/page_aging.h b/include/linux/page_aging.h
new file mode 100644
index 000000000000..ab77f4578916
--- /dev/null
+++ b/include/linux/page_aging.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef _LINUX_PAGE_AGING_H
+#define _LINUX_PAGE_AGING_H
+
+#ifndef arch_supports_page_access_count
+static inline bool arch_supports_page_access_count(void)
+{
+	return false;
+}
+#endif
+
+#ifdef CONFIG_LRU_GEN
+#ifndef arch_get_lru_gen_seq
+static inline unsigned long arch_get_lru_gen_seq(struct lruvec *lruvec, struct folio *folio)
+{
+	int type = folio_is_file_lru(folio);
+
+	return lruvec->lrugen.min_seq[type];
+}
+#endif
+#endif /* CONFIG_LRU_GEN */
+
+#endif
+
+
diff --git a/mm/rmap.c b/mm/rmap.c
index b616870a09be..1ef3cb8119d5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -74,6 +74,7 @@
 #include <linux/memremap.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/mm_inline.h>
+#include <linux/page_aging.h>
 
 #include <asm/tlbflush.h>
 
@@ -825,7 +826,8 @@ static bool folio_referenced_one(struct folio *folio,
 		if (pvmw.pte) {
 			if (lru_gen_enabled() && pte_young(*pvmw.pte) &&
 			    !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))) {
-				lru_gen_look_around(&pvmw);
+				if (!arch_supports_page_access_count())
+					lru_gen_look_around(&pvmw);
 				referenced++;
 			}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f92b689af2a5..518d1482f6ab 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -62,6 +62,7 @@
 #include <linux/swapops.h>
 #include <linux/balloon_compaction.h>
 #include <linux/sched/sysctl.h>
+#include <linux/page_aging.h>
 
 #include "internal.h"
 #include "swap.h"
@@ -4934,7 +4935,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
 {
 	bool success;
-	int gen = folio_lru_gen(folio);
+	int gen;
 	int type = folio_is_file_lru(folio);
 	int zone = folio_zonenum(folio);
 	int delta = folio_nr_pages(folio);
@@ -4942,7 +4943,6 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
 	int tier = lru_tier_from_refs(refs);
 	struct lru_gen_struct *lrugen = &lruvec->lrugen;
 
-	VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
 
 	/* unevictable */
 	if (!folio_evictable(folio)) {
@@ -4963,8 +4963,14 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
 		return true;
 	}
 
-	/* promoted */
+	if (!arch_supports_page_access_count()) {
+		gen = folio_lru_gen(folio);
+		VM_WARN_ON_ONCE_FOLIO(gen >= MAX_NR_GENS, folio);
+	} else
+		gen = lru_gen_from_seq(arch_get_lru_gen_seq(lruvec, folio));
+
 	if (gen != lru_gen_from_seq(lrugen->min_seq[type])) {
+		/* promote the folio */
 		list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
 		return true;
 	}
@@ -5464,12 +5470,22 @@ bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaimi
 	 */
 	if (folio_test_active(folio))
 		seq = lrugen->max_seq;
-	else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
-		 (folio_test_reclaim(folio) &&
-		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
-		seq = lrugen->min_seq[type] + 1;
-	else
-		seq = lrugen->min_seq[type];
+	else {
+		/*
+		 * For a non active folio use the arch based
+		 * aging details to derive the MGLRU generation.
+		 */
+		seq = arch_get_lru_gen_seq(lruvec, folio);
+
+		if (seq == lrugen->min_seq[type]) {
+			if ((type == LRU_GEN_ANON &&
+			     !folio_test_swapcache(folio)) ||
+			    (folio_test_reclaim(folio) &&
+			     (folio_test_dirty(folio) ||
+			      folio_test_writeback(folio))))
+				seq = lrugen->min_seq[type] + 1;
+		}
+	}
 
 	gen = lru_gen_from_seq(seq);
 	flags = (gen + 1UL) << LRU_GEN_PGOFF;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH v1 4/7] mm: multi-gen LRU: support different page aging mechanism
  2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
                   ` (2 preceding siblings ...)
  2023-04-02 10:42 ` [RFC PATCH v1 3/7] mm: multi-gen LRU: avoid using generation stored in page flags for generation Aneesh Kumar K.V
@ 2023-04-02 10:42 ` Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 5/7] powerpc/mm: Add page access count support Aneesh Kumar K.V
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

Some architectures provide different methods for determining the page access
count. For such architectures we may want to do architecture-specific work
during aging, such as reclassifying generation temperature. Add an arch hook
to support that.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 include/linux/page_aging.h | 14 +++++++++++++-
 mm/vmscan.c                |  8 +++++++-
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/page_aging.h b/include/linux/page_aging.h
index ab77f4578916..d7c63ce0d824 100644
--- a/include/linux/page_aging.h
+++ b/include/linux/page_aging.h
@@ -11,6 +11,10 @@ static inline bool arch_supports_page_access_count(void)
 #endif
 
 #ifdef CONFIG_LRU_GEN
+bool __try_to_inc_max_seq(struct lruvec *lruvec,
+			  unsigned long max_seq, int scan_priority,
+			  bool can_swap, bool force_scan);
+
 #ifndef arch_get_lru_gen_seq
 static inline unsigned long arch_get_lru_gen_seq(struct lruvec *lruvec, struct folio *folio)
 {
@@ -19,8 +23,16 @@ static inline unsigned long arch_get_lru_gen_seq(struct lruvec *lruvec, struct f
 	return lruvec->lrugen.min_seq[type];
 }
 #endif
-#endif /* CONFIG_LRU_GEN */
 
+#ifndef arch_try_to_inc_max_seq
+static inline bool arch_try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
+					   int scan_priority, bool can_swap, bool force_scan)
+{
+	return __try_to_inc_max_seq(lruvec, max_seq, scan_priority, can_swap, force_scan);
+}
+#endif
+
+#endif /* CONFIG_LRU_GEN */
 #endif
 
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 518d1482f6ab..c8b98201f0b0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4536,7 +4536,13 @@ bool __try_to_inc_max_seq(struct lruvec *lruvec,
 static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 			       int scan_priority, bool can_swap, bool force_scan)
 {
-	return __try_to_inc_max_seq(lruvec, max_seq, scan_priority, can_swap, force_scan);
+	if (arch_supports_page_access_count())
+		return arch_try_to_inc_max_seq(lruvec, max_seq,
+					       scan_priority, can_swap,
+					       force_scan);
+	else
+		return __try_to_inc_max_seq(lruvec, max_seq,
+					    scan_priority, can_swap, force_scan);
 }
 #endif
 
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH v1 5/7] powerpc/mm: Add page access count support
  2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
                   ` (3 preceding siblings ...)
  2023-04-02 10:42 ` [RFC PATCH v1 4/7] mm: multi-gen LRU: support different page aging mechanism Aneesh Kumar K.V
@ 2023-04-02 10:42 ` Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 6/7] powerpc/mm: Clear page access count on allocation Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 7/7] mm: multi-gen LRU: Shrink folio list without checking for page table reference Aneesh Kumar K.V
  6 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

The Hot-Cold Affinity (HCA) engine is a facility provided by POWER10 where
each access to a page is counted, and the access count is decayed if the page
is not accessed within a time window. There is a 32-bit counter for each page.

This patch uses the HCA engine to provide the page access count on POWER10
and uses it with multi-gen LRU to classify pages into the correct LRU
generation. It uses a simple page classification mechanism where pages are
sampled from the youngest and oldest generations to find the maximum and
minimum page hotness in the lruvec. This range is later used to sort every
page into the right generation.

The max and min hotness range is established during aging, when new
generations are created.

Not-yet-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/Kconfig                  |  10 +
 arch/powerpc/include/asm/hca.h        |  49 +++++
 arch/powerpc/include/asm/page_aging.h |  35 ++++
 arch/powerpc/mm/Makefile              |   1 +
 arch/powerpc/mm/hca.c                 | 275 ++++++++++++++++++++++++++
 include/linux/mmzone.h                |   5 +
 include/linux/page_aging.h            |   5 +
 mm/Kconfig                            |   4 +
 mm/vmscan.c                           |   5 +-
 9 files changed, 387 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hca.h
 create mode 100644 arch/powerpc/include/asm/page_aging.h
 create mode 100644 arch/powerpc/mm/hca.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7a5f8dbfbdd0..71e8f23d9a96 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1045,6 +1045,16 @@ config PPC_SECVAR_SYSFS
 	  read/write operations on these variables. Say Y if you have
 	  secure boot enabled and want to expose variables to userspace.
 
+config PPC_HCA_HOTNESS
+	prompt "PowerPC HCA engine based page hotness"
+	def_bool y
+	select ARCH_HAS_PAGE_AGING
+	depends on PPC_BOOK3S_64
+	help
+	  Use HCA engine to find page hotness
+
+	  If unsure, say N.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/include/asm/hca.h b/arch/powerpc/include/asm/hca.h
new file mode 100644
index 000000000000..c0ed380594ca
--- /dev/null
+++ b/arch/powerpc/include/asm/hca.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Configuration helpers for the Hot-Cold Affinity helper
+ */
+
+#ifndef _ASM_POWERPC_HCA_H
+#define _ASM_POWERPC_HCA_H
+
+#include <linux/types.h>
+
+struct hca_entry {
+	unsigned long count;
+	unsigned long prev_count;
+	uint8_t age;
+};
+
+static inline unsigned long hotness_score(struct hca_entry *entry)
+{
+	unsigned long hotness;
+
+#if 0
+	/*
+	 * Give more weight to prev_count because it holds the historical
+	 * values. Take a smaller part of count as the age grows, since
+	 * prev_count becomes a better approximation. We still need to
+	 * consider count to accommodate spikes in access.
+	 * + 1 with age to handle age == 0.
+	 */
+	hotness = entry->prev_count + (entry->count / (entry->age + 1));
+#else
+	/* Since real workloads are not showing pages with very high hotness,
+	 * a decay essentially moves the count value to prev_count. At that
+	 * point decay can be seen as a periodic zeroing of the counter. The
+	 * hotness score below gives better results with real workloads.
+	 */
+	hotness = entry->prev_count + entry->count;
+#endif
+
+	return hotness;
+}
+
+extern void (*hca_backend_node_debugfs_init)(int numa_node, struct dentry *node_dentry);
+extern void (*hca_backend_debugfs_init)(struct dentry *root_dentry);
+extern int  (*hca_pfn_entry)(unsigned long pfn, struct hca_entry *entry);
+extern bool (*hca_node_enabled)(int numa_node);
+extern int  (*hca_clear_entry)(unsigned long pfn);
+
+#endif /* _ASM_POWERPC_HCA_H */
diff --git a/arch/powerpc/include/asm/page_aging.h b/arch/powerpc/include/asm/page_aging.h
new file mode 100644
index 000000000000..0d98cd877308
--- /dev/null
+++ b/arch/powerpc/include/asm/page_aging.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#ifndef _ASM_POWERPC_PAGE_AGING_H_
+#define _ASM_POWERPC_PAGE_AGING_H_
+
+#ifdef CONFIG_LRU_GEN
+extern bool hca_lru_age;
+unsigned long hca_map_lru_seq(struct lruvec *lruvec, struct folio *folio);
+bool hca_try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
+			    int scan_priority, bool can_swap, bool force_scan);
+
+#define arch_supports_page_access_count arch_supports_page_access_count
+static inline bool arch_supports_page_access_count(void)
+{
+	return hca_lru_age;
+}
+
+#define arch_try_to_inc_max_seq	arch_try_to_inc_max_seq
+static inline bool arch_try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
+					   int scan_priority, bool can_swap,
+					   bool force_scan)
+{
+	return hca_try_to_inc_max_seq(lruvec, max_seq, scan_priority,
+				      can_swap, force_scan);
+
+}
+
+#define arch_get_lru_gen_seq	arch_get_lru_gen_seq
+static inline unsigned long arch_get_lru_gen_seq(struct lruvec *lruvec, struct folio *folio)
+{
+	return hca_map_lru_seq(lruvec, folio);
+}
+
+#endif /* CONFIG_LRU_GEN */
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 503a6e249940..30bd4ad4aff0 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_NOT_COHERENT_CACHE) += dma-noncoherent.o
 obj-$(CONFIG_PPC_COPRO_BASE)	+= copro_fault.o
 obj-$(CONFIG_PTDUMP_CORE)	+= ptdump/
 obj-$(CONFIG_KASAN)		+= kasan/
+obj-$(CONFIG_PPC_HCA_HOTNESS)	+= hca.o
diff --git a/arch/powerpc/mm/hca.c b/arch/powerpc/mm/hca.c
new file mode 100644
index 000000000000..af6de4492ead
--- /dev/null
+++ b/arch/powerpc/mm/hca.c
@@ -0,0 +1,275 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/debugfs.h>
+#include <linux/jiffies.h>
+#include <linux/mm.h>
+#include <linux/mm_inline.h>
+#include <linux/page_aging.h>
+
+#include <asm/hca.h>
+
+bool hca_lru_age;
+static struct dentry *hca_debugfs_root;
+/*
+ * percentage of pfns to scan from each lruvec list to determine max/min hotness
+ */
+static ulong scan_pfn_ratio __read_mostly = 20;
+/*
+ * milliseconds to wait before starting another random scan
+ */
+static ulong scan_skip_msec __read_mostly = 60;
+
+/* backend callbacks  */
+void (*hca_backend_node_debugfs_init)(int numa_node, struct dentry *node_dentry);
+void (*hca_backend_debugfs_init)(struct dentry *root_dentry);
+int  (*hca_pfn_entry)(unsigned long pfn, struct hca_entry *entry);
+bool (*hca_node_enabled)(int numa_node);
+int  (*hca_clear_entry)(unsigned long pfn);
+
+static int parse_hca_age(char *arg)
+{
+	return strtobool(arg, &hca_lru_age);
+}
+early_param("hca_age", parse_hca_age);
+
+static inline int folio_hca_entry(struct folio *folio, struct hca_entry *entry)
+{
+	return hca_pfn_entry(folio_pfn(folio), entry);
+}
+
+#ifdef CONFIG_LRU_GEN
+static inline int get_nr_gens(struct lruvec *lruvec, int type)
+{
+	return lruvec->lrugen.max_seq - lruvec->lrugen.min_seq[type] + 1;
+}
+
+/* FIXME!! */
+static inline bool folio_evictable(struct folio *folio)
+{
+	bool ret;
+
+	/* Prevent address_space of inode and swap cache from being freed */
+	rcu_read_lock();
+	ret = !mapping_unevictable(folio_mapping(folio)) &&
+		!folio_test_mlocked(folio);
+	rcu_read_unlock();
+	return ret;
+}
+
+static void restablish_hotness_range(struct lruvec *lruvec)
+{
+	bool youngest = true;
+	int gen, nr_pages;
+	unsigned long seq;
+	int new_scan_pfn_count;
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	unsigned long current_hotness, max_hotness = 0, min_hotness = 0;
+
+	if (time_is_after_jiffies64(lrugen->next_span_scan))
+		return;
+
+	spin_lock_irq(&lruvec->lru_lock);
+
+retry:
+	for (int type = 0; type < ANON_AND_FILE; type++) {
+		for (int zone = 0; zone < MAX_NR_ZONES; zone++) {
+			int index = 0;
+			struct list_head *head;
+			struct folio *folio;
+			struct hca_entry entry;
+
+			if (youngest)
+				seq = lrugen->max_seq;
+			else
+				seq = lrugen->min_seq[type];
+			gen = lru_gen_from_seq(seq);
+			nr_pages = lrugen->nr_pages[gen][type][zone];
+
+			new_scan_pfn_count = nr_pages * scan_pfn_ratio/100;
+			if (!new_scan_pfn_count)
+				new_scan_pfn_count = nr_pages;
+
+			head = &lrugen->lists[gen][type][zone];
+			list_for_each_entry(folio, head, lru) {
+
+				if (unlikely(!folio_evictable(folio)))
+					continue;
+
+				if (folio_hca_entry(folio, &entry))
+					continue;
+
+				if (index++ > new_scan_pfn_count)
+					break;
+
+				current_hotness = hotness_score(&entry);
+				/* If the page didn't see any access, skip it */
+				if (!current_hotness)
+					continue;
+				/*
+				 * Make sure we wait for at least one decay
+				 * update before looking at this pfn for the
+				 * max/min computation.
+				 */
+				if (entry.age < 1)
+					continue;
+
+				if (current_hotness > max_hotness)
+					max_hotness = (current_hotness + max_hotness) / 2;
+				else if ((current_hotness < min_hotness) || !min_hotness)
+					min_hotness = (current_hotness + min_hotness) / 2;
+				else if ((current_hotness - min_hotness) < (max_hotness - min_hotness) / 2)
+					min_hotness = (current_hotness + min_hotness) / 2;
+				else
+					max_hotness = (current_hotness + max_hotness) / 2;
+
+			}
+
+		}
+	}
+	if (youngest) {
+		/* compute with oldest generation */
+		youngest = false;
+		goto retry;
+	}
+	lrugen->next_span_scan = get_jiffies_64() + msecs_to_jiffies(scan_skip_msec);
+	if (min_hotness) {
+		lrugen->max_hotness	=  max_hotness;
+		lrugen->min_hotness	=  min_hotness;
+	}
+
+	spin_unlock_irq(&lruvec->lru_lock);
+}
+
+/* Return Multigen LRU generation based on folio hotness */
+unsigned long hca_map_lru_seq(struct lruvec *lruvec, struct folio *folio)
+{
+	unsigned long seq;
+	int  type, nr_gens;
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	struct hca_entry folio_entry;
+	unsigned long hotness, seq_range;
+
+	type = folio_is_file_lru(folio);
+	if (!hca_lru_age || folio_hca_entry(folio, &folio_entry))
+		/* no access data: return the oldest generation (youngest instead?) */
+		return lrugen->min_seq[type];
+
+	hotness = hotness_score(&folio_entry);
+	/* The page didn't see any access, return oldest generation */
+	if (!hotness)
+		return lrugen->min_seq[type];
+
+	/* Also adjust based on current value. */
+	if (hotness > lrugen->max_hotness) {
+		lrugen->max_hotness =  (hotness + lrugen->max_hotness) / 2;
+		return lrugen->max_seq;
+	} else if (hotness < lrugen->min_hotness) {
+		lrugen->min_hotness =  (hotness + lrugen->min_hotness) / 2;
+		return lrugen->min_seq[type];
+	}
+
+	/*
+	 * Convert the max and min hotness into 4 ranges for sequence.
+	 * Then place our current hotness into one of these range.
+	 * We use the range number as an increment factor for generation.
+	 */
+	/* inclusive range min and max */
+	seq_range =  lrugen->max_hotness  - lrugen->min_hotness + 1;
+	nr_gens = get_nr_gens(lruvec, type);
+	seq_range =  (seq_range + nr_gens  - 1)/nr_gens;
+
+	/* higher the hotness younger the generation */
+	seq = lrugen->min_seq[type] + ((hotness - lrugen->min_hotness)/seq_range);
+
+	return seq;
+}
+
+bool hca_try_to_inc_max_seq(struct lruvec *lruvec,
+				   unsigned long max_seq, int scan_priority,
+				   bool can_swap, bool force_scan)
+
+{
+	bool success = false;
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+
+	VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
+
+	/* see the comment in iterate_mm_list() */
+	if (lruvec->seq_update_progress)
+		success = false;
+	else {
+		spin_lock_irq(&lruvec->lru_lock);
+
+		if (max_seq != lrugen->max_seq)
+			goto done;
+
+		if (lruvec->seq_update_progress)
+			goto done;
+
+		success = true;
+		lruvec->seq_update_progress = true;
+done:
+		spin_unlock_irq(&lruvec->lru_lock);
+	}
+	if (!success) {
+		if (scan_priority <= DEF_PRIORITY - 2)
+			wait_event_killable(lruvec->seq_update_wait,
+					    max_seq < READ_ONCE(lrugen->max_seq));
+
+		return max_seq < READ_ONCE(lrugen->max_seq);
+	}
+
+	/*
+	 * With hardware aging use the counters to update
+	 * lruvec max and min hotness.
+	 */
+	restablish_hotness_range(lruvec);
+
+	VM_WARN_ON_ONCE(max_seq != READ_ONCE(lrugen->max_seq));
+	inc_max_seq(lruvec, can_swap, force_scan);
+	/* either this sees any waiters or they will see updated max_seq */
+	if (wq_has_sleeper(&lruvec->seq_update_wait))
+		wake_up_all(&lruvec->seq_update_wait);
+
+	return success;
+}
+#endif /* CONFIG_LRU_GEN */
+
+static void hca_debugfs_init(void)
+{
+	int node;
+	char name[32];
+	struct dentry *node_dentry;
+
+	hca_debugfs_root = debugfs_create_dir("hca", arch_debugfs_dir);
+
+	for_each_online_node(node) {
+		snprintf(name, sizeof(name), "node%u", node);
+		node_dentry = debugfs_create_dir(name, hca_debugfs_root);
+
+		hca_backend_node_debugfs_init(node, node_dentry);
+	}
+
+	debugfs_create_ulong("scan-pfn-ratio", 0600, hca_debugfs_root,
+			     &scan_pfn_ratio);
+	debugfs_create_ulong("scan-skip-msec", 0600, hca_debugfs_root,
+			     &scan_skip_msec);
+	debugfs_create_bool("hca_lru_age", 0600, hca_debugfs_root,
+			    &hca_lru_age);
+
+	/* Now create backend debugs */
+	hca_backend_debugfs_init(hca_debugfs_root);
+}
+}
+
+static int __init hca_init(void)
+{
+	if (!hca_backend_debugfs_init) {
+		pr_info("No HCA device registered. Disabling hca lru gen\n");
+		hca_lru_age = false;
+	}
+
+	hca_debugfs_init();
+	return 0;
+}
+
+late_initcall(hca_init);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 0bcc5d88239a..934ad587a558 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -425,6 +425,11 @@ struct lru_gen_struct {
 	atomic_long_t evicted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
 	atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
 	/* whether the multi-gen LRU is enabled */
+#ifndef CONFIG_LRU_TASK_PAGE_AGING
+	unsigned long max_hotness;
+	unsigned long min_hotness;
+	u64 next_span_scan;
+#endif
 	bool enabled;
 };
 
diff --git a/include/linux/page_aging.h b/include/linux/page_aging.h
index d7c63ce0d824..074c876f17e1 100644
--- a/include/linux/page_aging.h
+++ b/include/linux/page_aging.h
@@ -3,6 +3,10 @@
 #ifndef _LINUX_PAGE_AGING_H
 #define _LINUX_PAGE_AGING_H
 
+#ifdef CONFIG_ARCH_HAS_PAGE_AGING
+#include <asm/page_aging.h>
+#endif
+
 #ifndef arch_supports_page_access_count
 static inline bool arch_supports_page_access_count(void)
 {
@@ -14,6 +18,7 @@ static inline bool arch_supports_page_access_count(void)
 bool __try_to_inc_max_seq(struct lruvec *lruvec,
 			  unsigned long max_seq, int scan_priority,
 			  bool can_swap, bool force_scan);
+void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan);
 
 #ifndef arch_get_lru_gen_seq
 static inline unsigned long arch_get_lru_gen_seq(struct lruvec *lruvec, struct folio *folio)
diff --git a/mm/Kconfig b/mm/Kconfig
index ff7b209dec05..493709ac758e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1181,6 +1181,10 @@ config LRU_GEN_STATS
 	  from evicted generations for debugging purpose.
 
 	  This option has a per-memcg and per-node memory overhead.
+
+config ARCH_HAS_PAGE_AGING
+	bool
+
 # }
 
 source "mm/damon/Kconfig"
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c8b98201f0b0..a5f6238b3926 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4362,7 +4362,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, bool can_swap)
 	return success;
 }
 
-static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
+void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
 {
 	int prev, next;
 	int type, zone;
@@ -4420,6 +4420,7 @@ static void inc_max_seq(struct lruvec *lruvec, bool can_swap, bool force_scan)
 #endif
 	spin_unlock_irq(&lruvec->lru_lock);
 }
+
 #ifdef CONFIG_LRU_TASK_PAGE_AGING
 static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long max_seq,
 			       int scan_priority, bool can_swap, bool force_scan)
@@ -5861,7 +5862,7 @@ static int lru_gen_seq_show(struct seq_file *m, void *v)
 		seq_printf(m, "memcg %5hu %s\n", mem_cgroup_id(memcg), path);
 	}
 
-	seq_printf(m, " node %5d\n", nid);
+	seq_printf(m, " node %5d max_hotness %lu min_hotness %lu\n", nid, lrugen->max_hotness, lrugen->min_hotness);
 
 	if (!full)
 		seq = min_seq[LRU_GEN_ANON];
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread
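
For illustration only, here is a minimal userspace analogue of the
single-updater synchronisation used by __try_to_inc_max_seq() in the hunk
above: one caller wins the right to advance max_seq via an "in progress"
flag taken under the lock, and the others sleep on a wait queue until the
sequence has moved past the value they saw. All names below are invented
for the sketch; the scan-priority check and the killable wait are dropped,
and the progress flag is cleared by the winner itself here, which the
kernel code handles separately.

/* build with: gcc -std=gnu11 -pthread seq_update_sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t seq_update_wait = PTHREAD_COND_INITIALIZER;
static unsigned long max_seq = 4;
static bool seq_update_progress;

/* returns true once max_seq is newer than the value the caller saw */
static bool try_to_inc_max_seq(unsigned long seen_max_seq)
{
	bool winner = false;

	pthread_mutex_lock(&lock);
	/* lose the race if the seq already moved or an update is running */
	if (seen_max_seq == max_seq && !seq_update_progress) {
		seq_update_progress = true;
		winner = true;
	}

	if (!winner) {
		/* losers wait until the winner publishes a newer max_seq */
		while (seen_max_seq >= max_seq)
			pthread_cond_wait(&seq_update_wait, &lock);
		pthread_mutex_unlock(&lock);
		return true;
	}
	pthread_mutex_unlock(&lock);

	/* winner: real aging work would happen here, outside the lock */

	pthread_mutex_lock(&lock);
	max_seq++;
	seq_update_progress = false;
	pthread_cond_broadcast(&seq_update_wait);
	pthread_mutex_unlock(&lock);
	return true;
}

int main(void)
{
	printf("advanced: %d, max_seq now %lu\n",
	       (int)try_to_inc_max_seq(4), max_seq);
	return 0;
}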

* [RFC PATCH v1 6/7] powerpc/mm: Clear page access count on allocation
  2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
                   ` (4 preceding siblings ...)
  2023-04-02 10:42 ` [RFC PATCH v1 5/7] powerpc/mm: Add page access count support Aneesh Kumar K.V
@ 2023-04-02 10:42 ` Aneesh Kumar K.V
  2023-04-02 10:42 ` [RFC PATCH v1 7/7] mm: multi-gen LRU: Shrink folio list without checking for page table reference Aneesh Kumar K.V
  6 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

Clear the HCA access count for every subpage when a page is allocated, so
that stale counts from the page's previous use do not influence aging.
(A small userspace sketch of the clearing loop follows the patch.)

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/page.h |  5 +++++
 arch/powerpc/mm/hca.c           | 13 +++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index edf1dd1b0ca9..515423744193 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -319,6 +319,11 @@ void arch_free_page(struct page *page, int order);
 #define HAVE_ARCH_FREE_PAGE
 #endif
 
+#ifdef CONFIG_PPC_HCA_HOTNESS
+void arch_alloc_page(struct page *page, int order);
+#define HAVE_ARCH_ALLOC_PAGE
+#endif
+
 struct vm_area_struct;
 
 extern unsigned long kernstart_virt_addr;
diff --git a/arch/powerpc/mm/hca.c b/arch/powerpc/mm/hca.c
index af6de4492ead..1e79ea89df1b 100644
--- a/arch/powerpc/mm/hca.c
+++ b/arch/powerpc/mm/hca.c
@@ -261,6 +261,19 @@ static void hca_debugfs_init(void)
 	hca_backend_debugfs_init(hca_debugfs_root);
 }
 
+void arch_alloc_page(struct page *page, int order)
+{
+	int i;
+
+	if (!hca_clear_entry)
+		return;
+
+	/* zero the counter value when we allocate the page */
+	for (i = 0; i < (1 << order); i++)
+		hca_clear_entry(page_to_pfn(page + i));
+}
+EXPORT_SYMBOL(arch_alloc_page);
+
 static int __init hca_init(void)
 {
 	if (!hca_backend_debugfs_init) {
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread
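
For illustration only, a tiny userspace model of the clearing loop in
arch_alloc_page() above: an order-N allocation covers 1 << N pfns and the
counter for each of them is zeroed. The counter array and helper names are
invented stand-ins for the HCA backend, not kernel API.

#include <stdio.h>

#define NR_PFNS 64UL

/* stand-in for the per-pfn HCA access counters */
static unsigned int access_count[NR_PFNS];

static void clear_entry(unsigned long pfn)
{
	access_count[pfn] = 0;
}

static void on_alloc(unsigned long base_pfn, int order)
{
	/* zero the counter for every subpage of the allocation */
	for (unsigned long i = 0; i < (1UL << order); i++)
		clear_entry(base_pfn + i);
}

int main(void)
{
	for (unsigned long pfn = 0; pfn < NR_PFNS; pfn++)
		access_count[pfn] = 100 + pfn;	/* pretend stale counts */

	on_alloc(8, 2);	/* order-2 allocation: pfns 8..11 are cleared */

	for (unsigned long pfn = 6; pfn < 14; pfn++)
		printf("pfn %lu -> %u\n", pfn, access_count[pfn]);
	return 0;
}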

* [RFC PATCH v1 7/7] mm: multi-gen LRU: Shrink folio list without checking for page table reference
  2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
                   ` (5 preceding siblings ...)
  2023-04-02 10:42 ` [RFC PATCH v1 6/7] powerpc/mm: Clear page access count on allocation Aneesh Kumar K.V
@ 2023-04-02 10:42 ` Aneesh Kumar K.V
  6 siblings, 0 replies; 8+ messages in thread
From: Aneesh Kumar K.V @ 2023-04-02 10:42 UTC (permalink / raw)
  To: linux-mm, akpm
  Cc: Dave Hansen, Johannes Weiner, Matthew Wilcox, Mel Gorman, Yu Zhao,
	Wei Xu, Guru Anbalagane, Aneesh Kumar K.V

If the architecture supports a page access count, the LRU has already been
sorted using that count before the reclaimable pages are collected, so
reclaim them unconditionally in shrink_folio_list() instead of rechecking
page table references per folio. (A small userspace sketch of the
simplified generation pick in lru_gen_add_folio() follows the patch.)

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/vmscan.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a5f6238b3926..d9eb6a4d2975 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5242,7 +5242,8 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
-	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
+	reclaimed = shrink_folio_list(&list, pgdat, sc,
+				      &stat, arch_supports_page_access_count());
 	sc->nr_reclaimed += reclaimed;
 
 	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
@@ -5477,22 +5478,12 @@ bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaimi
 	 */
 	if (folio_test_active(folio))
 		seq = lrugen->max_seq;
-	else {
-		/*
-		 * For a non active folio use the arch based
-		 * aging details to derive the MGLRU generation.
-		 */
-		seq = arch_get_lru_gen_seq(lruvec, folio);
-
-		if (seq == lrugen->min_seq[type]) {
-			if ((type == LRU_GEN_ANON &&
-			     !folio_test_swapcache(folio)) ||
-			    (folio_test_reclaim(folio) &&
-			     (folio_test_dirty(folio) ||
-			      folio_test_writeback(folio))))
-				seq = lrugen->min_seq[type] + 1;
-		}
-	}
+	else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) ||
+		 (folio_test_reclaim(folio) &&
+		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
+		seq = lrugen->min_seq[type] + 1;
+	else
+		seq = lrugen->min_seq[type];
 
 	gen = lru_gen_from_seq(seq);
 	flags = (gen + 1UL) << LRU_GEN_PGOFF;
-- 
2.39.2



^ permalink raw reply related	[flat|nested] 8+ messages in thread
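
For illustration only, a small userspace sketch of the generation pick that
lru_gen_add_folio() is left with after this patch: active folios join the
youngest generation, anon folios not yet in the swap cache and
dirty/writeback folios marked for reclaim get min_seq + 1, and everything
else lands in the oldest generation. The struct and flag names are invented
stand-ins for folio flags, not kernel API.

#include <stdbool.h>
#include <stdio.h>

enum lru_type { LRU_GEN_ANON, LRU_GEN_FILE };

struct fake_folio {
	bool active, swapcache, reclaim, dirty, writeback;
};

/* mirrors the if/else-if/else chain added in the hunk above */
static unsigned long pick_seq(const struct fake_folio *f, enum lru_type type,
			      unsigned long min_seq, unsigned long max_seq)
{
	if (f->active)
		return max_seq;
	if ((type == LRU_GEN_ANON && !f->swapcache) ||
	    (f->reclaim && (f->dirty || f->writeback)))
		return min_seq + 1;
	return min_seq;
}

int main(void)
{
	struct fake_folio clean_file = { 0 };
	struct fake_folio dirty_under_reclaim = { .reclaim = true, .dirty = true };
	struct fake_folio active_anon = { .active = true };

	printf("clean file          -> seq %lu\n",
	       pick_seq(&clean_file, LRU_GEN_FILE, 10, 13));
	printf("dirty under reclaim -> seq %lu\n",
	       pick_seq(&dirty_under_reclaim, LRU_GEN_FILE, 10, 13));
	printf("active anon         -> seq %lu\n",
	       pick_seq(&active_anon, LRU_GEN_ANON, 10, 13));
	return 0;
}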

end of thread, other threads:[~2023-04-02 10:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-02 10:42 [RFC PATCH v1 0/7] Support arch-specific page aging mechanism Aneesh Kumar K.V
2023-04-02 10:42 ` [RFC PATCH v1 1/7] mm: Move some code around so that next patch is simpler Aneesh Kumar K.V
2023-04-02 10:42 ` [RFC PATCH v1 2/7] mm: Don't build multi-gen LRU page table walk code on architecture not supported Aneesh Kumar K.V
2023-04-02 10:42 ` [RFC PATCH v1 3/7] mm: multi-gen LRU: avoid using generation stored in page flags for generation Aneesh Kumar K.V
2023-04-02 10:42 ` [RFC PATCH v1 4/7] mm: multi-gen LRU: support different page aging mechanism Aneesh Kumar K.V
2023-04-02 10:42 ` [RFC PATCH v1 5/7] powerpc/mm: Add page access count support Aneesh Kumar K.V
2023-04-02 10:42 ` [RFC PATCH v1 6/7] powerpc/mm: Clear page access count on allocation Aneesh Kumar K.V
2023-04-02 10:42 ` [RFC PATCH v1 7/7] mm: multi-gen LRU: Shrink folio list without checking for page table reference Aneesh Kumar K.V
