* [PATCH -next 1/5] mm/mglru: use mem_cgroup_iter for global reclaim
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
The memcg LRU was originally introduced for global reclaim to enhance
scalability. However, its implementation complexity has led to performance
regressions when dealing with a large number of memory cgroups [1].
As suggested by Johannes [1], this patch adopts mem_cgroup_iter with
cookie-based iteration for global reclaim, aligning with the approach
already used in shrink_node_memcgs. This simplification removes the
dedicated memcg LRU tracking while maintaining the core functionality.
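A minimal sketch of the cookie-based walk follows (protection checks and the
kswapd/full-walk case are omitted; the diff below is authoritative):

```c
	struct mem_cgroup_reclaim_cookie reclaim = { .pgdat = pgdat };
	struct mem_cgroup *memcg;

	/* the cookie lets the walk resume where the previous pass left off */
	memcg = mem_cgroup_iter(target, NULL, &reclaim);
	while (memcg) {
		shrink_one(mem_cgroup_lruvec(memcg, pgdat), sc);

		if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
			mem_cgroup_iter_break(target, memcg);
			break;
		}
		memcg = mem_cgroup_iter(target, memcg, &reclaim);
	}
```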
A stress test based on Yu Zhao's methodology [2] was performed on a
1 TB, 4-node NUMA system. The results are summarized below:
pgsteal:
                                    memcg LRU    memcg iter
  stddev(pgsteal) / mean(pgsteal)     106.03%        93.20%
  sum(pgsteal) / sum(requested)        98.10%        99.28%

workingset_refault_anon:
                                    memcg LRU    memcg iter
  stddev(refault) / mean(refault)     193.97%       134.67%
  sum(refault)                      1,963,229     2,027,567
The new implementation shows a clear fairness improvement, reducing the
standard deviation relative to the mean by 12.8 percentage points. The
pgsteal ratio is also closer to 100%. Refault counts increased by 3.2%
(from 1,963,229 to 2,027,567).
The primary benefits of this change are:
1. Simplified codebase by removing custom memcg LRU infrastructure
2. Improved fairness in memory reclaim across multiple cgroups
3. Better performance when creating many memory cgroups
[1] https://lore.kernel.org/r/20251126171513.GC135004@cmpxchg.org
[2] https://lore.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/vmscan.c | 117 ++++++++++++++++------------------------------------
1 file changed, 36 insertions(+), 81 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fddd168a9737..70b0e7e5393c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4895,27 +4895,14 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
return nr_to_scan < 0;
}
-static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
- bool success;
unsigned long scanned = sc->nr_scanned;
unsigned long reclaimed = sc->nr_reclaimed;
- struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
- /* lru_gen_age_node() called mem_cgroup_calculate_protection() */
- if (mem_cgroup_below_min(NULL, memcg))
- return MEMCG_LRU_YOUNG;
-
- if (mem_cgroup_below_low(NULL, memcg)) {
- /* see the comment on MEMCG_NR_GENS */
- if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL)
- return MEMCG_LRU_TAIL;
-
- memcg_memory_event(memcg, MEMCG_LOW);
- }
-
- success = try_to_shrink_lruvec(lruvec, sc);
+ try_to_shrink_lruvec(lruvec, sc);
shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
@@ -4924,86 +4911,55 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
sc->nr_reclaimed - reclaimed);
flush_reclaim_state(sc);
-
- if (success && mem_cgroup_online(memcg))
- return MEMCG_LRU_YOUNG;
-
- if (!success && lruvec_is_sizable(lruvec, sc))
- return 0;
-
- /* one retry if offlined or too small */
- return READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL ?
- MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
}
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
- int op;
- int gen;
- int bin;
- int first_bin;
- struct lruvec *lruvec;
- struct lru_gen_folio *lrugen;
+ struct mem_cgroup *target = sc->target_mem_cgroup;
+ struct mem_cgroup_reclaim_cookie reclaim = {
+ .pgdat = pgdat,
+ };
+ struct mem_cgroup_reclaim_cookie *cookie = &reclaim;
struct mem_cgroup *memcg;
- struct hlist_nulls_node *pos;
- gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
- bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
-restart:
- op = 0;
- memcg = NULL;
-
- rcu_read_lock();
+ if (current_is_kswapd() || sc->memcg_full_walk)
+ cookie = NULL;
- hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) {
- if (op) {
- lru_gen_rotate_memcg(lruvec, op);
- op = 0;
- }
+ memcg = mem_cgroup_iter(target, NULL, cookie);
+ while (memcg) {
+ struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- mem_cgroup_put(memcg);
- memcg = NULL;
+ cond_resched();
- if (gen != READ_ONCE(lrugen->gen))
- continue;
+ mem_cgroup_calculate_protection(target, memcg);
- lruvec = container_of(lrugen, struct lruvec, lrugen);
- memcg = lruvec_memcg(lruvec);
+ if (mem_cgroup_below_min(target, memcg))
+ goto next;
- if (!mem_cgroup_tryget(memcg)) {
- lru_gen_release_memcg(memcg);
- memcg = NULL;
- continue;
+ if (mem_cgroup_below_low(target, memcg)) {
+ if (!sc->memcg_low_reclaim) {
+ sc->memcg_low_skipped = 1;
+ goto next;
+ }
+ memcg_memory_event(memcg, MEMCG_LOW);
}
- rcu_read_unlock();
+ shrink_one(lruvec, sc);
- op = shrink_one(lruvec, sc);
-
- rcu_read_lock();
-
- if (should_abort_scan(lruvec, sc))
+ if (should_abort_scan(lruvec, sc)) {
+ if (cookie)
+ mem_cgroup_iter_break(target, memcg);
break;
- }
-
- rcu_read_unlock();
-
- if (op)
- lru_gen_rotate_memcg(lruvec, op);
-
- mem_cgroup_put(memcg);
-
- if (!is_a_nulls(pos))
- return;
+ }
- /* restart if raced with lru_gen_rotate_memcg() */
- if (gen != get_nulls_value(pos))
- goto restart;
+next:
+ if (cookie && sc->nr_reclaimed >= sc->nr_to_reclaim) {
+ mem_cgroup_iter_break(target, memcg);
+ break;
+ }
- /* try the rest of the bins of the current generation */
- bin = get_memcg_bin(bin + 1);
- if (bin != first_bin)
- goto restart;
+ memcg = mem_cgroup_iter(target, memcg, cookie);
+ }
}
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5019,8 +4975,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
set_mm_walk(NULL, sc->proactive);
- if (try_to_shrink_lruvec(lruvec, sc))
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
+ try_to_shrink_lruvec(lruvec, sc);
clear_mm_walk();
--
2.34.1
* [PATCH -next 2/5] mm/mglru: remove memcg lru
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
Now that the previous patch has switched global reclaim to use
mem_cgroup_iter, the specialized memcg LRU infrastructure is no longer
needed. This patch removes all of the related code: the memcg LRU section
in multigen_lru.rst, struct lru_gen_memcg and the per-node memcg_lru in
mmzone.h, the lru_gen_online/offline/release_memcg() and
lru_gen_soft_reclaim() hooks, and lru_gen_rotate_memcg() plus
lru_gen_init_pgdat() in vmscan.c.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
Documentation/mm/multigen_lru.rst | 30 ------
include/linux/mmzone.h | 89 -----------------
mm/memcontrol-v1.c | 6 --
mm/memcontrol.c | 4 -
mm/mm_init.c | 1 -
mm/vmscan.c | 153 +-----------------------------
6 files changed, 1 insertion(+), 282 deletions(-)
diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index 52ed5092022f..bf8547e2f592 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -220,36 +220,6 @@ time domain because a CPU can scan pages at different rates under
varying memory pressure. It calculates a moving average for each new
generation to avoid being permanently locked in a suboptimal state.
-Memcg LRU
----------
-An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
-since each node and memcg combination has an LRU of folios (see
-``mem_cgroup_lruvec()``). Its goal is to improve the scalability of
-global reclaim, which is critical to system-wide memory overcommit in
-data centers. Note that memcg LRU only applies to global reclaim.
-
-The basic structure of an memcg LRU can be understood by an analogy to
-the active/inactive LRU (of folios):
-
-1. It has the young and the old (generations), i.e., the counterparts
- to the active and the inactive;
-2. The increment of ``max_seq`` triggers promotion, i.e., the
- counterpart to activation;
-3. Other events trigger similar operations, e.g., offlining an memcg
- triggers demotion, i.e., the counterpart to deactivation.
-
-In terms of global reclaim, it has two distinct features:
-
-1. Sharding, which allows each thread to start at a random memcg (in
- the old generation) and improves parallelism;
-2. Eventual fairness, which allows direct reclaim to bail out at will
- and reduces latency without affecting fairness over some time.
-
-In terms of traversing memcgs during global reclaim, it improves the
-best-case complexity from O(n) to O(1) and does not affect the
-worst-case complexity O(n). Therefore, on average, it has a sublinear
-complexity.
-
Summary
-------
The multi-gen LRU (of folios) can be disassembled into the following
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..49952301ff3b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -509,12 +509,6 @@ struct lru_gen_folio {
atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
/* whether the multi-gen LRU is enabled */
bool enabled;
- /* the memcg generation this lru_gen_folio belongs to */
- u8 gen;
- /* the list segment this lru_gen_folio belongs to */
- u8 seg;
- /* per-node lru_gen_folio list for global reclaim */
- struct hlist_nulls_node list;
};
enum {
@@ -558,79 +552,14 @@ struct lru_gen_mm_walk {
bool force_scan;
};
-/*
- * For each node, memcgs are divided into two generations: the old and the
- * young. For each generation, memcgs are randomly sharded into multiple bins
- * to improve scalability. For each bin, the hlist_nulls is virtually divided
- * into three segments: the head, the tail and the default.
- *
- * An onlining memcg is added to the tail of a random bin in the old generation.
- * The eviction starts at the head of a random bin in the old generation. The
- * per-node memcg generation counter, whose reminder (mod MEMCG_NR_GENS) indexes
- * the old generation, is incremented when all its bins become empty.
- *
- * There are four operations:
- * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its
- * current generation (old or young) and updates its "seg" to "head";
- * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its
- * current generation (old or young) and updates its "seg" to "tail";
- * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old
- * generation, updates its "gen" to "old" and resets its "seg" to "default";
- * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the
- * young generation, updates its "gen" to "young" and resets its "seg" to
- * "default".
- *
- * The events that trigger the above operations are:
- * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
- * 2. The first attempt to reclaim a memcg below low, which triggers
- * MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim a memcg offlined or below reclaimable size
- * threshold, which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim a memcg offlined or below reclaimable size
- * threshold, which triggers MEMCG_LRU_YOUNG;
- * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
- * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
- * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
- *
- * Notes:
- * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
- * of their max_seq counters ensures the eventual fairness to all eligible
- * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
- * 2. There are only two valid generations: old (seq) and young (seq+1).
- * MEMCG_NR_GENS is set to three so that when reading the generation counter
- * locklessly, a stale value (seq-1) does not wraparound to young.
- */
-#define MEMCG_NR_GENS 3
-#define MEMCG_NR_BINS 8
-
-struct lru_gen_memcg {
- /* the per-node memcg generation counter */
- unsigned long seq;
- /* each memcg has one lru_gen_folio per node */
- unsigned long nr_memcgs[MEMCG_NR_GENS];
- /* per-node lru_gen_folio list for global reclaim */
- struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
- /* protects the above */
- spinlock_t lock;
-};
-
-void lru_gen_init_pgdat(struct pglist_data *pgdat);
void lru_gen_init_lruvec(struct lruvec *lruvec);
bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
void lru_gen_init_memcg(struct mem_cgroup *memcg);
void lru_gen_exit_memcg(struct mem_cgroup *memcg);
-void lru_gen_online_memcg(struct mem_cgroup *memcg);
-void lru_gen_offline_memcg(struct mem_cgroup *memcg);
-void lru_gen_release_memcg(struct mem_cgroup *memcg);
-void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid);
#else /* !CONFIG_LRU_GEN */
-static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
-{
-}
-
static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
{
}
@@ -648,22 +577,6 @@ static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg)
{
}
-static inline void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
-{
-}
-
#endif /* CONFIG_LRU_GEN */
struct lruvec {
@@ -1503,8 +1416,6 @@ typedef struct pglist_data {
#ifdef CONFIG_LRU_GEN
/* kswap mm walk data */
struct lru_gen_mm_walk mm_walk;
- /* lru_gen_folio list */
- struct lru_gen_memcg memcg_lru;
#endif
CACHELINE_PADDING(_pad2_);
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 6eed14bff742..8f41e72ae7f0 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -182,12 +182,6 @@ static void memcg1_update_tree(struct mem_cgroup *memcg, int nid)
struct mem_cgroup_per_node *mz;
struct mem_cgroup_tree_per_node *mctz;
- if (lru_gen_enabled()) {
- if (soft_limit_excess(memcg))
- lru_gen_soft_reclaim(memcg, nid);
- return;
- }
-
mctz = soft_limit_tree.rb_tree_per_node[nid];
if (!mctz)
return;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index be810c1fbfc3..ab3ebecb5ec7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3874,8 +3874,6 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
FLUSH_TIME);
- lru_gen_online_memcg(memcg);
-
/* Online state pins memcg ID, memcg ID pins CSS */
refcount_set(&memcg->id.ref, 1);
css_get(css);
@@ -3915,7 +3913,6 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
reparent_deferred_split_queue(memcg);
reparent_shrinker_deferred(memcg);
wb_memcg_offline(memcg);
- lru_gen_offline_memcg(memcg);
drain_all_stock(memcg);
@@ -3927,7 +3924,6 @@ static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
invalidate_reclaim_iterators(memcg);
- lru_gen_release_memcg(memcg);
}
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index fc2a6f1e518f..6e5e1fe6ff31 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1745,7 +1745,6 @@ static void __init free_area_init_node(int nid)
pgdat_set_deferred_range(pgdat);
free_area_init_core(pgdat);
- lru_gen_init_pgdat(pgdat);
}
/* Any regular or high memory on that node ? */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 70b0e7e5393c..584f41eb4c14 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2698,9 +2698,6 @@ static bool should_clear_pmd_young(void)
#define for_each_evictable_type(type, swappiness) \
for ((type) = min_type(swappiness); (type) <= max_type(swappiness); (type)++)
-#define get_memcg_gen(seq) ((seq) % MEMCG_NR_GENS)
-#define get_memcg_bin(bin) ((bin) % MEMCG_NR_BINS)
-
static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid)
{
struct pglist_data *pgdat = NODE_DATA(nid);
@@ -4287,140 +4284,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
return true;
}
-/******************************************************************************
- * memcg LRU
- ******************************************************************************/
-
-/* see the comment on MEMCG_NR_GENS */
-enum {
- MEMCG_LRU_NOP,
- MEMCG_LRU_HEAD,
- MEMCG_LRU_TAIL,
- MEMCG_LRU_OLD,
- MEMCG_LRU_YOUNG,
-};
-
-static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
-{
- int seg;
- int old, new;
- unsigned long flags;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
- spin_lock_irqsave(&pgdat->memcg_lru.lock, flags);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- seg = 0;
- new = old = lruvec->lrugen.gen;
-
- /* see the comment on MEMCG_NR_GENS */
- if (op == MEMCG_LRU_HEAD)
- seg = MEMCG_LRU_HEAD;
- else if (op == MEMCG_LRU_TAIL)
- seg = MEMCG_LRU_TAIL;
- else if (op == MEMCG_LRU_OLD)
- new = get_memcg_gen(pgdat->memcg_lru.seq);
- else if (op == MEMCG_LRU_YOUNG)
- new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
- else
- VM_WARN_ON_ONCE(true);
-
- WRITE_ONCE(lruvec->lrugen.seg, seg);
- WRITE_ONCE(lruvec->lrugen.gen, new);
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
- if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
- hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
- else
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-
- pgdat->memcg_lru.nr_memcgs[old]--;
- pgdat->memcg_lru.nr_memcgs[new]++;
-
- if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock_irqrestore(&pgdat->memcg_lru.lock, flags);
-}
-
-#ifdef CONFIG_MEMCG
-
-void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock_irq(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = get_memcg_gen(pgdat->memcg_lru.seq);
-
- lruvec->lrugen.gen = gen;
-
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
- pgdat->memcg_lru.nr_memcgs[gen]++;
-
- spin_unlock_irq(&pgdat->memcg_lru.lock);
- }
-}
-
-void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
- int nid;
-
- for_each_node(nid) {
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
- }
-}
-
-void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock_irq(&pgdat->memcg_lru.lock);
-
- if (hlist_nulls_unhashed(&lruvec->lrugen.list))
- goto unlock;
-
- gen = lruvec->lrugen.gen;
-
- hlist_nulls_del_init_rcu(&lruvec->lrugen.list);
- pgdat->memcg_lru.nr_memcgs[gen]--;
-
- if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-unlock:
- spin_unlock_irq(&pgdat->memcg_lru.lock);
- }
-}
-
-void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
-{
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- /* see the comment on MEMCG_NR_GENS */
- if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_HEAD)
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
-}
-
-#endif /* CONFIG_MEMCG */
-
/******************************************************************************
* the eviction
******************************************************************************/
@@ -5613,18 +5476,6 @@ static const struct file_operations lru_gen_ro_fops = {
* initialization
******************************************************************************/
-void lru_gen_init_pgdat(struct pglist_data *pgdat)
-{
- int i, j;
-
- spin_lock_init(&pgdat->memcg_lru.lock);
-
- for (i = 0; i < MEMCG_NR_GENS; i++) {
- for (j = 0; j < MEMCG_NR_BINS; j++)
- INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
- }
-}
-
void lru_gen_init_lruvec(struct lruvec *lruvec)
{
int i;
@@ -5671,9 +5522,7 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
- sizeof(lruvec->lrugen.nr_pages)));
-
- lruvec->lrugen.list.next = LIST_POISON1;
+ sizeof(lruvec->lrugen.nr_pages)));
if (!mm_state)
continue;
--
2.34.1
* [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
Currently, flush_reclaim_state is placed differently between
shrink_node_memcgs and shrink_many. shrink_many (only used for gen-LRU)
calls it after each lruvec is shrunk, while shrink_node_memcgs calls it
only after all lruvecs have been shrunk.
This patch moves flush_reclaim_state into shrink_node_memcgs and calls it
after each lruvec. This unifies the behavior and is reasonable because:
1. flush_reclaim_state adds current->reclaim_state->reclaimed to
sc->nr_reclaimed.
2. For non-MGLRU root reclaim, this can help stop the iteration earlier
when nr_to_reclaim is reached.
3. For non-root reclaim, the effect is negligible since flush_reclaim_state
does nothing in that case.
After moving flush_reclaim_state into shrink_node_memcgs, shrink_one can be
extended to support both lrugen and non-lrugen paths. It will call
try_to_shrink_lruvec for lrugen root reclaim and shrink_lruvec otherwise.
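In condensed form, the resulting helper looks roughly like this (shrink_slab()
and vmpressure() are left out here; the diff below has the full version):

```c
static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
	if (lru_gen_enabled() && root_reclaim(sc))
		try_to_shrink_lruvec(lruvec, sc);	/* lrugen root reclaim */
	else
		shrink_lruvec(lruvec, sc);		/* all other paths */

	/* shrink_slab() and vmpressure() calls omitted for brevity */
	flush_reclaim_state(sc);
}
```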
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/vmscan.c | 57 +++++++++++++++++++++--------------------------------
1 file changed, 23 insertions(+), 34 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 584f41eb4c14..795f5ebd9341 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4758,23 +4758,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
return nr_to_scan < 0;
}
-static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
-{
- unsigned long scanned = sc->nr_scanned;
- unsigned long reclaimed = sc->nr_reclaimed;
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- struct mem_cgroup *memcg = lruvec_memcg(lruvec);
-
- try_to_shrink_lruvec(lruvec, sc);
-
- shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
-
- if (!sc->proactive)
- vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
- sc->nr_reclaimed - reclaimed);
-
- flush_reclaim_state(sc);
-}
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc);
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
@@ -5760,6 +5744,27 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
return inactive_lru_pages > pages_for_compaction;
}
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+{
+ unsigned long scanned = sc->nr_scanned;
+ unsigned long reclaimed = sc->nr_reclaimed;
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+
+ if (lru_gen_enabled() && root_reclaim(sc))
+ try_to_shrink_lruvec(lruvec, sc);
+ else
+ shrink_lruvec(lruvec, sc);
+
+ shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
+
+ if (!sc->proactive)
+ vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
+ sc->nr_reclaimed - reclaimed);
+
+ flush_reclaim_state(sc);
+}
+
static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
{
struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
@@ -5784,8 +5789,6 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
memcg = mem_cgroup_iter(target_memcg, NULL, partial);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- unsigned long reclaimed;
- unsigned long scanned;
/*
* This loop can become CPU-bound when target memcgs
@@ -5817,19 +5820,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
memcg_memory_event(memcg, MEMCG_LOW);
}
- reclaimed = sc->nr_reclaimed;
- scanned = sc->nr_scanned;
-
- shrink_lruvec(lruvec, sc);
-
- shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
- sc->priority);
-
- /* Record the group's reclaim efficiency */
- if (!sc->proactive)
- vmpressure(sc->gfp_mask, memcg, false,
- sc->nr_scanned - scanned,
- sc->nr_reclaimed - reclaimed);
+ shrink_one(lruvec, sc);
/* If partial walks are allowed, bail once goal is reached */
if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
@@ -5863,8 +5854,6 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
shrink_node_memcgs(pgdat, sc);
- flush_reclaim_state(sc);
-
nr_node_reclaimed = sc->nr_reclaimed - nr_reclaimed;
/* Record the subtree's reclaim efficiency */
--
2.34.1
* Re: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
From: kernel test robot @ 2025-12-12 2:55 UTC (permalink / raw)
To: Chen Ridong, akpm, axelrasmussen, yuanchu, weixugc, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, hannes, roman.gushchin, shakeel.butt, muchun.song,
zhengqi.arch
Cc: llvm, oe-kbuild-all, linux-mm, linux-doc, linux-kernel, cgroups,
lujialin4, chenridong, zhongjinji
Hi Chen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Chen-Ridong/mm-mglru-use-mem_cgroup_iter-for-global-reclaim/20251209-094913
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20251209012557.1949239-4-chenridong%40huaweicloud.com
patch subject: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
config: x86_64-randconfig-004-20251212 (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512121027.03z9qd08-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/vmscan.o: warning: objtool: shrink_one+0xeb2: sibling call from callable instruction with modified stack frame
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
From: Chen Ridong @ 2025-12-12 9:53 UTC (permalink / raw)
To: kernel test robot, akpm, axelrasmussen, yuanchu, weixugc, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, hannes, roman.gushchin, shakeel.butt, muchun.song,
zhengqi.arch
Cc: llvm, oe-kbuild-all, linux-mm, linux-doc, linux-kernel, cgroups,
lujialin4, zhongjinji
On 2025/12/12 10:55, kernel test robot wrote:
> Hi Chen,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on akpm-mm/mm-everything]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Chen-Ridong/mm-mglru-use-mem_cgroup_iter-for-global-reclaim/20251209-094913
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20251209012557.1949239-4-chenridong%40huaweicloud.com
> patch subject: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
> config: x86_64-randconfig-004-20251212 (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202512121027.03z9qd08-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
>>> mm/vmscan.o: warning: objtool: shrink_one+0xeb2: sibling call from callable instruction with modified stack frame
>
This is the first time I've encountered this warning. While adding
`STACK_FRAME_NON_STANDARD(shrink_one)` resolves it, I noticed this approach isn't widely used in the
codebase. Is this the standard solution, or are there better alternatives?
I've tested that the warning persists even when `shrink_one` is simplified to only call `shrink_lruvec`:
```
static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
shrink_lruvec(lruvec, sc);
}
```
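For reference, the workaround I mentioned above looks roughly like this (a
sketch only, not a proposed fix):

```c
#include <linux/objtool.h>

static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
	shrink_lruvec(lruvec, sc);
}
/* tells objtool to skip stack-frame validation for this function */
STACK_FRAME_NON_STANDARD(shrink_one);
```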
How can we properly avoid this warning without using STACK_FRAME_NON_STANDARD?
--
Best regards,
Ridong
* [PATCH -next 4/5] mm/mglru: combine shrink_many into shrink_node_memcgs
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
The previous patch extended shrink_one to support both lrugen and
non-lrugen reclaim. Now shrink_many and shrink_node_memcgs are almost
identical, except that shrink_many also calls should_abort_scan for lrugen
root reclaim.
This patch adds the should_abort_scan check to shrink_node_memcgs (which is
only meaningful for gen-LRU root reclaim). After this change,
shrink_node_memcgs can be used directly instead of shrink_many, allowing
shrink_many to be safely removed.
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/vmscan.c | 67 ++++++++++++-----------------------------------------
1 file changed, 15 insertions(+), 52 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 795f5ebd9341..dbf2cfbe3243 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4758,57 +4758,6 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
return nr_to_scan < 0;
}
-static void shrink_one(struct lruvec *lruvec, struct scan_control *sc);
-
-static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
-{
- struct mem_cgroup *target = sc->target_mem_cgroup;
- struct mem_cgroup_reclaim_cookie reclaim = {
- .pgdat = pgdat,
- };
- struct mem_cgroup_reclaim_cookie *cookie = &reclaim;
- struct mem_cgroup *memcg;
-
- if (current_is_kswapd() || sc->memcg_full_walk)
- cookie = NULL;
-
- memcg = mem_cgroup_iter(target, NULL, cookie);
- while (memcg) {
- struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
- cond_resched();
-
- mem_cgroup_calculate_protection(target, memcg);
-
- if (mem_cgroup_below_min(target, memcg))
- goto next;
-
- if (mem_cgroup_below_low(target, memcg)) {
- if (!sc->memcg_low_reclaim) {
- sc->memcg_low_skipped = 1;
- goto next;
- }
- memcg_memory_event(memcg, MEMCG_LOW);
- }
-
- shrink_one(lruvec, sc);
-
- if (should_abort_scan(lruvec, sc)) {
- if (cookie)
- mem_cgroup_iter_break(target, memcg);
- break;
- }
-
-next:
- if (cookie && sc->nr_reclaimed >= sc->nr_to_reclaim) {
- mem_cgroup_iter_break(target, memcg);
- break;
- }
-
- memcg = mem_cgroup_iter(target, memcg, cookie);
- }
-}
-
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
struct blk_plug plug;
@@ -4829,6 +4778,9 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
blk_finish_plug(&plug);
}
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc);
+static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc);
+
static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct blk_plug plug;
@@ -4858,7 +4810,7 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
if (mem_cgroup_disabled())
shrink_one(&pgdat->__lruvec, sc);
else
- shrink_many(pgdat, sc);
+ shrink_node_memcgs(pgdat, sc);
if (current_is_kswapd())
sc->nr_reclaimed += reclaimed;
@@ -5554,6 +5506,11 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
BUILD_BUG();
}
+static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
+{
+ return false;
+}
+
#endif /* CONFIG_LRU_GEN */
static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5822,6 +5779,12 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
shrink_one(lruvec, sc);
+ if (should_abort_scan(lruvec, sc)) {
+ if (partial)
+ mem_cgroup_iter_break(target_memcg, memcg);
+ break;
+ }
+
/* If partial walks are allowed, bail once goal is reached */
if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
mem_cgroup_iter_break(target_memcg, memcg);
--
2.34.1
* [PATCH -next 5/5] mm/mglru: factor lrugen state out of shrink_lruvec
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
A previous patch updated shrink_node_memcgs to handle lrugen root reclaim
and extended shrink_one to support both lrugen and non-lrugen. However,
in shrink_one, lrugen non-root reclaim still invokes shrink_lruvec, which
should only be used for non-lrugen reclaim.
To clarify the semantics, this patch moves the lrugen-specific logic out of
shrink_lruvec, leaving shrink_lruvec exclusively for non-lrugen reclaim.
With this change, shrink_one always invokes lru_gen_shrink_lruvec for lrugen.
For root reclaim, lru_gen_shrink_lruvec calls try_to_shrink_lruvec directly
and skips the extra setup, since that work has already been done in
lru_gen_shrink_node. Non-root reclaim behavior remains unchanged.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/vmscan.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dbf2cfbe3243..c5f517ec52a7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4762,7 +4762,12 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
{
struct blk_plug plug;
- VM_WARN_ON_ONCE(root_reclaim(sc));
+ /* Root reclaim has finished other extra work outside, just shrink. */
+ if (root_reclaim(sc)) {
+ try_to_shrink_lruvec(lruvec, sc);
+ return;
+ }
+
VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
lru_add_drain();
@@ -5524,11 +5529,6 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
bool proportional_reclaim;
struct blk_plug plug;
- if (lru_gen_enabled() && !root_reclaim(sc)) {
- lru_gen_shrink_lruvec(lruvec, sc);
- return;
- }
-
get_scan_count(lruvec, sc, nr);
/* Record the original scan target for proportional adjustments later */
@@ -5708,8 +5708,8 @@ static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
- if (lru_gen_enabled() && root_reclaim(sc))
- try_to_shrink_lruvec(lruvec, sc);
+ if (lru_gen_enabled())
+ lru_gen_shrink_lruvec(lruvec, sc);
else
shrink_lruvec(lruvec, sc);
--
2.34.1
* Re: [PATCH -next 0/5] mm/mglru: remove memcg lru
From: Chen Ridong @ 2025-12-12 10:15 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, zhongjinji
On 2025/12/9 9:25, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The memcg LRU was introduced to improve scalability in global reclaim,
> but its implementation has grown complex and can cause performance
> regressions when creating many memory cgroups [1].
>
> This series implements mem_cgroup_iter with a reclaim cookie in
> shrink_many() for global reclaim, following the pattern already used in
> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
> design maintains good fairness across cgroups by preserving iteration
> state between reclaim passes.
>
> Testing was performed using the original stress test from Yu Zhao [2] on a
> 1 TB, 4-node NUMA system. The results show:
>
> pgsteal:
> memcg LRU memcg iter
> stddev(pgsteal) / mean(pgsteal) 106.03% 93.20%
> sum(pgsteal) / sum(requested) 98.10% 99.28%
>
> workingset_refault_anon:
> memcg LRU memcg iter
> stddev(refault) / mean(refault) 193.97% 134.67%
> sum(refault) 1,963,229 2,027,567
>
> The new implementation shows clear fairness improvements, reducing the
> standard deviation relative to the mean by 12.8 percentage points for
> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
> increased by 3.2% (from 1,963,229 to 2,027,567).
>
> To simplify review:
> 1. Patch 1 uses mem_cgroup_iter with reclaim cookie in shrink_many()
> 2. Patch 2 removes the now-unused memcg LRU code
> 3. Patches 3–5 combine shrink_many and shrink_node_memcgs
> (This reorganization is clearer after switching to mem_cgroup_iter)
>
> ---
>
> Changes from RFC series:
> 1. Updated the test result data.
> 2. Added patches 3–5 to combine shrink_many and shrink_node_memcgs.
>
> RFC: https://lore.kernel.org/all/20251204123124.1822965-1-chenridong@huaweicloud.com/
>
> Chen Ridong (5):
> mm/mglru: use mem_cgroup_iter for global reclaim
> mm/mglru: remove memcg lru
> mm/mglru: extend shrink_one for both lrugen and non-lrugen
> mm/mglru: combine shrink_many into shrink_node_memcgs
> mm/mglru: factor lrugen state out of shrink_lruvec
>
> Documentation/mm/multigen_lru.rst | 30 ---
> include/linux/mmzone.h | 89 --------
> mm/memcontrol-v1.c | 6 -
> mm/memcontrol.c | 4 -
> mm/mm_init.c | 1 -
> mm/vmscan.c | 332 ++++--------------------------
> 6 files changed, 44 insertions(+), 418 deletions(-)
>
Hello all,
There's a warning from the kernel test robot, and I would like to update the series to fix it along
with any feedback from your reviews.
I'd appreciate it if you could take a look at this patch series when convenient.
Hi Shakeel, I would be very grateful if you could review patches 3-5. They combine shrink_many and
shrink_node_memcgs as you suggested — does that look good to you?
--
Best regards,
Ridong