* [PATCH -next 1/5] mm/mglru: use mem_cgroup_iter for global reclaim
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
The memcg LRU was originally introduced for global reclaim to enhance
scalability. However, its implementation complexity has led to performance
regressions when dealing with a large number of memory cgroups [1].
As suggested by Johannes [1], this patch adopts mem_cgroup_iter with
cookie-based iteration for global reclaim, aligning with the approach
already used in shrink_node_memcgs. This simplification removes the
dedicated memcg LRU tracking while maintaining the core functionality.
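A minimal sketch of the cookie-based walk follows (protection checks and the
kswapd/full-walk case are omitted; the diff below is authoritative):

```c
	struct mem_cgroup_reclaim_cookie reclaim = { .pgdat = pgdat };
	struct mem_cgroup *memcg;

	/* the cookie lets the walk resume where the previous pass left off */
	memcg = mem_cgroup_iter(target, NULL, &reclaim);
	while (memcg) {
		shrink_one(mem_cgroup_lruvec(memcg, pgdat), sc);

		if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
			mem_cgroup_iter_break(target, memcg);
			break;
		}
		memcg = mem_cgroup_iter(target, memcg, &reclaim);
	}
```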
A stress test based on Yu Zhao's methodology [2] was performed on a
1 TB, 4-node NUMA system. The results are summarized below:
pgsteal:
                                    memcg LRU    memcg iter
  stddev(pgsteal) / mean(pgsteal)     106.03%        93.20%
  sum(pgsteal) / sum(requested)        98.10%        99.28%

workingset_refault_anon:
                                    memcg LRU    memcg iter
  stddev(refault) / mean(refault)     193.97%       134.67%
  sum(refault)                      1,963,229     2,027,567
The new implementation shows a clear fairness improvement, reducing the
standard deviation relative to the mean by 12.8 percentage points. The
pgsteal ratio is also closer to 100%. Refault counts increased by 3.2%
(from 1,963,229 to 2,027,567).
The primary benefits of this change are:
1. Simplified codebase by removing custom memcg LRU infrastructure
2. Improved fairness in memory reclaim across multiple cgroups
3. Better performance when creating many memory cgroups
[1] https://lore.kernel.org/r/20251126171513.GC135004@cmpxchg.org
[2] https://lore.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/vmscan.c | 117 ++++++++++++++++------------------------------------
1 file changed, 36 insertions(+), 81 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index fddd168a9737..70b0e7e5393c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4895,27 +4895,14 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
return nr_to_scan < 0;
}
-static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
- bool success;
unsigned long scanned = sc->nr_scanned;
unsigned long reclaimed = sc->nr_reclaimed;
- struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
- /* lru_gen_age_node() called mem_cgroup_calculate_protection() */
- if (mem_cgroup_below_min(NULL, memcg))
- return MEMCG_LRU_YOUNG;
-
- if (mem_cgroup_below_low(NULL, memcg)) {
- /* see the comment on MEMCG_NR_GENS */
- if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL)
- return MEMCG_LRU_TAIL;
-
- memcg_memory_event(memcg, MEMCG_LOW);
- }
-
- success = try_to_shrink_lruvec(lruvec, sc);
+ try_to_shrink_lruvec(lruvec, sc);
shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
@@ -4924,86 +4911,55 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
sc->nr_reclaimed - reclaimed);
flush_reclaim_state(sc);
-
- if (success && mem_cgroup_online(memcg))
- return MEMCG_LRU_YOUNG;
-
- if (!success && lruvec_is_sizable(lruvec, sc))
- return 0;
-
- /* one retry if offlined or too small */
- return READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL ?
- MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
}
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
- int op;
- int gen;
- int bin;
- int first_bin;
- struct lruvec *lruvec;
- struct lru_gen_folio *lrugen;
+ struct mem_cgroup *target = sc->target_mem_cgroup;
+ struct mem_cgroup_reclaim_cookie reclaim = {
+ .pgdat = pgdat,
+ };
+ struct mem_cgroup_reclaim_cookie *cookie = &reclaim;
struct mem_cgroup *memcg;
- struct hlist_nulls_node *pos;
- gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
- bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
-restart:
- op = 0;
- memcg = NULL;
-
- rcu_read_lock();
+ if (current_is_kswapd() || sc->memcg_full_walk)
+ cookie = NULL;
- hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) {
- if (op) {
- lru_gen_rotate_memcg(lruvec, op);
- op = 0;
- }
+ memcg = mem_cgroup_iter(target, NULL, cookie);
+ while (memcg) {
+ struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- mem_cgroup_put(memcg);
- memcg = NULL;
+ cond_resched();
- if (gen != READ_ONCE(lrugen->gen))
- continue;
+ mem_cgroup_calculate_protection(target, memcg);
- lruvec = container_of(lrugen, struct lruvec, lrugen);
- memcg = lruvec_memcg(lruvec);
+ if (mem_cgroup_below_min(target, memcg))
+ goto next;
- if (!mem_cgroup_tryget(memcg)) {
- lru_gen_release_memcg(memcg);
- memcg = NULL;
- continue;
+ if (mem_cgroup_below_low(target, memcg)) {
+ if (!sc->memcg_low_reclaim) {
+ sc->memcg_low_skipped = 1;
+ goto next;
+ }
+ memcg_memory_event(memcg, MEMCG_LOW);
}
- rcu_read_unlock();
+ shrink_one(lruvec, sc);
- op = shrink_one(lruvec, sc);
-
- rcu_read_lock();
-
- if (should_abort_scan(lruvec, sc))
+ if (should_abort_scan(lruvec, sc)) {
+ if (cookie)
+ mem_cgroup_iter_break(target, memcg);
break;
- }
-
- rcu_read_unlock();
-
- if (op)
- lru_gen_rotate_memcg(lruvec, op);
-
- mem_cgroup_put(memcg);
-
- if (!is_a_nulls(pos))
- return;
+ }
- /* restart if raced with lru_gen_rotate_memcg() */
- if (gen != get_nulls_value(pos))
- goto restart;
+next:
+ if (cookie && sc->nr_reclaimed >= sc->nr_to_reclaim) {
+ mem_cgroup_iter_break(target, memcg);
+ break;
+ }
- /* try the rest of the bins of the current generation */
- bin = get_memcg_bin(bin + 1);
- if (bin != first_bin)
- goto restart;
+ memcg = mem_cgroup_iter(target, memcg, cookie);
+ }
}
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5019,8 +4975,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
set_mm_walk(NULL, sc->proactive);
- if (try_to_shrink_lruvec(lruvec, sc))
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
+ try_to_shrink_lruvec(lruvec, sc);
clear_mm_walk();
--
2.34.1
* [PATCH -next 2/5] mm/mglru: remove memcg lru
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
Now that the previous patch has switched global reclaim to use
mem_cgroup_iter, the specialized memcg LRU infrastructure is no longer
needed. This patch removes all of the related code: the memcg LRU section
in multigen_lru.rst, struct lru_gen_memcg and the per-node memcg_lru in
mmzone.h, the lru_gen_online/offline/release_memcg() and
lru_gen_soft_reclaim() hooks, and lru_gen_rotate_memcg() plus
lru_gen_init_pgdat() in vmscan.c.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
Documentation/mm/multigen_lru.rst | 30 ------
include/linux/mmzone.h | 89 -----------------
mm/memcontrol-v1.c | 6 --
mm/memcontrol.c | 4 -
mm/mm_init.c | 1 -
mm/vmscan.c | 153 +-----------------------------
6 files changed, 1 insertion(+), 282 deletions(-)
diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index 52ed5092022f..bf8547e2f592 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -220,36 +220,6 @@ time domain because a CPU can scan pages at different rates under
varying memory pressure. It calculates a moving average for each new
generation to avoid being permanently locked in a suboptimal state.
-Memcg LRU
----------
-An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
-since each node and memcg combination has an LRU of folios (see
-``mem_cgroup_lruvec()``). Its goal is to improve the scalability of
-global reclaim, which is critical to system-wide memory overcommit in
-data centers. Note that memcg LRU only applies to global reclaim.
-
-The basic structure of an memcg LRU can be understood by an analogy to
-the active/inactive LRU (of folios):
-
-1. It has the young and the old (generations), i.e., the counterparts
- to the active and the inactive;
-2. The increment of ``max_seq`` triggers promotion, i.e., the
- counterpart to activation;
-3. Other events trigger similar operations, e.g., offlining an memcg
- triggers demotion, i.e., the counterpart to deactivation.
-
-In terms of global reclaim, it has two distinct features:
-
-1. Sharding, which allows each thread to start at a random memcg (in
- the old generation) and improves parallelism;
-2. Eventual fairness, which allows direct reclaim to bail out at will
- and reduces latency without affecting fairness over some time.
-
-In terms of traversing memcgs during global reclaim, it improves the
-best-case complexity from O(n) to O(1) and does not affect the
-worst-case complexity O(n). Therefore, on average, it has a sublinear
-complexity.
-
Summary
-------
The multi-gen LRU (of folios) can be disassembled into the following
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 75ef7c9f9307..49952301ff3b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -509,12 +509,6 @@ struct lru_gen_folio {
atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
/* whether the multi-gen LRU is enabled */
bool enabled;
- /* the memcg generation this lru_gen_folio belongs to */
- u8 gen;
- /* the list segment this lru_gen_folio belongs to */
- u8 seg;
- /* per-node lru_gen_folio list for global reclaim */
- struct hlist_nulls_node list;
};
enum {
@@ -558,79 +552,14 @@ struct lru_gen_mm_walk {
bool force_scan;
};
-/*
- * For each node, memcgs are divided into two generations: the old and the
- * young. For each generation, memcgs are randomly sharded into multiple bins
- * to improve scalability. For each bin, the hlist_nulls is virtually divided
- * into three segments: the head, the tail and the default.
- *
- * An onlining memcg is added to the tail of a random bin in the old generation.
- * The eviction starts at the head of a random bin in the old generation. The
- * per-node memcg generation counter, whose reminder (mod MEMCG_NR_GENS) indexes
- * the old generation, is incremented when all its bins become empty.
- *
- * There are four operations:
- * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its
- * current generation (old or young) and updates its "seg" to "head";
- * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its
- * current generation (old or young) and updates its "seg" to "tail";
- * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old
- * generation, updates its "gen" to "old" and resets its "seg" to "default";
- * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the
- * young generation, updates its "gen" to "young" and resets its "seg" to
- * "default".
- *
- * The events that trigger the above operations are:
- * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
- * 2. The first attempt to reclaim a memcg below low, which triggers
- * MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim a memcg offlined or below reclaimable size
- * threshold, which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim a memcg offlined or below reclaimable size
- * threshold, which triggers MEMCG_LRU_YOUNG;
- * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
- * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
- * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
- *
- * Notes:
- * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
- * of their max_seq counters ensures the eventual fairness to all eligible
- * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
- * 2. There are only two valid generations: old (seq) and young (seq+1).
- * MEMCG_NR_GENS is set to three so that when reading the generation counter
- * locklessly, a stale value (seq-1) does not wraparound to young.
- */
-#define MEMCG_NR_GENS 3
-#define MEMCG_NR_BINS 8
-
-struct lru_gen_memcg {
- /* the per-node memcg generation counter */
- unsigned long seq;
- /* each memcg has one lru_gen_folio per node */
- unsigned long nr_memcgs[MEMCG_NR_GENS];
- /* per-node lru_gen_folio list for global reclaim */
- struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
- /* protects the above */
- spinlock_t lock;
-};
-
-void lru_gen_init_pgdat(struct pglist_data *pgdat);
void lru_gen_init_lruvec(struct lruvec *lruvec);
bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
void lru_gen_init_memcg(struct mem_cgroup *memcg);
void lru_gen_exit_memcg(struct mem_cgroup *memcg);
-void lru_gen_online_memcg(struct mem_cgroup *memcg);
-void lru_gen_offline_memcg(struct mem_cgroup *memcg);
-void lru_gen_release_memcg(struct mem_cgroup *memcg);
-void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid);
#else /* !CONFIG_LRU_GEN */
-static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
-{
-}
-
static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
{
}
@@ -648,22 +577,6 @@ static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg)
{
}
-static inline void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
-{
-}
-
#endif /* CONFIG_LRU_GEN */
struct lruvec {
@@ -1503,8 +1416,6 @@ typedef struct pglist_data {
#ifdef CONFIG_LRU_GEN
/* kswap mm walk data */
struct lru_gen_mm_walk mm_walk;
- /* lru_gen_folio list */
- struct lru_gen_memcg memcg_lru;
#endif
CACHELINE_PADDING(_pad2_);
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 6eed14bff742..8f41e72ae7f0 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -182,12 +182,6 @@ static void memcg1_update_tree(struct mem_cgroup *memcg, int nid)
struct mem_cgroup_per_node *mz;
struct mem_cgroup_tree_per_node *mctz;
- if (lru_gen_enabled()) {
- if (soft_limit_excess(memcg))
- lru_gen_soft_reclaim(memcg, nid);
- return;
- }
-
mctz = soft_limit_tree.rb_tree_per_node[nid];
if (!mctz)
return;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index be810c1fbfc3..ab3ebecb5ec7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3874,8 +3874,6 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
FLUSH_TIME);
- lru_gen_online_memcg(memcg);
-
/* Online state pins memcg ID, memcg ID pins CSS */
refcount_set(&memcg->id.ref, 1);
css_get(css);
@@ -3915,7 +3913,6 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
reparent_deferred_split_queue(memcg);
reparent_shrinker_deferred(memcg);
wb_memcg_offline(memcg);
- lru_gen_offline_memcg(memcg);
drain_all_stock(memcg);
@@ -3927,7 +3924,6 @@ static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
invalidate_reclaim_iterators(memcg);
- lru_gen_release_memcg(memcg);
}
static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index fc2a6f1e518f..6e5e1fe6ff31 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1745,7 +1745,6 @@ static void __init free_area_init_node(int nid)
pgdat_set_deferred_range(pgdat);
free_area_init_core(pgdat);
- lru_gen_init_pgdat(pgdat);
}
/* Any regular or high memory on that node ? */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 70b0e7e5393c..584f41eb4c14 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2698,9 +2698,6 @@ static bool should_clear_pmd_young(void)
#define for_each_evictable_type(type, swappiness) \
for ((type) = min_type(swappiness); (type) <= max_type(swappiness); (type)++)
-#define get_memcg_gen(seq) ((seq) % MEMCG_NR_GENS)
-#define get_memcg_bin(bin) ((bin) % MEMCG_NR_BINS)
-
static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid)
{
struct pglist_data *pgdat = NODE_DATA(nid);
@@ -4287,140 +4284,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
return true;
}
-/******************************************************************************
- * memcg LRU
- ******************************************************************************/
-
-/* see the comment on MEMCG_NR_GENS */
-enum {
- MEMCG_LRU_NOP,
- MEMCG_LRU_HEAD,
- MEMCG_LRU_TAIL,
- MEMCG_LRU_OLD,
- MEMCG_LRU_YOUNG,
-};
-
-static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
-{
- int seg;
- int old, new;
- unsigned long flags;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
- spin_lock_irqsave(&pgdat->memcg_lru.lock, flags);
-
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- seg = 0;
- new = old = lruvec->lrugen.gen;
-
- /* see the comment on MEMCG_NR_GENS */
- if (op == MEMCG_LRU_HEAD)
- seg = MEMCG_LRU_HEAD;
- else if (op == MEMCG_LRU_TAIL)
- seg = MEMCG_LRU_TAIL;
- else if (op == MEMCG_LRU_OLD)
- new = get_memcg_gen(pgdat->memcg_lru.seq);
- else if (op == MEMCG_LRU_YOUNG)
- new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
- else
- VM_WARN_ON_ONCE(true);
-
- WRITE_ONCE(lruvec->lrugen.seg, seg);
- WRITE_ONCE(lruvec->lrugen.gen, new);
-
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
- if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
- hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
- else
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-
- pgdat->memcg_lru.nr_memcgs[old]--;
- pgdat->memcg_lru.nr_memcgs[new]++;
-
- if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
- spin_unlock_irqrestore(&pgdat->memcg_lru.lock, flags);
-}
-
-#ifdef CONFIG_MEMCG
-
-void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
- int bin = get_random_u32_below(MEMCG_NR_BINS);
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock_irq(&pgdat->memcg_lru.lock);
-
- VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
-
- gen = get_memcg_gen(pgdat->memcg_lru.seq);
-
- lruvec->lrugen.gen = gen;
-
- hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
- pgdat->memcg_lru.nr_memcgs[gen]++;
-
- spin_unlock_irq(&pgdat->memcg_lru.lock);
- }
-}
-
-void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
- int nid;
-
- for_each_node(nid) {
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
- }
-}
-
-void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
- int gen;
- int nid;
-
- for_each_node(nid) {
- struct pglist_data *pgdat = NODE_DATA(nid);
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- spin_lock_irq(&pgdat->memcg_lru.lock);
-
- if (hlist_nulls_unhashed(&lruvec->lrugen.list))
- goto unlock;
-
- gen = lruvec->lrugen.gen;
-
- hlist_nulls_del_init_rcu(&lruvec->lrugen.list);
- pgdat->memcg_lru.nr_memcgs[gen]--;
-
- if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
- WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-unlock:
- spin_unlock_irq(&pgdat->memcg_lru.lock);
- }
-}
-
-void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
-{
- struct lruvec *lruvec = get_lruvec(memcg, nid);
-
- /* see the comment on MEMCG_NR_GENS */
- if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_HEAD)
- lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
-}
-
-#endif /* CONFIG_MEMCG */
-
/******************************************************************************
* the eviction
******************************************************************************/
@@ -5613,18 +5476,6 @@ static const struct file_operations lru_gen_ro_fops = {
* initialization
******************************************************************************/
-void lru_gen_init_pgdat(struct pglist_data *pgdat)
-{
- int i, j;
-
- spin_lock_init(&pgdat->memcg_lru.lock);
-
- for (i = 0; i < MEMCG_NR_GENS; i++) {
- for (j = 0; j < MEMCG_NR_BINS; j++)
- INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
- }
-}
-
void lru_gen_init_lruvec(struct lruvec *lruvec)
{
int i;
@@ -5671,9 +5522,7 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);
VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
- sizeof(lruvec->lrugen.nr_pages)));
-
- lruvec->lrugen.list.next = LIST_POISON1;
+ sizeof(lruvec->lrugen.nr_pages)));
if (!mm_state)
continue;
--
2.34.1
* [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
Currently, flush_reclaim_state is placed differently between
shrink_node_memcgs and shrink_many. shrink_many (only used for gen-LRU)
calls it after each lruvec is shrunk, while shrink_node_memcgs calls it
only after all lruvecs have been shrunk.
This patch moves flush_reclaim_state into shrink_node_memcgs and calls it
after each lruvec. This unifies the behavior and is reasonable because:
1. flush_reclaim_state adds current->reclaim_state->reclaimed to
sc->nr_reclaimed.
2. For non-MGLRU root reclaim, this can help stop the iteration earlier
when nr_to_reclaim is reached.
3. For non-root reclaim, the effect is negligible since flush_reclaim_state
does nothing in that case.
After moving flush_reclaim_state into shrink_node_memcgs, shrink_one can be
extended to support both lrugen and non-lrugen paths. It will call
try_to_shrink_lruvec for lrugen root reclaim and shrink_lruvec otherwise.
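In condensed form, the resulting helper looks roughly like this (shrink_slab()
and vmpressure() are left out here; the diff below has the full version):

```c
static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
	if (lru_gen_enabled() && root_reclaim(sc))
		try_to_shrink_lruvec(lruvec, sc);	/* lrugen root reclaim */
	else
		shrink_lruvec(lruvec, sc);		/* all other paths */

	/* shrink_slab() and vmpressure() calls omitted for brevity */
	flush_reclaim_state(sc);
}
```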
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/vmscan.c | 57 +++++++++++++++++++++--------------------------------
1 file changed, 23 insertions(+), 34 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 584f41eb4c14..795f5ebd9341 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4758,23 +4758,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
return nr_to_scan < 0;
}
-static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
-{
- unsigned long scanned = sc->nr_scanned;
- unsigned long reclaimed = sc->nr_reclaimed;
- struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- struct mem_cgroup *memcg = lruvec_memcg(lruvec);
-
- try_to_shrink_lruvec(lruvec, sc);
-
- shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
-
- if (!sc->proactive)
- vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
- sc->nr_reclaimed - reclaimed);
-
- flush_reclaim_state(sc);
-}
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc);
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
@@ -5760,6 +5744,27 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
return inactive_lru_pages > pages_for_compaction;
}
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+{
+ unsigned long scanned = sc->nr_scanned;
+ unsigned long reclaimed = sc->nr_reclaimed;
+ struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+ struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+
+ if (lru_gen_enabled() && root_reclaim(sc))
+ try_to_shrink_lruvec(lruvec, sc);
+ else
+ shrink_lruvec(lruvec, sc);
+
+ shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
+
+ if (!sc->proactive)
+ vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
+ sc->nr_reclaimed - reclaimed);
+
+ flush_reclaim_state(sc);
+}
+
static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
{
struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
@@ -5784,8 +5789,6 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
memcg = mem_cgroup_iter(target_memcg, NULL, partial);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- unsigned long reclaimed;
- unsigned long scanned;
/*
* This loop can become CPU-bound when target memcgs
@@ -5817,19 +5820,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
memcg_memory_event(memcg, MEMCG_LOW);
}
- reclaimed = sc->nr_reclaimed;
- scanned = sc->nr_scanned;
-
- shrink_lruvec(lruvec, sc);
-
- shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
- sc->priority);
-
- /* Record the group's reclaim efficiency */
- if (!sc->proactive)
- vmpressure(sc->gfp_mask, memcg, false,
- sc->nr_scanned - scanned,
- sc->nr_reclaimed - reclaimed);
+ shrink_one(lruvec, sc);
/* If partial walks are allowed, bail once goal is reached */
if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
@@ -5863,8 +5854,6 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
shrink_node_memcgs(pgdat, sc);
- flush_reclaim_state(sc);
-
nr_node_reclaimed = sc->nr_reclaimed - nr_reclaimed;
/* Record the subtree's reclaim efficiency */
--
2.34.1
* Re: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
From: kernel test robot @ 2025-12-12 2:55 UTC (permalink / raw)
To: Chen Ridong, akpm, axelrasmussen, yuanchu, weixugc, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, hannes, roman.gushchin, shakeel.butt, muchun.song,
zhengqi.arch
Cc: llvm, oe-kbuild-all, linux-mm, linux-doc, linux-kernel, cgroups,
lujialin4, chenridong, zhongjinji
Hi Chen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/Chen-Ridong/mm-mglru-use-mem_cgroup_iter-for-global-reclaim/20251209-094913
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/20251209012557.1949239-4-chenridong%40huaweicloud.com
patch subject: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
config: x86_64-randconfig-004-20251212 (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512121027.03z9qd08-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> mm/vmscan.o: warning: objtool: shrink_one+0xeb2: sibling call from callable instruction with modified stack frame
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
From: Chen Ridong @ 2025-12-12 9:53 UTC (permalink / raw)
To: kernel test robot, akpm, axelrasmussen, yuanchu, weixugc, david,
lorenzo.stoakes, Liam.Howlett, vbabka, rppt, surenb, mhocko,
corbet, hannes, roman.gushchin, shakeel.butt, muchun.song,
zhengqi.arch
Cc: llvm, oe-kbuild-all, linux-mm, linux-doc, linux-kernel, cgroups,
lujialin4, zhongjinji
On 2025/12/12 10:55, kernel test robot wrote:
> Hi Chen,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on akpm-mm/mm-everything]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Chen-Ridong/mm-mglru-use-mem_cgroup_iter-for-global-reclaim/20251209-094913
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/20251209012557.1949239-4-chenridong%40huaweicloud.com
> patch subject: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
> config: x86_64-randconfig-004-20251212 (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/config)
> compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251212/202512121027.03z9qd08-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202512121027.03z9qd08-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
>>> mm/vmscan.o: warning: objtool: shrink_one+0xeb2: sibling call from callable instruction with modified stack frame
>
This is the first time I've encountered this warning. While adding
`STACK_FRAME_NON_STANDARD(shrink_one)` resolves it, I noticed this approach isn't widely used in the
codebase. Is this the standard solution, or are there better alternatives?
I've tested that the warning persists even when `shrink_one` is simplified to only call `shrink_lruvec`:
```
static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
shrink_lruvec(lruvec, sc);
}
```
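For reference, the workaround I mentioned above looks roughly like this (a
sketch only, not a proposed fix):

```c
#include <linux/objtool.h>

static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
	shrink_lruvec(lruvec, sc);
}
/* tells objtool to skip stack-frame validation for this function */
STACK_FRAME_NON_STANDARD(shrink_one);
```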
How can we properly avoid this warning without using STACK_FRAME_NON_STANDARD?
--
Best regards,
Ridong
* [PATCH -next 4/5] mm/mglru: combine shrink_many into shrink_node_memcgs
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
The previous patch extended shrink_one to support both lrugen and
non-lrugen reclaim. Now shrink_many and shrink_node_memcgs are almost
identical, except that shrink_many also calls should_abort_scan for lrugen
root reclaim.
This patch adds the should_abort_scan check to shrink_node_memcgs (which is
only meaningful for gen-LRU root reclaim). After this change,
shrink_node_memcgs can be used directly instead of shrink_many, allowing
shrink_many to be safely removed.
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/vmscan.c | 67 ++++++++++++-----------------------------------------
1 file changed, 15 insertions(+), 52 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 795f5ebd9341..dbf2cfbe3243 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4758,57 +4758,6 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
return nr_to_scan < 0;
}
-static void shrink_one(struct lruvec *lruvec, struct scan_control *sc);
-
-static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
-{
- struct mem_cgroup *target = sc->target_mem_cgroup;
- struct mem_cgroup_reclaim_cookie reclaim = {
- .pgdat = pgdat,
- };
- struct mem_cgroup_reclaim_cookie *cookie = &reclaim;
- struct mem_cgroup *memcg;
-
- if (current_is_kswapd() || sc->memcg_full_walk)
- cookie = NULL;
-
- memcg = mem_cgroup_iter(target, NULL, cookie);
- while (memcg) {
- struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
-
- cond_resched();
-
- mem_cgroup_calculate_protection(target, memcg);
-
- if (mem_cgroup_below_min(target, memcg))
- goto next;
-
- if (mem_cgroup_below_low(target, memcg)) {
- if (!sc->memcg_low_reclaim) {
- sc->memcg_low_skipped = 1;
- goto next;
- }
- memcg_memory_event(memcg, MEMCG_LOW);
- }
-
- shrink_one(lruvec, sc);
-
- if (should_abort_scan(lruvec, sc)) {
- if (cookie)
- mem_cgroup_iter_break(target, memcg);
- break;
- }
-
-next:
- if (cookie && sc->nr_reclaimed >= sc->nr_to_reclaim) {
- mem_cgroup_iter_break(target, memcg);
- break;
- }
-
- memcg = mem_cgroup_iter(target, memcg, cookie);
- }
-}
-
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
struct blk_plug plug;
@@ -4829,6 +4778,9 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
blk_finish_plug(&plug);
}
+static void shrink_one(struct lruvec *lruvec, struct scan_control *sc);
+static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc);
+
static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct blk_plug plug;
@@ -4858,7 +4810,7 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
if (mem_cgroup_disabled())
shrink_one(&pgdat->__lruvec, sc);
else
- shrink_many(pgdat, sc);
+ shrink_node_memcgs(pgdat, sc);
if (current_is_kswapd())
sc->nr_reclaimed += reclaimed;
@@ -5554,6 +5506,11 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *
BUILD_BUG();
}
+static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
+{
+ return false;
+}
+
#endif /* CONFIG_LRU_GEN */
static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5822,6 +5779,12 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
shrink_one(lruvec, sc);
+ if (should_abort_scan(lruvec, sc)) {
+ if (partial)
+ mem_cgroup_iter_break(target_memcg, memcg);
+ break;
+ }
+
/* If partial walks are allowed, bail once goal is reached */
if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
mem_cgroup_iter_break(target_memcg, memcg);
--
2.34.1
* [PATCH -next 5/5] mm/mglru: factor lrugen state out of shrink_lruvec
From: Chen Ridong @ 2025-12-09 1:25 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, chenridong,
zhongjinji
From: Chen Ridong <chenridong@huawei.com>
A previous patch updated shrink_node_memcgs to handle lrugen root reclaim
and extended shrink_one to support both lrugen and non-lrugen. However,
in shrink_one, lrugen non-root reclaim still invokes shrink_lruvec, which
should only be used for non-lrugen reclaim.
To clarify the semantics, this patch moves the lrugen-specific logic out of
shrink_lruvec, leaving shrink_lruvec exclusively for non-lrugen reclaim.
With this change, shrink_one always invokes lru_gen_shrink_lruvec for lrugen.
For root reclaim, lru_gen_shrink_lruvec calls try_to_shrink_lruvec directly
and skips the extra setup, since that work has already been done in
lru_gen_shrink_node. Non-root reclaim behavior remains unchanged.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
mm/vmscan.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dbf2cfbe3243..c5f517ec52a7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4762,7 +4762,12 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
{
struct blk_plug plug;
- VM_WARN_ON_ONCE(root_reclaim(sc));
+ /* Root reclaim has finished other extra work outside, just shrink. */
+ if (root_reclaim(sc)) {
+ try_to_shrink_lruvec(lruvec, sc);
+ return;
+ }
+
VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
lru_add_drain();
@@ -5524,11 +5529,6 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
bool proportional_reclaim;
struct blk_plug plug;
- if (lru_gen_enabled() && !root_reclaim(sc)) {
- lru_gen_shrink_lruvec(lruvec, sc);
- return;
- }
-
get_scan_count(lruvec, sc, nr);
/* Record the original scan target for proportional adjustments later */
@@ -5708,8 +5708,8 @@ static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
- if (lru_gen_enabled() && root_reclaim(sc))
- try_to_shrink_lruvec(lruvec, sc);
+ if (lru_gen_enabled())
+ lru_gen_shrink_lruvec(lruvec, sc);
else
shrink_lruvec(lruvec, sc);
--
2.34.1
* Re: [PATCH -next 0/5] mm/mglru: remove memcg lru
From: Chen Ridong @ 2025-12-12 10:15 UTC (permalink / raw)
To: akpm, axelrasmussen, yuanchu, weixugc, david, lorenzo.stoakes,
Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, hannes,
roman.gushchin, shakeel.butt, muchun.song, zhengqi.arch
Cc: linux-mm, linux-doc, linux-kernel, cgroups, lujialin4, zhongjinji
On 2025/12/9 9:25, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The memcg LRU was introduced to improve scalability in global reclaim,
> but its implementation has grown complex and can cause performance
> regressions when creating many memory cgroups [1].
>
> This series implements mem_cgroup_iter with a reclaim cookie in
> shrink_many() for global reclaim, following the pattern already used in
> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
> design maintains good fairness across cgroups by preserving iteration
> state between reclaim passes.
>
> Testing was performed using the original stress test from Yu Zhao [2] on a
> 1 TB, 4-node NUMA system. The results show:
>
> pgsteal:
> memcg LRU memcg iter
> stddev(pgsteal) / mean(pgsteal) 106.03% 93.20%
> sum(pgsteal) / sum(requested) 98.10% 99.28%
>
> workingset_refault_anon:
> memcg LRU memcg iter
> stddev(refault) / mean(refault) 193.97% 134.67%
> sum(refault) 1,963,229 2,027,567
>
> The new implementation shows clear fairness improvements, reducing the
> standard deviation relative to the mean by 12.8 percentage points for
> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
> increased by 3.2% (from 1,963,229 to 2,027,567).
>
> To simplify review:
> 1. Patch 1 uses mem_cgroup_iter with reclaim cookie in shrink_many()
> 2. Patch 2 removes the now-unused memcg LRU code
> 3. Patches 3–5 combine shrink_many and shrink_node_memcgs
> (This reorganization is clearer after switching to mem_cgroup_iter)
>
> ---
>
> Changes from RFC series:
> 1. Updated the test result data.
> 2. Added patches 3–5 to combine shrink_many and shrink_node_memcgs.
>
> RFC: https://lore.kernel.org/all/20251204123124.1822965-1-chenridong@huaweicloud.com/
>
> Chen Ridong (5):
> mm/mglru: use mem_cgroup_iter for global reclaim
> mm/mglru: remove memcg lru
> mm/mglru: extend shrink_one for both lrugen and non-lrugen
> mm/mglru: combine shrink_many into shrink_node_memcgs
> mm/mglru: factor lrugen state out of shrink_lruvec
>
> Documentation/mm/multigen_lru.rst | 30 ---
> include/linux/mmzone.h | 89 --------
> mm/memcontrol-v1.c | 6 -
> mm/memcontrol.c | 4 -
> mm/mm_init.c | 1 -
> mm/vmscan.c | 332 ++++--------------------------
> 6 files changed, 44 insertions(+), 418 deletions(-)
>
Hello all,
There's a warning from the kernel test robot, and I would like to update the series to fix it along
with any feedback from your reviews.
I'd appreciate it if you could take a look at this patch series when convenient.
Hi Shakeel, I would be very grateful if you could review patches 3-5. They combine shrink_many and
shrink_node_memcgs as you suggested — does that look good to you?
--
Best regards,
Ridong