RE: [PATCH v5 00/14] mm/mglru: improve reclaim loop and dirty folio handling

public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed

From: wangzicheng <wangzicheng@honor.com>
To: Kairui Song <ryncsn@gmail.com>
Cc: wangxinyu19 <wxy2009nrrr@163.com>,
	"devnull+kasong.tencent.com@kernel.org"
	<devnull+kasong.tencent.com@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"axelrasmussen@google.com" <axelrasmussen@google.com>,
	"baohua@kernel.org" <baohua@kernel.org>,
	"baolin.wang@linux.alibaba.com" <baolin.wang@linux.alibaba.com>,
	"chenridong@huaweicloud.com" <chenridong@huaweicloud.com>,
	"chrisl@kernel.org" <chrisl@kernel.org>,
	"david@kernel.org" <david@kernel.org>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"kaleshsingh@google.com" <kaleshsingh@google.com>,
	"laoar.shao@gmail.com" <laoar.shao@gmail.com>,
	"lenohou@gmail.com" <lenohou@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"ljs@kernel.org" <ljs@kernel.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"qi.zheng@linux.dev" <qi.zheng@linux.dev>,
	"shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
	"stevensd@google.com" <stevensd@google.com>,
	"surenb@google.com" <surenb@google.com>,
	"vernon2gm@gmail.com" <vernon2gm@gmail.com>,
	"weixugc@google.com" <weixugc@google.com>,
	"yuanchu@google.com" <yuanchu@google.com>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
	wangzhen <wangzhen5@honor.com>, wangtao <tao.wangtao@honor.com>
Subject: RE: [PATCH v5 00/14] mm/mglru: improve reclaim loop and dirty folio handling
Date: Sat, 18 Apr 2026 09:08:19 +0000	[thread overview]
Message-ID: <830980eb128a49c6adc55571b7015fab@honor.com> (raw)
In-Reply-To: <CAMgjq7BsSMxia=nYA1n7d+sOq8RhRb-F6F7E6tt+R6buf7WZOQ@mail.gmail.com>

> On Sat, Apr 18, 2026 at 3:38 PM wangzicheng <wangzicheng@honor.com>
> wrote:
> >
> > > Hi Kairui,
> > >
> > > We have tested this patch series on Android device under a typical
> scenario.
> > >
> > > The test consisted of cold-starting multiple applications sequentially
> > > under moderate system load (some services running on the background,
> > > such as map navigating, AI voice-assistant). Each test round cold-starts
> > > a fixed set of apps one by one and records the cold start latency.
> > > A total of 100 rounds were conducted to ensure statistical significance.
> > >
> >
> > Hi Xinyu and Kairui,
> >
> > We have test the patch under a **heavy** load benchmark for camera.
> >
> > > Before:
> > >   /proc/vmstat info:
> > >     pgpgin 269,224
> > >     pgpgout 226,078
> > >     workingset_refault_anon 237
> > >     workingset_refault_file 27689
> > >
> > >   Launch Time Summary (all apps, all runs)
> > >     Mean 868.0ms
> > >     P50 888.0ms
> > >     P90 1274.2ms
> > >     P95 1399.0ms
> > >
> > > After:
> > >   /proc/vmstat info:
> > >     pgpgin 223,801                (-16.9%)
> > >     pgpgout 308,873
> > >     workingset_refault_anon 498
> > >     workingset_refault_file 17075 (-38.3%)
> > >
> > >   Launch Time Summary (all apps, all runs)
> > >     Mean 850.5ms (-2.07%)
> > >     P50 861.5ms  (-3.04%)
> > >     P90 1179.0ms (-8.05%)
> > >     P95 1228.0ms (-12.2%)
> > >
> > > --
> > > Best regards,
> > > Xinyu
> > >
> >
> > We evaluated the backported patches on android16-6.12 using a
> **heavy**
> 
> Hi Zicheng
> 
> I'm not sure how you did that, this series applies on mm-unstable and
> there is a large gap between that and 6.12.
> 
> > mobile workload on a Qualcomm 8850 device (16GB RAM + 16GB zram).
> > (vmscan code in this tree is largely similar to v6.18)
> >
Thanks for pointing that out.

There is indeed a relatively large gap between mm-unstable and our
android16-6.12 tree. The series was backported manually and we only
applied the changes required to make it build and run in our tree.

Because of this, it is possible that some related changes from
mm-unstable were not included, which may have affected the behavior or
performance we observed. If this caused misleading results, we
apologize for the confusion.

Regarding vendor hooks, in our tree there is only one hook in
get_nr_to_scan(). We tested with that hook disabled.

The performance data was collected using Perfetto traces.
Unfortunately those traces contain a large amount of runtime
information and are not easy to share externally.

If needed, we can also try to reproduce the test on a tree closer to
mm-unstable once our chipset platform kernel tree gets updated to
a newer version, to see whether the behavior still reproduces.

Below is the patch we manually applied during the backport.




diff --git a/mm/vmscan.c b/mm/vmscan.c
index f78cfe059f14..50109cd5e94c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1987,6 +1987,44 @@ static int current_may_throttle(void)
 	return !(current->flags & PF_LOCAL_THROTTLE);
 }
 
+static void handle_reclaim_writeback(unsigned long nr_taken,
+				     struct pglist_data *pgdat,
+				     struct scan_control *sc,
+				     struct reclaim_stat *stat)
+{
+	/*
+	 * If dirty folios are scanned that are not queued for IO, it
+	 * implies that flushers are not doing their job. This can
+	 * happen when memory pressure pushes dirty folios to the end of
+	 * the LRU before the dirty limits are breached and the dirty
+	 * data has expired. It can also happen when the proportion of
+	 * dirty folios grows not through writes but through memory
+	 * pressure reclaiming all the clean cache. And in some cases,
+	 * the flushers simply cannot keep up with the allocation
+	 * rate. Nudge the flusher threads in case they are asleep.
+	 */
+	if (stat->nr_unqueued_dirty == nr_taken && nr_taken) {
+		wakeup_flusher_threads(WB_REASON_VMSCAN);
+		/*
+		 * For cgroupv1 dirty throttling is achieved by waking up
+		 * the kernel flusher here and later waiting on folios
+		 * which are in writeback to finish (see shrink_folio_list()).
+		 *
+		 * Flusher may not be able to issue writeback quickly
+		 * enough for cgroupv1 writeback throttling to work
+		 * on a large system.
+		 */
+		if (!writeback_throttling_sane(sc))
+			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+	}
+
+	sc->nr.dirty += stat->nr_dirty;
+	sc->nr.congested += stat->nr_congested;
+	sc->nr.writeback += stat->nr_writeback;
+	sc->nr.immediate += stat->nr_immediate;
+	sc->nr.taken += nr_taken;
+}
+
 /*
  * shrink_inactive_list() is a helper for shrink_node().  It returns the number
  * of reclaimed pages
@@ -2054,41 +2092,15 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 
 	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
 
-	/*
-	 * If dirty folios are scanned that are not queued for IO, it
-	 * implies that flushers are not doing their job. This can
-	 * happen when memory pressure pushes dirty folios to the end of
-	 * the LRU before the dirty limits are breached and the dirty
-	 * data has expired. It can also happen when the proportion of
-	 * dirty folios grows not through writes but through memory
-	 * pressure reclaiming all the clean cache. And in some cases,
-	 * the flushers simply cannot keep up with the allocation
-	 * rate. Nudge the flusher threads in case they are asleep.
-	 */
-	if (stat.nr_unqueued_dirty == nr_taken) {
-		wakeup_flusher_threads(WB_REASON_VMSCAN);
-		/*
-		 * For cgroupv1 dirty throttling is achieved by waking up
-		 * the kernel flusher here and later waiting on folios
-		 * which are in writeback to finish (see shrink_folio_list()).
-		 *
-		 * Flusher may not be able to issue writeback quickly
-		 * enough for cgroupv1 writeback throttling to work
-		 * on a large system.
-		 */
-		if (!writeback_throttling_sane(sc))
-			reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
-	}
+	
+	// sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
+	// leave nr_unqueued_dirty in scan_control to keep integrity
 
-	sc->nr.dirty += stat.nr_dirty;
-	sc->nr.congested += stat.nr_congested;
-	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
-	sc->nr.writeback += stat.nr_writeback;
-	sc->nr.immediate += stat.nr_immediate;
-	sc->nr.taken += nr_taken;
-	if (file)
-		sc->nr.file_taken += nr_taken;
+	// if (file)
+	// 	sc->nr.file_taken += nr_taken;
+	// leave nr_taken in scan_control to keep integrity
 
+	handle_reclaim_writeback(nr_taken, pgdat, sc, &stat);
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
 	return nr_reclaimed;
@@ -3291,7 +3303,7 @@ static int folio_update_gen(struct folio *folio, int gen)
 }
 
 /* protect pages accessed multiple times through file descriptors */
-static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
+static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio)
 {
 	int type = folio_is_file_lru(folio);
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
@@ -3310,9 +3322,6 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclai
 
 		new_flags = old_flags & ~(LRU_GEN_MASK | LRU_REFS_FLAGS);
 		new_flags |= (new_gen + 1UL) << LRU_GEN_PGOFF;
-		/* for folio_end_writeback() */
-		if (reclaiming)
-			new_flags |= BIT(PG_reclaim);
 	} while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
 
 	lru_gen_update_size(lruvec, folio, old_gen, new_gen);
@@ -3918,7 +3927,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
 			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
-			new_gen = folio_inc_gen(lruvec, folio, false);
+			new_gen = folio_inc_gen(lruvec, folio);
 			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
 
 			/* don't count the workingset being lazily promoted */
@@ -3941,10 +3950,10 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, int swappiness)
 	return true;
 }
 
-static bool try_to_inc_min_seq(struct lruvec *lruvec, int swappiness)
+static void try_to_inc_min_seq(struct lruvec *lruvec, int swappiness)
 {
 	int gen, type, zone;
-	bool success = false;
+	bool seq_inc_flag = false;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	DEFINE_MIN_SEQ(lruvec);
 
@@ -3961,11 +3970,19 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, int swappiness)
 			}
 
 			min_seq[type]++;
+			seq_inc_flag = true;
 		}
 next:
 		;
 	}
 
+	/*
+	 * If min_seq[type] of both anonymous and file is not increased,
+	 * return here to avoid unnecessary checking overhead later.
+ 	 */
+	if (!seq_inc_flag)
+		return;
+
 	/* see the comment on lru_gen_folio */
 	if (swappiness && swappiness <= MAX_SWAPPINESS) {
 		unsigned long seq = lrugen->max_seq - MIN_NR_GENS;
@@ -3982,10 +3999,8 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec, int swappiness)
 
 		reset_ctrl_pos(lruvec, type, true);
 		WRITE_ONCE(lrugen->min_seq[type], min_seq[type]);
-		success = true;
 	}
 
-	return success;
 }
 
 static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness)
@@ -4137,27 +4152,33 @@ static void set_initial_priority(struct pglist_data *pgdat, struct scan_control
 	sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
 }
 
-static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
+static unsigned long lruvec_evictable_size(struct lruvec *lruvec, int swappiness)
 {
 	int gen, type, zone;
-	unsigned long total = 0;
-	int swappiness = get_swappiness(lruvec, sc);
+	unsigned long seq, total = 0;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
-	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 	DEFINE_MIN_SEQ(lruvec);
 
 	for_each_evictable_type(type, swappiness) {
-		unsigned long seq;
-
 		for (seq = min_seq[type]; seq <= max_seq; seq++) {
 			gen = lru_gen_from_seq(seq);
-
 			for (zone = 0; zone < MAX_NR_ZONES; zone++)
 				total += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
 		}
 	}
 
+	return total;
+}
+
+static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
+{
+	unsigned long total;
+	int swappiness = get_swappiness(lruvec, sc);
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+
+	total = lruvec_evictable_size(lruvec, swappiness);
+
 	/* whether the size is big enough to be helpful */
 	return mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
 }
@@ -4475,7 +4496,6 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 		       int tier_idx)
 {
 	bool success;
-	bool dirty, writeback;
 	int gen = folio_lru_gen(folio);
 	int type = folio_is_file_lru(folio);
 	int zone = folio_zonenum(folio);
@@ -4505,7 +4525,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 
 	/* protected */
 	if (tier > tier_idx || refs + workingset == BIT(LRU_REFS_WIDTH) + 1) {
-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(lruvec, folio);
 		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 
 		/* don't count the workingset being lazily promoted */
@@ -4520,26 +4540,11 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 
 	/* ineligible */
 	if (!folio_test_lru(folio) || zone > sc->reclaim_idx) {
-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(lruvec, folio);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
 
-	dirty = folio_test_dirty(folio);
-	writeback = folio_test_writeback(folio);
-	if (type == LRU_GEN_FILE && dirty) {
-		sc->nr.file_taken += delta;
-		if (!writeback)
-			sc->nr.unqueued_dirty += delta;
-	}
-
-	/* waiting for writeback */
-	if (writeback || (type == LRU_GEN_FILE && dirty)) {
-		gen = folio_inc_gen(lruvec, folio, true);
-		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
-		return true;
-	}
-
 	return false;
 }
 
@@ -4547,12 +4552,6 @@ bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct scan_contr
 {
 	bool success;
 
-	/* swap constrained */
-	if (!(sc->gfp_mask & __GFP_IO) &&
-	    (folio_test_dirty(folio) ||
-	     (folio_test_anon(folio) && !folio_test_swapcache(folio))))
-		return false;
-
 	/* raced with release_pages() */
 	if (!folio_try_get(folio))
 		return false;
@@ -4567,8 +4566,6 @@ bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct scan_contr
 	if (!folio_test_referenced(folio))
 		set_mask_bits(&folio->flags, LRU_REFS_MASK, 0);
 
-	/* for shrink_folio_list() */
-	folio_clear_reclaim(folio);
 
 	success = lru_gen_del_folio(lruvec, folio, true);
 	VM_WARN_ON_ONCE_FOLIO(!success, folio);
@@ -4577,8 +4574,9 @@ bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct scan_contr
 }
 EXPORT_SYMBOL_GPL(isolate_folio);
 
-static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
-		       int type, int tier, struct list_head *list)
+static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc,
+		       int type, int tier, 
+			   struct list_head *list, int *isolatedp)
 {
 	int i;
 	int gen;
@@ -4587,10 +4585,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	int scanned = 0;
 	int isolated = 0;
 	int skipped = 0;
-	int remaining = MAX_LRU_BATCH;
+	unsigned long remaining = nr_to_scan;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
+	VM_WARN_ON_ONCE(nr_to_scan > MAX_LRU_BATCH);
 	VM_WARN_ON_ONCE(!list_empty(list));
 
 	if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
@@ -4647,16 +4646,12 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	__count_memcg_events(memcg, item, isolated);
 	__count_memcg_events(memcg, PGREFILL, sorted);
 	__count_vm_events(PGSCAN_ANON + type, isolated);
-	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, MAX_LRU_BATCH,
+	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
 				scanned, skipped, isolated,
 				type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
-	if (type == LRU_GEN_FILE)
-		sc->nr.file_taken += isolated;
-	/*
-	 * There might not be eligible folios due to reclaim_idx. Check the
-	 * remaining to prevent livelock if it's not making progress.
-	 */
-	return isolated || !remaining ? scanned : 0;
+
+	*isolatedp = isolated;
+	return scanned;
 }
 
 static int get_tier_idx(struct lruvec *lruvec, int type)
@@ -4698,33 +4693,36 @@ static int get_type_to_scan(struct lruvec *lruvec, int swappiness)
 	return positive_ctrl_err(&sp, &pv);
 }
 
-static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness,
-			  int *type_scanned, struct list_head *list)
+static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc, int swappiness,
+			   struct list_head *list, int *isolated,
+			  int *isolate_type, int *isolate_scanned)
 {
 	int i;
+	int scanned = 0;
 	int type = get_type_to_scan(lruvec, swappiness);
 
 	for_each_evictable_type(i, swappiness) {
-		int scanned;
+		int type_scan;
 		int tier = get_tier_idx(lruvec, type);
 
-		*type_scanned = type;
+		type_scan = scan_folios(nr_to_scan, lruvec, sc,
+					type, tier, list, isolated);
 
-		scanned = scan_folios(lruvec, sc, type, tier, list);
-		if (scanned)
-			return scanned;
+		scanned += type_scan;
+		if (*isolated) {
+			*isolate_type = type;
+			*isolate_scanned = type_scan;
+			break;
+		}
 
 		type = !type;
 	}
 
-	return 0;
+	return scanned;
 }
 
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
+static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec, struct scan_control *sc, int swappiness)
 {
-	int type;
-	int scanned;
-	int reclaimed;
 	LIST_HEAD(list);
 	LIST_HEAD(clean);
 	struct folio *folio;
@@ -4732,19 +4730,23 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	enum vm_event_item item;
 	struct reclaim_stat stat;
 	struct lru_gen_mm_walk *walk;
+	int scanned, reclaimed;
+	int isolated = 0, type, type_scanned;
 	bool skip_retry = false;
-	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
 	spin_lock_irq(&lruvec->lru_lock);
 
-	scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
+	/* In case folio deletion left empty old gens, flush them */
+	try_to_inc_min_seq(lruvec, swappiness);
 
-	scanned += try_to_inc_min_seq(lruvec, swappiness);
+	scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness,
+				 &list, &isolated, &type, &type_scanned);
 
-	if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS > lrugen->max_seq)
-		scanned = 0;
+	/* Isolation might create empty gen, flush them */
+	if (scanned)
+		try_to_inc_min_seq(lruvec, swappiness);
 
 	spin_unlock_irq(&lruvec->lru_lock);
 
@@ -4752,10 +4754,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 		return scanned;
 retry:
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
-	sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
 	sc->nr_reclaimed += reclaimed;
+	handle_reclaim_writeback(isolated, pgdat, sc, &stat);
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
-			scanned, reclaimed, &stat, sc->priority,
+			type_scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
 
 	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
@@ -4804,6 +4806,7 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	if (!list_empty(&list)) {
 		skip_retry = true;
+		isolated = 0;
 		goto retry;
 	}
 
@@ -4813,28 +4816,14 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
 			     int swappiness, unsigned long *nr_to_scan)
 {
-	int gen, type, zone;
-	unsigned long size = 0;
-	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	DEFINE_MIN_SEQ(lruvec);
 
-	*nr_to_scan = 0;
 	/* have to run aging, since eviction is not possible anymore */
 	if (evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS > max_seq)
 		return true;
 
-	for_each_evictable_type(type, swappiness) {
-		unsigned long seq;
-
-		for (seq = min_seq[type]; seq <= max_seq; seq++) {
-			gen = lru_gen_from_seq(seq);
+	*nr_to_scan = lruvec_evictable_size(lruvec, swappiness);
 
-			for (zone = 0; zone < MAX_NR_ZONES; zone++)
-				size += max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
-		}
-	}
-
-	*nr_to_scan = size;
 	/* better to run aging even though eviction is still possible */
 	return evictable_min_seq(min_seq, swappiness) + MIN_NR_GENS == max_seq;
 }
@@ -4844,27 +4833,55 @@ static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
  *    reclaim.
  */
-static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
-{
-	bool success;
-	unsigned long nr_to_scan;
-	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
-	DEFINE_MAX_SEQ(lruvec);
+// static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
+// 			   struct mem_cgroup *memcg, int swappiness)
+// {
+// 	unsigned long nr_to_scan, evictable;
+// 	bool bypass = false;
+// 	bool young = false;
+// 	DEFINE_MAX_SEQ(lruvec);
+
+// 	evictable = lruvec_evictable_size(lruvec, swappiness);
+// 	nr_to_scan = evictable;
+
+// 	/* try to scrape all its memory if this memcg was deleted */
+// 	if (!mem_cgroup_online(memcg))
+// 		return nr_to_scan;
+
+// 	// nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
+// 	// not exist in the android code
+// 	nr_to_scan >>= sc->priority;
+
+// 	if (!nr_to_scan && sc->priority < DEF_PRIORITY)
+// 		nr_to_scan = min(evictable, SWAP_CLUSTER_MAX);
+
+// 	trace_android_vh_mglru_aging_bypass(lruvec, max_seq,
+// 		swappiness, &bypass, &young);
+// 	if (bypass)
+// 		return young ? -1 : 0;
+
+// 	return nr_to_scan;
+// }
+static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
+			   struct mem_cgroup *memcg, int swappiness)
+{
+	unsigned long nr_to_scan, evictable;
 	bool bypass = false;
 	bool young = false;
+	DEFINE_MAX_SEQ(lruvec);
 
-	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
-		return -1;
-
-	success = should_run_aging(lruvec, max_seq, swappiness, &nr_to_scan);
+	evictable = lruvec_evictable_size(lruvec, swappiness);
+	nr_to_scan = evictable;
 
 	/* try to scrape all its memory if this memcg was deleted */
-	if (nr_to_scan && !mem_cgroup_online(memcg))
+	if (!mem_cgroup_online(memcg))
 		return nr_to_scan;
 
+	nr_to_scan >>= sc->priority;
+
 	/* try to get away with not aging at the default priority */
-	if (!success || sc->priority == DEF_PRIORITY)
-		return nr_to_scan >> sc->priority;
+	if (!nr_to_scan && sc->priority < DEF_PRIORITY)
+		nr_to_scan = min(evictable, SWAP_CLUSTER_MAX);
 
 	trace_android_vh_mglru_aging_bypass(lruvec, max_seq,
 		swappiness, &bypass, &young);
@@ -4872,7 +4889,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int s
 		return young ? -1 : 0;
 
 	/* stop scanning this lruvec as it's low on cold folios */
-	return try_to_inc_max_seq(lruvec, max_seq, swappiness, false) ? -1 : 0;
+	return nr_to_scan;
 }
 
 static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
@@ -4909,47 +4926,58 @@ static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
 	return true;
 }
 
+/*
+ * For future optimizations:
+ * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
+ *    reclaim.
+ */
 static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
-	long nr_to_scan;
-	unsigned long scanned = 0;
+	bool need_rotate = false, should_age = false;
+	long nr_batch, nr_to_scan;
 	int swappiness = get_swappiness(lruvec, sc);
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
-	while (true) {
+	nr_to_scan = get_nr_to_scan(lruvec, sc, memcg, swappiness);
+	if (!nr_to_scan)
+		need_rotate = true;
+
+	while (nr_to_scan > 0) {
 		int delta;
+		DEFINE_MAX_SEQ(lruvec);
 
-		nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
-		if (nr_to_scan <= 0)
+		if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) {
+			need_rotate = true;
 			break;
+		}
 
-		delta = evict_folios(lruvec, sc, swappiness);
+		if (should_run_aging(lruvec, max_seq, swappiness, &nr_to_scan)) {
+			if (try_to_inc_max_seq(lruvec, max_seq, swappiness, false))
+				need_rotate = true;
+			should_age = true;
+		}
+
+		nr_batch = min(nr_to_scan, MIN_LRU_BATCH);
+		delta = evict_folios(nr_batch, lruvec, sc, swappiness);
 		if (!delta)
 			break;
 
-		scanned += delta;
-		if (scanned >= nr_to_scan)
+		if (should_abort_scan(lruvec, sc))
 			break;
 
-		if (should_abort_scan(lruvec, sc))
+		/* For cgroup reclaim, fairness is handled by iterator, not rotation */
+		if (root_reclaim(sc) && should_age)
 			break;
 
 		cond_resched();
 	}
 
-	/*
-	 * If too many file cache in the coldest generation can't be evicted
-	 * due to being dirty, wake up the flusher.
-	 */
-	if (sc->nr.unqueued_dirty && sc->nr.unqueued_dirty == sc->nr.file_taken)
-		wakeup_flusher_threads(WB_REASON_VMSCAN);
-
-	/* whether this lruvec should be rotated */
-	return nr_to_scan < 0;
+	return need_rotate;
 }
 
 static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 {
-	bool success;
+	bool need_rotate;
 	unsigned long scanned = sc->nr_scanned;
 	unsigned long reclaimed = sc->nr_reclaimed;
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
@@ -4967,7 +4995,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 		memcg_memory_event(memcg, MEMCG_LOW);
 	}
 
-	success = try_to_shrink_lruvec(lruvec, sc);
+	need_rotate = try_to_shrink_lruvec(lruvec, sc);
 
 	shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
 
@@ -4977,10 +5005,10 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 
 	flush_reclaim_state(sc);
 
-	if (success && mem_cgroup_online(memcg))
+	if (need_rotate && mem_cgroup_online(memcg))
 		return MEMCG_LRU_YOUNG;
 
-	if (!success && lruvec_is_sizable(lruvec, sc))
+	if (!need_rotate && lruvec_is_sizable(lruvec, sc))
 		return 0;
 
 	/* one retry if offlined or too small */
@@ -5532,6 +5560,7 @@ static int run_aging(struct lruvec *lruvec, unsigned long seq,
 static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_control *sc,
 			int swappiness, unsigned long nr_to_reclaim)
 {
+	int nr_batch;
 	DEFINE_MAX_SEQ(lruvec);
 
 	if (seq + MIN_NR_GENS > max_seq)
@@ -5548,7 +5577,8 @@ static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_co
 		if (sc->nr_reclaimed >= nr_to_reclaim)
 			return 0;
 
-		if (!evict_folios(lruvec, sc, swappiness))
+		nr_batch = min(nr_to_reclaim - sc->nr_reclaimed, MAX_LRU_BATCH);
+		if (!evict_folios(nr_batch, lruvec, sc, swappiness))
 			return 0;
 
 		cond_resched();

next prev parent reply	other threads:[~2026-04-18  9:08 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-12 16:48 [PATCH v5 00/14] mm/mglru: improve reclaim loop and dirty folio handling Kairui Song via B4 Relay
2026-04-12 16:48 ` [PATCH v5 01/14] mm/mglru: consolidate common code for retrieving evictable size Kairui Song via B4 Relay
2026-04-12 16:48 ` [PATCH v5 02/14] mm/mglru: rename variables related to aging and rotation Kairui Song via B4 Relay
2026-04-12 16:48 ` [PATCH v5 03/14] mm/mglru: relocate the LRU scan batch limit to callers Kairui Song via B4 Relay
2026-04-12 16:48 ` [PATCH v5 04/14] mm/mglru: restructure the reclaim loop Kairui Song via B4 Relay
2026-04-16  6:33   ` Barry Song
2026-04-16 18:47   ` Kairui Song
2026-04-12 16:48 ` [PATCH v5 05/14] mm/mglru: scan and count the exact number of folios Kairui Song via B4 Relay
2026-04-15  3:16   ` Baolin Wang
2026-04-16  7:01   ` Barry Song
2026-04-16 17:39     ` Kairui Song
2026-04-12 16:48 ` [PATCH v5 06/14] mm/mglru: use a smaller batch for reclaim Kairui Song via B4 Relay
2026-04-12 16:48 ` [PATCH v5 07/14] mm/mglru: don't abort scan immediately right after aging Kairui Song via B4 Relay
2026-04-16  7:32   ` Barry Song
2026-04-12 16:48 ` [PATCH v5 08/14] mm/mglru: remove redundant swap constrained check upon isolation Kairui Song via B4 Relay
2026-04-14  7:43   ` Chen Ridong
2026-04-15  3:19   ` Baolin Wang
2026-04-16  9:05   ` Barry Song
2026-04-12 16:48 ` [PATCH v5 09/14] mm/mglru: use the common routine for dirty/writeback reactivation Kairui Song via B4 Relay
2026-04-15  3:30   ` Baolin Wang
2026-04-16  9:18   ` Barry Song
2026-04-12 16:48 ` [PATCH v5 10/14] mm/mglru: simplify and improve dirty writeback handling Kairui Song via B4 Relay
2026-04-15  3:25   ` Baolin Wang
2026-04-12 16:48 ` [PATCH v5 11/14] mm/mglru: remove no longer used reclaim argument for folio protection Kairui Song via B4 Relay
2026-04-12 16:48 ` [PATCH v5 12/14] mm/vmscan: remove sc->file_taken Kairui Song via B4 Relay
2026-04-14  7:46   ` Chen Ridong
2026-04-12 16:48 ` [PATCH v5 13/14] mm/vmscan: remove sc->unqueued_dirty Kairui Song via B4 Relay
2026-04-14  7:46   ` Chen Ridong
2026-04-12 16:48 ` [PATCH v5 14/14] mm/vmscan: unify writeback reclaim statistic and throttling Kairui Song via B4 Relay
2026-04-17  2:51 ` [PATCH v5 00/14] mm/mglru: improve reclaim loop and dirty folio handling wangxinyu19
2026-04-17 17:52   ` Kairui Song
2026-04-18  7:17   ` wangzicheng
2026-04-18  8:16     ` Kairui Song
2026-04-18  9:08       ` wangzicheng [this message]
2026-04-18 11:50         ` Kairui Song
2026-04-18  8:55     ` Barry Song
2026-04-17  2:55 ` wangxinyu19

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:f78cfe059f1 dfblob:50109cd5e94 )
 OR (
bs:"RE: [PATCH v5 00/14] mm/mglru: improve reclaim loop and dirty folio handling" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=830980eb128a49c6adc55571b7015fab@honor.com \
    --to=wangzicheng@honor.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=chenridong@huaweicloud.com \
    --cc=chrisl@kernel.org \
    --cc=david@kernel.org \
    --cc=devnull+kasong.tencent.com@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kaleshsingh@google.com \
    --cc=laoar.shao@gmail.com \
    --cc=lenohou@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=qi.zheng@linux.dev \
    --cc=ryncsn@gmail.com \
    --cc=shakeel.butt@linux.dev \
    --cc=stevensd@google.com \
    --cc=surenb@google.com \
    --cc=tao.wangtao@honor.com \
    --cc=vernon2gm@gmail.com \
    --cc=wangzhen5@honor.com \
    --cc=weixugc@google.com \
    --cc=wxy2009nrrr@163.com \
    --cc=yuanchu@google.com \
    --cc=yuzhao@google.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox