From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Kairui Song <kasong@tencent.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
David Hildenbrand <david@kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
David Stevens <stevensd@google.com>,
Chen Ridong <chenridong@huaweicloud.com>,
Leno Hou <lenohou@gmail.com>, Yafang Shao <laoar.shao@gmail.com>,
Yu Zhao <yuzhao@google.com>,
Zicheng Wang <wangzicheng@honor.com>,
Kalesh Singh <kaleshsingh@google.com>,
Suren Baghdasaryan <surenb@google.com>,
Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
linux-kernel@vger.kernel.org, Qi Zheng <qi.zheng@linux.dev>,
Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH v6 04/14] mm/mglru: restructure the reclaim loop
Date: Sat, 25 Apr 2026 01:04:01 +0800 [thread overview]
Message-ID: <aet1hd9DfRH4aSOO@KASONG-MC4> (raw)
In-Reply-To: <20260424-mglru-reclaim-v6-4-a57622d770c3@tencent.com>
Forwarding sashiko's review:
> commit b0a64f8c3f76ce5cebb30c1caa6305fc617888cf
> Author: Kairui Song <kasong@tencent.com>
> mm/mglru: restructure the reclaim loop
> The current loop calculates the scan number on each iteration, which couples
> it with aging and rotation. This patch decouples aging and rotation by
> calculating the scan number once at the beginning of reclaim, respecting
> reclaim priority, and making aging explicit.
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 3c5a6ae9244099..757beb605980d4 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> [ ... ]
> > @@ -4940,11 +4932,10 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,
> > return nr_to_scan;
> >
> > nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
> > + nr_to_scan >>= sc->priority;
> >
> > - /* try to get away with not aging at the default priority */
> > - if (!need_aging || sc->priority == DEF_PRIORITY)
> > - return nr_to_scan >> sc->priority;
> > + if (!nr_to_scan && sc->priority < DEF_PRIORITY)
> > + nr_to_scan = min(evictable, SWAP_CLUSTER_MAX);
> Does this code introduce a priority inversion where higher reclaim priorities
> result in fewer pages being scanned than lower priorities?
> If a small memcg has an evictable size of 100 and SWAP_CLUSTER_MAX is 32,
> at sc->priority == 7, 100 >> 7 is 0, so the fallback sets nr_to_scan to 32.
> At sc->priority == 6 (higher urgency), 100 >> 6 is 1. The fallback is
> bypassed and nr_to_scan remains 1.
> Would it be better for this fallback to apply a floor regardless of whether
> the shifted result is exactly zero?
Right, sashiko has raised concerns about this part several times, and
it wasn't happy with any of the designs (unconditional floor, no
handling at all, or this version). But this is not a functional
regression. For tiny cgroups, the old code before this patch could
over-reclaim. The over-reclaim is now gone, at the cost of a trivial
priority inversion, which I don't think is a practically observable
problem. Still, to be academically ideal, the following change would
be better:
nr_to_scan = apply_proportional_protection(memcg, sc, nr_to_scan);
nr_to_scan >>= sc->priority;
/* make escalated reclaim progress and balance with slab shrinking */
if (sc->priority < DEF_PRIORITY)
nr_to_scan = max(nr_to_scan, min(evictable, SWAP_CLUSTER_MAX));
No inversion, and no under-reclaim or over-reclaim.
> [ ... ]
> > @@ -4985,31 +4977,44 @@ static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
> > return true;
> > }
> >
> > +/*
> > + * For future optimizations:
> > + * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
> > + * reclaim.
> > + */
> > static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
> > {
> > + bool need_rotate = false;
> > long nr_batch, nr_to_scan;
> > - unsigned long scanned = 0;
> > int swappiness = get_swappiness(lruvec, sc);
> > + struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> >
> > - while (true) {
> > + nr_to_scan = get_nr_to_scan(lruvec, sc, memcg, swappiness);
> > + while (nr_to_scan > 0) {
> > int delta;
> > + DEFINE_MAX_SEQ(lruvec);
> >
> > - nr_to_scan = get_nr_to_scan(lruvec, sc, swappiness);
> > - if (nr_to_scan <= 0)
> > + if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg)) {
> > + need_rotate = true;
> > break;
> > + }
> > +
> > + if (should_run_aging(lruvec, max_seq, sc, swappiness)) {
> > + if (try_to_inc_max_seq(lruvec, max_seq, swappiness, false))
> > + need_rotate = true;
> > + /* stop scanning as it's low on cold folios */
> > + break;
> > + }
> Can this bypass critical memcg protection checks and aging when nr_to_scan
> is 0?
> If get_nr_to_scan() returns 0 (which happens if evictable == 0 or when
> nr_to_scan is shifted to 0 at DEF_PRIORITY), the loop body is bypassed
> entirely.
> This means mem_cgroup_below_min() is never evaluated in those cases,
> preventing need_rotate from being set to true for protected memcgs.
> The reclaimer relies on this flag to rotate away from protected memcgs.
> Additionally, does this skip aging for empty offline memcgs
> (where evictable == 0), since they never enter the loop where
> should_run_aging() is called?
> The commit message states that offline memcgs should now be aged
> to enable further reclaim, but this bypass seems to prevent that.
This is pretty much a false positive. Bypassing the loop body is not a
problem at DEF_PRIORITY: there was no aging there before either, and
the rotation for below-min memcgs is already handled by shrink_one()
before entering the loop in the global reclaim path, and is
inconsequential in the memcg reclaim path.
As for offline memcgs, nr_to_scan is never zero unless the memcg is
completely empty. If many empty memcgs were slowing down the memcg
iterator due to missing rotation, that would be a different issue, not
related to or introduced by this patch.
Thread overview: 28+ messages
2026-04-23 17:43 [PATCH v6 00/14] mm/mglru: improve reclaim loop and dirty folio handling Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 01/14] mm/mglru: consolidate common code for retrieving evictable size Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 02/14] mm/mglru: rename variables related to aging and rotation Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 03/14] mm/mglru: relocate the LRU scan batch limit to callers Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 04/14] mm/mglru: restructure the reclaim loop Kairui Song via B4 Relay
2026-04-24 17:04 ` Kairui Song [this message]
2026-04-23 17:43 ` [PATCH v6 05/14] mm/mglru: scan and count the exact number of folios Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 06/14] mm/mglru: use a smaller batch for reclaim Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 07/14] mm/mglru: don't abort scan immediately right after aging Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 08/14] mm/mglru: remove redundant swap constrained check upon isolation Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 09/14] mm/mglru: use the common routine for dirty/writeback reactivation Kairui Song via B4 Relay
2026-04-24 19:05 ` Kairui Song
2026-04-23 17:43 ` [PATCH v6 10/14] mm/mglru: simplify and improve dirty writeback handling Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 11/14] mm/mglru: remove no longer used reclaim argument for folio protection Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 12/14] mm/vmscan: remove sc->file_taken Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 13/14] mm/vmscan: remove sc->unqueued_dirty Kairui Song via B4 Relay
2026-04-23 17:43 ` [PATCH v6 14/14] mm/vmscan: unify writeback reclaim statistic and throttling Kairui Song via B4 Relay
2026-04-23 18:14 ` [PATCH v6 00/14] mm/mglru: improve reclaim loop and dirty folio handling Andrew Morton
2026-04-24 10:32 ` Barry Song
2026-04-24 11:58 ` Barry Song
2026-04-24 12:55 ` Kairui Song
2026-04-25 12:18 ` Barry Song
2026-04-25 13:29 ` Kairui Song
2026-04-25 20:57 ` Barry Song (Xiaomi)
2026-04-26 6:59 ` Kairui Song
2026-04-26 8:34 ` Barry Song
2026-04-24 13:36 ` Andrew Morton
2026-04-24 14:16 ` Kairui Song