public inbox for linux-mm@kvack.org
From: Kairui Song <ryncsn@gmail.com>
To: Eric Naim <dnaim@cachyos.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	Yuanchu Xie <yuanchu@google.com>,  Wei Xu <weixugc@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 David Hildenbrand <david@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Lorenzo Stoakes <ljs@kernel.org>, Barry Song <baohua@kernel.org>,
	David Stevens <stevensd@google.com>,
	 Chen Ridong <chenridong@huaweicloud.com>,
	Leno Hou <lenohou@gmail.com>,  Yafang Shao <laoar.shao@gmail.com>,
	Yu Zhao <yuzhao@google.com>,
	 Zicheng Wang <wangzicheng@honor.com>,
	Kalesh Singh <kaleshsingh@google.com>,
	 Suren Baghdasaryan <surenb@google.com>,
	Chris Li <chrisl@kernel.org>, Vernon Yang <vernon2gm@gmail.com>,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/8] mm/mglru: improve reclaim loop and dirty folio handling
Date: Wed, 25 Mar 2026 17:47:41 +0800
Message-ID: <CAMgjq7AQeP8maeMWNun=60oyq_KDu18MwXfGEyK4bwj_k92NgQ@mail.gmail.com>
In-Reply-To: <85b4be3c-09a3-4a28-924d-71a20db3fd62@cachyos.org>

On Wed, Mar 25, 2026 at 5:27 PM Eric Naim <dnaim@cachyos.org> wrote:
>
> On 3/25/26 1:47 PM, Kairui Song wrote:
> > On Wed, Mar 25, 2026 at 1:04 PM Eric Naim <dnaim@cachyos.org> wrote:
> >>
> >> Hi Kairui,
> >>
> >> On 3/18/26 3:08 AM, Kairui Song via B4 Relay wrote:
> >>> This series cleans up and slightly improves MGLRU's reclaim loop and
> >>> dirty flush logic. As a result, we can see an up to ~50% reduce of file
> >>> faults and 30% increase in MongoDB throughput with YCSB and no swap
> >>> involved, other common benchmarks have no regression, and LOC is
> >>> reduced, with less unexpected OOM in our production environment.
> >>>
> >
> > ...
> >
> >>
> >> I applied this patch set to 7.0-rc5 and noticed the system locking up when performing the below test.
> >>
> >> fallocate -l 5G 5G
> >> while true; do tail /dev/zero; done
> >> while true; do time cat 5G > /dev/null; sleep $(($(cat /sys/kernel/mm/lru_gen/min_ttl_ms)/1000+1)); done
> >>
> >> After reading [1], I suspect that this was because the system was using zram as swap, and yes if zram is disabled then the lock up does not occur.
> >
> > Hi Eric,
> >
> > Thanks for the report, I was about to send V2 but noticing your report
> > I'll try to reproduce your issue first.
> >
> > So far I didn't notice any regression, is this an issue caused by this
> > patch or is it an existing issue? I don't have any context about how
> > you are doing the test. BTW the calculation in patch "mm/mglru:
> > restructure the reclaim loop" needs to have a lowest bar
> > "max(nr_to_scan, SWAP_CLUSTER_MAX)" for small machines, not sure if
> > related but will add to V2.
> >
>
> As of writing this, I got some new information that makes this a bit more confusing. The kernel that doesn't have the issue was patched with [1] as a means of protecting the working set (similar to lru_gen_min_ttl_ms).
>
> So this time on an unpatched kernel, the system still freezes but quickly recovers itself after about 2 seconds. With this patchset applied, the system freezes but it doesn't quickly recover (if at all).
>
> Curiously, I had the user test again but this time with lru_gen_min_ttl_ms = 100. With this set, the system doesn't freeze at all with or without this patchset.

Ah thanks, that makes sense now. The downstream patch you mentioned
limits reclaim of file pages to avoid thrashing, and your test case
exhausts memory on purpose, which forces the kernel to reclaim every
reclaimable folio, including page cache.

A thrashing page cache easily causes desktop hangs. Using a TTL is an
effective way to avoid thrashing and trigger OOM early, which is why
the problem is gone with lru_gen_min_ttl_ms = 100 or with le9.
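
For reference, a minimal sketch of setting the TTL from a shell. The
sysfs file is the one your test loop already reads; the helper name
set_min_ttl_ms and the LRU_GEN_DIR override are made up for
illustration (the override only exists so the helper can be tried
against a scratch directory), and writing the real file needs root.

```shell
# Hypothetical helper: write a minimum TTL (in milliseconds) for MGLRU.
# LRU_GEN_DIR is an assumption for testing; it defaults to the real sysfs dir.
LRU_GEN_DIR="${LRU_GEN_DIR:-/sys/kernel/mm/lru_gen}"

set_min_ttl_ms() {
    ttl_ms="${1:-}"
    # Accept only a non-negative integer.
    case "$ttl_ms" in
        ''|*[!0-9]*)
            echo "usage: set_min_ttl_ms <milliseconds>" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$ttl_ms" > "$LRU_GEN_DIR/min_ttl_ms"
}

# Example, as root on a real system:
#   set_min_ttl_ms 100
```

Writing 0 should turn the protection off again, going by the
multigen_lru admin guide.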

> > And about the test you posted:
> > while true; do tail /dev/zero; done
> >
> > I believe this will just consume all memory with zero pages and then
> > get OOM killed, that's exactly what the test is meant to do. I'm not
> > sure what you mean by lockup, since you mentioned OOM kill. Did the
> > system actually hang, or was only the desktop dead?
>
> The system actually hung. They needed a hard reset to recover the system. (pure speculation: given a few minutes the system would likely recover itself as this seems to be a common scenario)

Yeah I believe so.

Thrashing prevention is why MGLRU's TTL was introduced, so I do suggest
using it. It can be further improved too.

I'll keep that in mind, try to write some test cases that cover your
scenario, and make adjustments.

BTW how does the kernel behave with MGLRU disabled for your case?
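
In case it helps with that comparison, a rough sketch of toggling MGLRU
at runtime through its sysfs kill switch. The helper names mglru_status
and mglru_set and the LRU_GEN_DIR override are made up for illustration
(the override is only there so the helpers can be tried against a
scratch directory); the real file needs root to write.

```shell
# Hypothetical helpers around /sys/kernel/mm/lru_gen/enabled.
# LRU_GEN_DIR is an assumption for testing; it defaults to the real sysfs dir.
LRU_GEN_DIR="${LRU_GEN_DIR:-/sys/kernel/mm/lru_gen}"

# Print the current setting (on a real kernel this is a hex feature mask,
# with 0x0000 meaning MGLRU is off).
mglru_status() {
    cat "$LRU_GEN_DIR/enabled"
}

# Write "n" to switch MGLRU off, or "y" to switch all components on.
mglru_set() {
    case "${1:-}" in
        y|n)
            printf '%s\n' "$1" > "$LRU_GEN_DIR/enabled"
            ;;
        *)
            echo "usage: mglru_set y|n" >&2
            return 1
            ;;
    esac
}

# Example, as root on a real system:
#   mglru_set n && mglru_status   # rerun the test with MGLRU disabled
#   mglru_set y                   # switch it back on afterwards
```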
