From: Michal Hocko <mhocko@suse.com>
To: Minchan Kim <minchan@kernel.org>
Cc: akpm@linux-foundation.org, david@kernel.org, brauner@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
surenb@google.com, timmurray@google.com
Subject: Re: [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support
Date: Thu, 16 Apr 2026 08:54:53 +0200
Message-ID: <aeCHvYOKP1exBBns@tiehlicka>
In-Reply-To: <aeAeqoJ_acUAza8D@google.com>

On Wed 15-04-26 16:26:34, Minchan Kim wrote:
> On Wed, Apr 15, 2026 at 09:38:05AM +0200, Michal Hocko wrote:
> > On Tue 14-04-26 13:00:16, Minchan Kim wrote:
> > > On Tue, Apr 14, 2026 at 08:57:57AM +0200, Michal Hocko wrote:
> > > > On Mon 13-04-26 15:39:45, Minchan Kim wrote:
> > > > > This patch series introduces optimizations to expedite memory reclamation
> > > > > in process_mrelease() and provides a secure, race-free "auto-kill"
> > > > > mechanism for efficient container shutdown and OOM handling.
> > > > >
> > > > > Currently, process_mrelease() unmaps pages but leaves clean file folios
> > > > > on the LRU list, relying on standard memory reclaim to eventually free
> > > > > them. Furthermore, requiring userspace to send a SIGKILL prior to
> > > > > invoking process_mrelease() introduces scheduling race conditions where
> > > > > the victim task may enter the exit path prematurely, bypassing expedited
> > > > > reclamation hooks.
> > > > >
> > > > > This series addresses these limitations in three logical steps.
> > > > >
> > > > > Patch #1: mm: process_mrelease: expedite clean file folio reclaim via mmu_gather
> > > > > Integrates clean file folio eviction directly into the low-level TLB
> > > > > batching (mmu_gather) infrastructure. Symmetrically truncates clean file
> > > > > folios alongside anonymous pages during the unmap loop.
> > > >
> > > > Why do we need to care about clean page cache? Is this a form of
> > > > drop_caches?
> > >
> > > The goal is to ensure the memory is actually freed by the time
> > > process_mrelease returns. Currently, process_mrelease unmaps pages, but
> > > page caches remain on the LRU, leaving them to be reclaimed later
> > > by kswapd or direct reclaim.
> >
> > Correct. This was the initial design decision because there is not much
> > you can assume about page cache pages, which are very often shared,
> > even if they are not mapped by all users.
>
> Fair point. However, that's the trade-off:
>
> Leaving unmapped caches to be reclaimed asynchronously keeps system memory
> pressure high for too long. In Android, this delay forces the LMKD to
> unnecessarily kill additional innocent background apps before the memory
> from the original victim is recovered.

OK, this is really not clear to me. How come you end up triggering LMKD
(or any OOM handling) when there is a considerable amount of clean page
cache?

[...]
> > > The race occurs when the victim process starts its own exit path (after
> > > SIGKILL) before the caller can invoke process_mrelease. If the victim
> > > reaches the exit path first, the caller might lose the window to apply
> > > these expedited reclamation optimizations.
> >
> > Isn't this the problem you are trying to solve then? You are special
> > casing process_mrelease while you really want to expedite the process
> > memory clean up.
> >
> > The same situation happens with the global OOM and your approach doesn't
> > really close the race anyway. You send SIGKILL first and the victim can
> > hit the exit path right after that before you start processing the rest.
> > That is not fundamentally different from doing that in two syscalls,
> > the race window is just smaller.
>
> No, this approach completely closes the race.
>
> When it invokes do_send_sig_info(SIGKILL) with the KILL_MRELEASE code,
> the kernel sets the MMF_UNSTABLE flag on the victim's mm_struct in the signal
> delivery path (kernel/signal.c) *before* the task begins processing the signal.

OK, I missed this part. I haven't really looked into the specific
patches at this stage; I am still trying to understand the motivation
and your reasoning. So effectively you want something more or less like
a SIGOOMKILL.

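For reference, the two-syscall sequence under discussion (SIGKILL first, then process_mrelease()) can be sketched from userspace roughly as follows. This is a minimal sketch, not LMKD's actual code. It assumes Linux 5.15+ (for process_mrelease) and the unified syscall numbers used by x86_64 and arm64. kill_then_mrelease() and struct reap_result are hypothetical names. The comment in the middle marks the window in which the victim can reach its own exit path before process_mrelease() runs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fallback numbers from the unified syscall table (x86_64, arm64, ...).
 * Assumption: recent libc headers normally provide these already.
 */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/* Hypothetical result type; mrelease_err is meaningful only if kill_err == 0. */
struct reap_result {
	int kill_err;     /* 0 if SIGKILL was sent, else errno */
	int mrelease_err; /* 0 if process_mrelease() succeeded, else errno */
};

/* Hypothetical helper: the two-step "kill, then reap" sequence. */
static struct reap_result kill_then_mrelease(pid_t pid)
{
	struct reap_result res = { 0, 0 };
	int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);

	if (pidfd < 0) {
		res.kill_err = errno;
		return res;
	}

	/* Step 1: send SIGKILL through the pidfd, avoiding PID-reuse races. */
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0) {
		res.kill_err = errno;
		close(pidfd);
		return res;
	}

	/*
	 * Race window: the victim may be scheduled here, enter do_exit()
	 * and tear down its own mm before step 2 runs. process_mrelease()
	 * may then fail (for example with ESRCH), and the caller has no say
	 * in how exit_mmap() frees the memory.
	 */

	/* Step 2: reap the dying process's address space in caller context. */
	if (syscall(SYS_process_mrelease, pidfd, 0) < 0)
		res.mrelease_err = errno;

	close(pidfd);
	return res;
}
```

In the single-step variant proposed in the RFC, the kernel would set MMF_UNSTABLE while delivering SIGKILL, so that the victim's own exit_mmap() also takes the expedited path. That interface is not sketched here because this message does not show its details.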
> When the victim gets scheduled and wakes up to process the fatal signal,
> the MMF_UNSTABLE flag is already set.
>
> This guarantees that the victim's own exit path (do_exit -> exit_mmap) will
> utilize the expedited reclamation optimizations automatically, regardless of
> whether the reaper or the victim gets scheduled first.
>
> For the global OOM killer, we can use the same idea.
>
> >
> > All that being said, I do not think those special hacks for
> > process_mrelease are the right approach. I very much agree that the
> > address space tear down for a dying process could be improved and we
> > should be focusing on that part.
>
> I think process_mrelease is crucial here because relying on the exit path is
> non-deterministic.

I suspect you are missing my point. I am arguing that those special
hacks in the address space release path shouldn't be process_mrelease
specific. I do recognize the need for a synchronous tear down. I am also
in favor of something like SIGOOMKILL. process_mrelease might even be
the right syscall for that purpose.

--
Michal Hocko
SUSE Labs
Thread overview: 18+ messages
2026-04-13 22:39 [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support Minchan Kim
2026-04-13 22:39 ` [RFC 1/3] mm: process_mrelease: expedite clean file folio reclaim via mmu_gather Minchan Kim
2026-04-14 7:45 ` David Hildenbrand (Arm)
2026-04-14 20:21 ` Minchan Kim
2026-04-13 22:39 ` [RFC 2/3] mm: process_mrelease: skip LRU movement for exclusive file folios Minchan Kim
2026-04-14 7:20 ` David Hildenbrand (Arm)
2026-04-14 20:22 ` Minchan Kim
2026-04-13 22:39 ` [RFC 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag Minchan Kim
2026-04-16 9:13 ` Christian Brauner
2026-04-17 6:30 ` Minchan Kim
2026-04-17 7:04 ` Michal Hocko
2026-04-14 6:57 ` [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support Michal Hocko
2026-04-14 20:00 ` Minchan Kim
2026-04-15 7:38 ` Michal Hocko
2026-04-15 23:26 ` Minchan Kim
2026-04-16 6:54 ` Michal Hocko [this message]
2026-04-17 6:20 ` Minchan Kim
2026-04-17 7:11 ` Michal Hocko