public inbox for linux-mm@kvack.org
From: Michal Hocko <mhocko@suse.com>
To: Minchan Kim <minchan@kernel.org>
Cc: akpm@linux-foundation.org, david@kernel.org, brauner@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	surenb@google.com, timmurray@google.com
Subject: Re: [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support
Date: Thu, 16 Apr 2026 08:54:53 +0200
Message-ID: <aeCHvYOKP1exBBns@tiehlicka>
In-Reply-To: <aeAeqoJ_acUAza8D@google.com>

On Wed 15-04-26 16:26:34, Minchan Kim wrote:
> On Wed, Apr 15, 2026 at 09:38:05AM +0200, Michal Hocko wrote:
> > On Tue 14-04-26 13:00:16, Minchan Kim wrote:
> > > On Tue, Apr 14, 2026 at 08:57:57AM +0200, Michal Hocko wrote:
> > > > On Mon 13-04-26 15:39:45, Minchan Kim wrote:
> > > > > This patch series introduces optimizations to expedite memory reclamation
> > > > > in process_mrelease() and provides a secure, race-free "auto-kill"
> > > > > mechanism for efficient container shutdown and OOM handling.
> > > > > 
> > > > > Currently, process_mrelease() unmaps pages but leaves clean file folios
> > > > > on the LRU list, relying on standard memory reclaim to eventually free
> > > > > them. Furthermore, requiring userspace to send a SIGKILL prior to
> > > > > invoking process_mrelease() introduces scheduling race conditions where
> > > > > the victim task may enter the exit path prematurely, bypassing expedited
> > > > > reclamation hooks.
> > > > > 
> > > > > This series addresses these limitations in three logical steps.
> > > > > 
> > > > > Patch #1: mm: process_mrelease: expedite clean file folio reclaim via mmu_gather
> > > > > Integrates clean file folio eviction directly into the low-level TLB
> > > > > batching (mmu_gather) infrastructure. Symmetrically truncates clean file
> > > > > folios alongside anonymous pages during the unmap loop.
> > > > 
> > > > Why do we need to care about clean page cache? Is this a form of
> > > > drop_caches?
> > > 
> > > The goal is to ensure the memory is actually freed by the time
> > > process_mrelease returns. Currently, process_mrelease unmaps pages, but
> > > page caches remain on the LRU, leaving them to be reclaimed later
> > > by kswapd or direct reclaim.
> > 
> > Correct. This was the initial design decision because there is not much
> > you can assume about page cache pages, which are very often shared even
> > if they are not mapped by all users.
> 
> Fair point. However, that's the trade-off:
> 
> Leaving unmapped caches to be reclaimed asynchronously keeps system memory
> pressure high for too long. In Android, this delay forces the LMKD to
> unnecessarily kill additional innocent background apps before the memory
> from the original victim is recovered.

OK, this is really not clear to me. How come you end up triggering LMKD
(or any OOM handling) when there is a considerable amount of clean page
cache?

[...]

> > > The race occurs when the victim process starts its own exit path (after
> > > SIGKILL) before the caller can invoke process_mrelease. If the victim
> > > reaches the exit path first, the caller might lose the window to apply
> > > these expedited reclamation optimizations.
> > 
> > Isn't this the problem you are trying to solve then? You are
> > special-casing process_mrelease while you really want to expedite the
> > process memory cleanup.
> > 
> > The same situation happens with the global OOM, and your approach doesn't
> > really close the race anyway. You send SIGKILL first and the victim can
> > hit the exit path right after that, before you start processing the rest.
> > That is not fundamentally different from doing it in two syscalls; the
> > race window is just smaller.
> 
> No, this approach completely closes the race.
> 
> When it invokes do_send_sig_info(SIGKILL) with the KILL_MRELEASE code,
> the kernel sets the MMF_UNSTABLE flag on the victim's mm_struct in the signal
> delivery path (kernel/signal.c) *before* the task begins processing the signal.

OK, I missed this part. I haven't really looked into the specific
patches at this stage. I am still trying to understand the motivation
and your reasoning. So effectively you want to get SIGOOMKILL, more or
less.

> When the victim gets scheduled and wakes up to process the fatal signal,
> the MMF_UNSTABLE flag is already set.
> 
> This guarantees that the victim's own exit path (do_exit -> exit_mmap) will
> utilize the expedited reclamation optimizations automatically, regardless of
> whether the reaper or the victim gets scheduled first.
> 
> For the OOM, we can use the same idea.
> 
> > 
> > All that being said, I do not think those special hacks for
> > process_mrelease are the right approach. I very much agree that the
> > address space teardown for a dying process could be improved and we
> > should be focusing on that part.
> 
> I think process_mrelease is crucial here because relying on the exit path is
> non-deterministic.

I suspect you are missing my point. I am arguing that those special
hacks in the address space release path shouldn't be process_mrelease
specific. I do recognize the need for a synchronous teardown. I am also
in favor of something like SIGOOMKILL. process_mrelease might even be
the right syscall for that purpose.
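
For reference, the two-syscall sequence discussed earlier in this thread
(SIGKILL first, process_mrelease second) can be sketched from userspace
roughly as below. This is only an illustration of the existing interface,
not code from the series. kill_and_reap() is a hypothetical helper name,
and the fallback syscall numbers assume the generic syscall table
(x86_64, arm64 and friends). The flags argument of process_mrelease is 0
here; the flag proposed in patch 3 is deliberately not used.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older libc headers; assumes the generic syscall table. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/*
 * Hypothetical helper: kill the target, then ask the kernel to release
 * its memory. Returns 0 on success or a negative errno value.
 */
static int kill_and_reap(pid_t pid)
{
	int pidfd, ret = 0;

	if (pid <= 0)
		return -EINVAL;

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -errno;

	/* Step 1: the victim may enter its exit path once this returns. */
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
		ret = -errno;
	/* Step 2: a separate syscall; the race window sits between the two. */
	else if (syscall(SYS_process_mrelease, pidfd, 0) < 0)
		ret = -errno;

	close(pidfd);
	return ret;
}
```

Nothing orders the victim's own exit against step 2, which is the window
the thread is arguing about. A caller should also expect step 2 to fail
with an error such as ESRCH if the victim has already torn down its
address space by then.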
-- 
Michal Hocko
SUSE Labs

