From: Michal Hocko <mhocko@suse.com>
To: Minchan Kim <minchan@kernel.org>
Cc: akpm@linux-foundation.org, david@kernel.org, brauner@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	surenb@google.com, timmurray@google.com
Subject: Re: [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support
Date: Thu, 16 Apr 2026 08:54:53 +0200
Message-ID: <aeCHvYOKP1exBBns@tiehlicka>
In-Reply-To: <aeAeqoJ_acUAza8D@google.com>

On Wed 15-04-26 16:26:34, Minchan Kim wrote:
> On Wed, Apr 15, 2026 at 09:38:05AM +0200, Michal Hocko wrote:
> > On Tue 14-04-26 13:00:16, Minchan Kim wrote:
> > > On Tue, Apr 14, 2026 at 08:57:57AM +0200, Michal Hocko wrote:
> > > > On Mon 13-04-26 15:39:45, Minchan Kim wrote:
> > > > > This patch series introduces optimizations to expedite memory reclamation
> > > > > in process_mrelease() and provides a secure, race-free "auto-kill"
> > > > > mechanism for efficient container shutdown and OOM handling.
> > > > > 
> > > > > Currently, process_mrelease() unmaps pages but leaves clean file folios
> > > > > on the LRU list, relying on standard memory reclaim to eventually free
> > > > > them. Furthermore, requiring userspace to send a SIGKILL prior to
> > > > > invoking process_mrelease() introduces scheduling race conditions where
> > > > > the victim task may enter the exit path prematurely, bypassing expedited
> > > > > reclamation hooks.
> > > > > 
> > > > > This series addresses these limitations in three logical steps.
> > > > > 
> > > > > Patch #1: mm: process_mrelease: expedite clean file folio reclaim via mmu_gather
> > > > > Integrates clean file folio eviction directly into the low-level TLB
> > > > > batching (mmu_gather) infrastructure. Symmetrically truncates clean file
> > > > > folios alongside anonymous pages during the unmap loop.
> > > > 
> > > > Why do we need to care about clean page cache? Is this a form of
> > > > drop_caches?
> > > 
> > > The goal is to ensure the memory is actually freed by the time
> > > process_mrelease returns. Currently, process_mrelease unmaps pages, but
> > > page caches remain on the LRU, leaving them to be reclaimed later
> > > by kswapd or direct reclaim.
> > 
> > Correct. This was the initial design decision because there is not much
> > you can assume about page cache pages, which are very often shared even
> > if they are not mapped by all users.
> 
> Fair point. However, that's the trade-off:
> 
> Leaving unmapped caches to be reclaimed asynchronously keeps system memory
> pressure high for too long. In Android, this delay forces the LMKD to
> unnecessarily kill additional innocent background apps before the memory
> from the original victim is recovered.

OK, this is really not clear to me. How come you end up triggering LMKD
(or any OOM handling) when there is a considerable amount of clean page
cache?

[...]

> > > The race occurs when the victim process starts its own exit path (after
> > > SIGKILL) before the caller can invoke process_mrelease. If the victim
> > > reaches the exit path first, the caller might lose the window to apply
> > > these expedited reclamation optimizations.
> > 
> > Isn't this the problem you are trying to solve then? You are special
> > casing process_mrelease while you really want to expedite the process
> > memory clean up. 
> > 
> > The same situation happens with the global OOM and your approach doesn't
> > really close the race anyway. You send SIGKILL first and the victim can
> > hit the exit path right after that before you start processing the rest.
> > That is not fundamentally different from doing that in two syscalls,
> > race window is just smaller.
> 
> No, this approach completely closes the race.
> 
> When it invokes do_send_sig_info(SIGKILL) with the KILL_MRELEASE code,
> the kernel sets the MMF_UNSTABLE flag on the victim's mm_struct in the signal
> delivery path (kernel/signal.c) *before* the task begins processing the signal.

OK, I have missed this part. I haven't really looked into specific
patches at this stage. I am still trying to understand the motivation
and your reasoning. So effectively you want to get SIGOOMKILL, more or
less.
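
For reference, the two-syscall sequence being discussed can be sketched
from userspace roughly as below. This is only an illustration of where
the race window sits, not code from the series. The helper name
kill_then_reap() is hypothetical, the fallback syscall numbers assume
the generic syscall table, and flags is passed as 0 because that is all
process_mrelease() accepts today.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older libc headers (generic syscall table numbers). */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/*
 * Hypothetical helper: kill the process behind pidfd, then ask the
 * kernel to release its memory. Returns 0 on success, -1 with errno
 * set on failure.
 */
int kill_then_reap(int pidfd)
{
	/* Step 1: queue SIGKILL for the victim. */
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
		return -1;

	/*
	 * Race window: the victim can be scheduled right here and enter
	 * its own exit path before the caller reaches step 2.
	 */

	/* Step 2: reap the victim's address space from the caller. */
	if (syscall(SYS_process_mrelease, pidfd, 0) < 0) {
		/* ESRCH means the victim is already gone; not an error here. */
		if (errno != ESRCH)
			return -1;
	}
	return 0;
}
```

A caller would obtain pidfd via pidfd_open(2) beforehand. Folding both
steps into a single syscall shrinks the window marked above, but whether
it closes it depends on what the signal delivery path does before the
victim runs.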

> When the victim gets scheduled and wakes up to process the fatal signal,
> the MMF_UNSTABLE flag is already set.
> 
> This guarantees that the victim's own exit path (do_exit -> exit_mmap) will
> utilize the expedited reclamation optimizations automatically, regardless of
> whether the reaper or the victim gets scheduled first.
> 
> For the OOM, we can use the same idea.
> 
> > 
> > All that being said, I do not think those special hacks for
> > process_mrelease are the right approach. I very much agree that the
> > address space tear down for a dying process could be improved and we
> > should be focusing on that part.
> 
> I think process_mrelease is crucial here because relying on the exit path is
> non-deterministic.

I suspect you are missing my point. I am arguing that those special
hacks in the address space release path shouldn't be process_mrelease
specific. I do recognize the need for a synchronous tear down. I am also
in favor of something like SIGOOMKILL. process_mrelease might even be
the right syscall for that purpose.
-- 
Michal Hocko
SUSE Labs

