public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
From: Minchan Kim <minchan@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, hca@linux.ibm.com,
	linux-s390@vger.kernel.org, david@kernel.org, mhocko@suse.com,
	brauner@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, timmurray@google.com
Subject: Re: [PATCH v1 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Mon, 27 Apr 2026 15:52:42 -0700	[thread overview]
Message-ID: <ae_oul_xgD5Kgnqi@google.com> (raw)
In-Reply-To: <CAJuCfpFK+qatN_vHArEKOYe+LRHQG0e3inJrzVdz-mABi-BSTg@mail.gmail.com>

On Mon, Apr 27, 2026 at 01:34:37PM -0700, Suren Baghdasaryan wrote:
> On Tue, Apr 21, 2026 at 4:03 PM Minchan Kim <minchan@kernel.org> wrote:
> >
> > Currently, process_mrelease() requires userspace to send a SIGKILL signal
> > prior to the call. This separation introduces a scheduling race window
> > where the victim task may receive the signal and enter the exit path
> > before the reaper can invoke process_mrelease().
> >
> > When the victim enters the exit path (do_exit -> exit_mm), it clears its
> > task->mm immediately. This causes process_mrelease() to fail with -ESRCH,
> > leaving the actual address space teardown (exit_mmap) to be deferred until
> > the mm's reference count drops to zero. In Android, arbitrary reference counts
> > (e.g., async I/O, reading /proc/<pid>/cmdline, or various other remote
> > VM accesses) frequently delay this teardown indefinitely, defeating the
> > purpose of expedited reclamation.
> >
> > This delay keeps memory pressure high, forcing the system to unnecessarily
> > kill additional innocent background apps before the memory from the first
> > victim is recovered.
> >
> > This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
> > an integrated auto-kill mode. When specified, process_mrelease() directly
> > injects a SIGKILL into the target task.
> >
> > To solve the race condition deterministically, we grab the mm reference
> > via mmget() and set the MMF_UNSTABLE flag *before* sending the SIGKILL.
> > Using mmget() instead of mmgrab() keeps mm_users > 0, preventing the
> > victim from calling exit_mmap() in its own exit path. This ensures that
> > the memory is reclaimed synchronously and deterministically by the reaper
> > in the context of process_mrelease(), avoiding delays caused by
> > non-deterministic scheduling of the victim task.
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  include/uapi/linux/mman.h |  4 +++
> >  mm/oom_kill.c             | 56 +++++++++++++++++++++++++++------------
> >  2 files changed, 43 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> > index e89d00528f2f..4266976b45ad 100644
> > --- a/include/uapi/linux/mman.h
> > +++ b/include/uapi/linux/mman.h
> > @@ -56,4 +56,8 @@ struct cachestat {
> >         __u64 nr_recently_evicted;
> >  };
> >
> > +/* Flags for process_mrelease */
> > +#define PROCESS_MRELEASE_REAP_KILL     (1 << 0)
> > +#define PROCESS_MRELEASE_VALID_FLAGS   (PROCESS_MRELEASE_REAP_KILL)
> > +
> >  #endif /* _UAPI_LINUX_MMAN_H */
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 5c6c95c169ee..730ba0d19b53 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -20,6 +20,7 @@
> >
> >  #include <linux/oom.h>
> >  #include <linux/mm.h>
> > +#include <uapi/linux/mman.h>
> >  #include <linux/err.h>
> >  #include <linux/gfp.h>
> >  #include <linux/sched.h>
> > @@ -850,7 +851,7 @@ bool oom_killer_disable(signed long timeout)
> >         return true;
> >  }
> >
> > -static inline bool __task_will_free_mem(struct task_struct *task)
> > +static inline bool __task_will_free_mem(struct task_struct *task, bool ignore_exit)
> >  {
> >         struct signal_struct *sig = task->signal;
> >
> > @@ -862,6 +863,9 @@ static inline bool __task_will_free_mem(struct task_struct *task)
> >         if (sig->core_state)
> >                 return false;
> >
> > +       if (ignore_exit)
> > +               return true;
> > +
> >         if (sig->flags & SIGNAL_GROUP_EXIT)
> >                 return true;
> >
> > @@ -878,7 +882,7 @@ static inline bool __task_will_free_mem(struct task_struct *task)
> >   * Caller has to make sure that task->mm is stable (hold task_lock or
> >   * it operates on the current).
> >   */
> > -static bool task_will_free_mem(struct task_struct *task)
> > +static bool task_will_free_mem(struct task_struct *task, bool ignore_exit)
> >  {
> >         struct mm_struct *mm = task->mm;
> >         struct task_struct *p;
> > @@ -892,7 +896,7 @@ static bool task_will_free_mem(struct task_struct *task)
> >         if (!mm)
> >                 return false;
> >
> > -       if (!__task_will_free_mem(task))
> > +       if (!__task_will_free_mem(task, ignore_exit))
> >                 return false;
> >
> >         /*
> > @@ -916,7 +920,7 @@ static bool task_will_free_mem(struct task_struct *task)
> >                         continue;
> >                 if (same_thread_group(task, p))
> >                         continue;
> > -               ret = __task_will_free_mem(p);
> > +               ret = __task_will_free_mem(p, false);
> >                 if (!ret)
> >                         break;
> >         }
> > @@ -1034,7 +1038,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
> >          * so it can die quickly
> >          */
> >         task_lock(victim);
> > -       if (task_will_free_mem(victim)) {
> > +       if (task_will_free_mem(victim, false)) {
> >                 mark_oom_victim(victim);
> >                 queue_oom_reaper(victim);
> >                 task_unlock(victim);
> > @@ -1135,7 +1139,7 @@ bool out_of_memory(struct oom_control *oc)
> >          * select it.  The goal is to allow it to allocate so that it may
> >          * quickly exit and free its memory.
> >          */
> > -       if (task_will_free_mem(current)) {
> > +       if (task_will_free_mem(current, false)) {
> >                 mark_oom_victim(current);
> >                 queue_oom_reaper(current);
> >                 return true;
> > @@ -1217,8 +1221,9 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
> >         unsigned int f_flags;
> >         bool reap = false;
> >         long ret = 0;
> > +       bool reap_kill;
> >
> > -       if (flags)
> > +       if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
> >                 return -EINVAL;
> >
> >         task = pidfd_get_task(pidfd, &f_flags);
> > @@ -1236,19 +1241,33 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
> >         }
> >
> >         mm = p->mm;
> > -       mmgrab(mm);
> >
> > -       if (task_will_free_mem(p))
> > -               reap = true;
> > -       else {
> > -               /* Error only if the work has not been done already */
> > -               if (!mm_flags_test(MMF_OOM_SKIP, mm))
> > +       reap_kill = !!(flags & PROCESS_MRELEASE_REAP_KILL);
> > +       reap = task_will_free_mem(p, reap_kill);
> > +       if (!reap) {
> > +               if (reap_kill || !mm_flags_test(MMF_OOM_SKIP, mm))
> >                         ret = -EINVAL;
> > +
> > +               task_unlock(p);
> > +               goto put_task;
> >         }
> > -       task_unlock(p);
> >
> > -       if (!reap)
> > -               goto drop_mm;
> > +       if (reap_kill) {
> > +               /*
> > +                * We use mmget() instead of mmgrab() to keep mm_users > 0,
> > +                * preventing the victim from calling exit_mmap() in its
> > +                * own exit path. This ensures that the memory is reclaimed
> > +                * synchronously and deterministically by the reaper.
> > +                */
> > +               mmget(mm);
> 
> I don't quite understand why you need to mmget() and prevent the
> victim from calling exit_mmap() here. As long as we successfully
> mmgrab'ed the mm, we can safely proceed with mmap locking and doing
> __oom_reap_task_mm(). Victim can execute exit_mmap() in parallel with
> __oom_reap_task_mm(). In fact that's even desirable! I remember a
> recent patch that used mas_for_each_rev() in __oom_reap_task_mm() to
> reclaim memory in reverse order so that exit_mmap() and
> __oom_reap_task_mm() can start from opposite ends of the address space
> and converge in the middle, working concurrently.

Good information. Will change it in next respin.


      reply	other threads:[~2026-04-27 22:52 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21 23:02 [PATCH v1 0/3] mm: process_mrelease: expedite clean file folio reclaim and add auto-kill Minchan Kim
2026-04-21 23:02 ` [PATCH v1 1/3] mm: process_mrelease: expedite clean file folio reclaim via mmu_gather Minchan Kim
2026-04-24  7:56   ` David Hildenbrand (Arm)
2026-04-24 21:24     ` Minchan Kim
2026-04-27  9:29       ` David Hildenbrand (Arm)
2026-04-27 22:04         ` Minchan Kim
2026-04-24 19:33   ` Matthew Wilcox
2026-04-24 21:56     ` Minchan Kim
2026-04-21 23:02 ` [PATCH v1 2/3] mm: process_mrelease: skip LRU movement for exclusive file folios Minchan Kim
2026-04-22  7:22   ` Baolin Wang
2026-04-23 23:38     ` Minchan Kim
2026-04-24  7:51   ` Michal Hocko
2026-04-24  7:57     ` David Hildenbrand (Arm)
2026-04-24 19:15       ` Minchan Kim
2026-04-27  7:16         ` Michal Hocko
2026-04-27 16:48           ` Suren Baghdasaryan
2026-04-27 17:15             ` Michal Hocko
2026-04-27 23:05               ` Minchan Kim
2026-04-28  6:56                 ` Michal Hocko
2026-04-24 19:26     ` Minchan Kim
2026-04-21 23:02 ` [PATCH v1 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag Minchan Kim
2026-04-24  7:57   ` Michal Hocko
2026-04-24 22:49     ` Minchan Kim
2026-04-27  7:02       ` Michal Hocko
2026-04-27 22:03         ` Minchan Kim
2026-04-28  7:01           ` Michal Hocko
2026-04-27 20:34   ` Suren Baghdasaryan
2026-04-27 22:52     ` Minchan Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ae_oul_xgD5Kgnqi@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=david@kernel.org \
    --cc=hca@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox