* [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
@ 2026-04-29 21:13 Minchan Kim
2026-04-30 9:55 ` Michal Hocko
2026-04-30 14:43 ` Andrew Morton
0 siblings, 2 replies; 6+ messages in thread
From: Minchan Kim @ 2026-04-29 21:13 UTC
To: akpm
Cc: hca, linux-s390, david, mhocko, brauner, linux-mm, linux-kernel,
surenb, timmurray, Minchan Kim
Currently, process_mrelease() requires userspace to send a SIGKILL signal
prior to invocation. This separation introduces a scheduling race window
where the victim task may receive the signal and enter the exit path
before the reaper can invoke process_mrelease().

When the victim enters the exit path (do_exit -> exit_mm), it clears its
task->mm immediately. This causes process_mrelease() to fail with -ESRCH,
leaving the actual address space teardown (exit_mmap) deferred until the
mm's reference count drops to zero. In the field (e.g., Android),
references held by other tasks (for example while reading
/proc/<pid>/cmdline, or during various other remote VM accesses)
frequently delay this teardown indefinitely, defeating the purpose of
expedited reclamation.
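As an illustration of the sequence described above, here is a minimal userspace sketch (not part of this patch). The helper names kill_then_mrelease() and mrelease_lost_race() are hypothetical, and the fallback syscall numbers are the x86-64/asm-generic values, used only when the libc headers do not provide them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older libc headers (x86-64 / asm-generic numbers). */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/* Hypothetical helper: two-step kill + reap. Returns 0 or -errno. */
long kill_then_mrelease(int pidfd)
{
	/* Step 1: queue SIGKILL for the victim. */
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
		return -errno;

	/*
	 * Race window: the victim may run do_exit() -> exit_mm() here,
	 * before step 2 is issued.
	 */

	/* Step 2: ask the kernel to reap the victim's memory (flags == 0). */
	if (syscall(SYS_process_mrelease, pidfd, 0) < 0)
		return -errno;
	return 0;
}

/* True when the result means the victim had already dropped task->mm. */
bool mrelease_lost_race(long ret)
{
	return ret == -ESRCH;
}
```

A caller that sees mrelease_lost_race() return true knows the kill was delivered but has no way to speed up the teardown, which is the situation this patch targets.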
In Android's LMKD scenarios, this delay keeps memory pressure high, forcing
the system to unnecessarily kill additional innocent background apps before
the memory from the first victim is recovered.

This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
an integrated auto-kill mode. When specified, process_mrelease() directly
injects a SIGKILL into the target task after finding its mm.
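From userspace, the auto-kill mode would be a single call. The sketch below is an illustration only: the wrapper name mrelease_reap_kill() is hypothetical, the flag value is copied from this patch (it is not in any released uapi header), and the fallback syscall number is the x86-64/asm-generic one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448	/* x86-64 and asm-generic number */
#endif

/* Value taken from this patch; assumed, not from a released header. */
#ifndef PROCESS_MRELEASE_REAP_KILL
#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
#endif

/* Hypothetical wrapper: kill and reap in one call. Returns 0 or -errno. */
long mrelease_reap_kill(int pidfd)
{
	if (syscall(SYS_process_mrelease, pidfd,
		    PROCESS_MRELEASE_REAP_KILL) < 0)
		return -errno;
	return 0;
}
```

On a kernel without this patch any non-zero flags value is rejected with -EINVAL, and with the patch a shared mm yields -EINVAL as well, so a caller cannot tell the two cases apart from the return value alone.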
To close the race, we take the mm reference via mmgrab() before sending
SIGKILL. If the caller passed PROCESS_MRELEASE_REAP_KILL, we assume the
victim will free its memory and proceed with reaping, so the logic is as
simple as reap = reap_kill || task_will_free_mem(p).

To handle shared address spaces safely in auto-kill mode, we bail out
immediately with -EINVAL if the mm is marked MMF_MULTIPROCESS and
PROCESS_MRELEASE_REAP_KILL is specified. Because the check applies only
when the flag is set, existing users of process_mrelease() see no behavior
change, and unsafe reaping of shared memory is prevented.
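The decision logic in the two paragraphs above can be modelled in plain C for readers who want to check the corner cases. This is a userspace model under stated assumptions, not kernel code: struct victim_model and mrelease_decide() are hypothetical names, and the three booleans stand in for MMF_MULTIPROCESS, task_will_free_mem(p) and MMF_OOM_SKIP.

```c
#include <errno.h>
#include <stdbool.h>

#define PROCESS_MRELEASE_REAP_KILL	(1 << 0)	/* from this patch */
#define PROCESS_MRELEASE_VALID_FLAGS	(PROCESS_MRELEASE_REAP_KILL)

/* Hypothetical stand-ins for the kernel state the syscall inspects. */
struct victim_model {
	bool mm_multiprocess;	/* models MMF_MULTIPROCESS on the mm */
	bool will_free_mem;	/* models task_will_free_mem(p) */
	bool oom_skip;		/* models MMF_OOM_SKIP on the mm */
};

/*
 * Returns 0 or -EINVAL as the patched syscall would at this point, and
 * sets *reap when the caller would go on to reap the address space.
 */
int mrelease_decide(unsigned int flags, const struct victim_model *v,
		    bool *reap)
{
	bool reap_kill;

	*reap = false;
	if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
		return -EINVAL;
	reap_kill = !!(flags & PROCESS_MRELEASE_REAP_KILL);

	/* Auto-kill mode refuses shared address spaces outright. */
	if (reap_kill && v->mm_multiprocess)
		return -EINVAL;

	*reap = reap_kill || v->will_free_mem;
	if (!*reap)
		/* Error only if the work has not been done already. */
		return v->oom_skip ? 0 : -EINVAL;
	return 0;
}
```

With flags == 0 the model reduces to the pre-patch behavior, which is the compatibility property the commit message relies on.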
This policy differs from the global OOM killer, which kills all processes
sharing the same mm to guarantee memory reclamation at all costs (preventing
system hangs). However, process_mrelease() is invoked by userspace policy.
If it fails due to sharing, userspace can simply adapt and select another
victim process (such as another background app in Android case) to release
memory. We do not need to force success or affect processes that were not
targeted.
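The "select another victim" policy described above could look roughly like the loop below. It is only a sketch of one possible userspace policy: reap_first_available() and the reap_fn callback type are hypothetical, and fake_reap() is a test double so the loop can be exercised without killing anything.

```c
#include <errno.h>
#include <stddef.h>

/* Whatever performs the kill-and-reap attempt; returns 0 or -errno. */
typedef long (*reap_fn)(int pidfd);

/*
 * Hypothetical policy loop: walk candidates in priority order and stop
 * at the first one that was reaped. Returns its index, or -1 if none
 * of the candidates could be reaped.
 */
int reap_first_available(const int *pidfds, size_t n, reap_fn try_reap)
{
	for (size_t i = 0; i < n; i++) {
		long ret = try_reap(pidfds[i]);

		if (ret == 0)
			return (int)i;
		/* Any failure, such as -EINVAL for a shared mm: try the next one. */
	}
	return -1;
}

/* Test double: pretends odd-numbered pidfds refer to tasks sharing an mm. */
long fake_reap(int pidfd)
{
	return (pidfd % 2) ? -EINVAL : 0;
}
```

In a real reaper the callback would wrap process_mrelease() with PROCESS_MRELEASE_REAP_KILL; the candidate ordering is entirely up to the userspace policy.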
Fundamentally, this allows process_mrelease() to trigger targeted memory
reclaim (via oom_reaper infrastructure) quickly, even if the victim is
not yet in the exit path, while reusing existing race handling between
reaper and exit_mmap.
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/uapi/linux/mman.h | 4 ++++
mm/oom_kill.c | 27 ++++++++++++++++++++-------
2 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
index e89d00528f2f..4266976b45ad 100644
--- a/include/uapi/linux/mman.h
+++ b/include/uapi/linux/mman.h
@@ -56,4 +56,8 @@ struct cachestat {
 	__u64 nr_recently_evicted;
 };
 
+/* Flags for process_mrelease */
+#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
+#define PROCESS_MRELEASE_VALID_FLAGS (PROCESS_MRELEASE_REAP_KILL)
+
 #endif /* _UAPI_LINUX_MMAN_H */
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5c6c95c169ee..efa6541b1c47 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -20,6 +20,7 @@
 
 #include <linux/oom.h>
 #include <linux/mm.h>
+#include <uapi/linux/mman.h>
 #include <linux/err.h>
 #include <linux/gfp.h>
 #include <linux/sched.h>
@@ -1217,9 +1218,11 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	unsigned int f_flags;
 	bool reap = false;
 	long ret = 0;
+	bool reap_kill;
 
-	if (flags)
+	if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
 		return -EINVAL;
+	reap_kill = !!(flags & PROCESS_MRELEASE_REAP_KILL);
 
 	task = pidfd_get_task(pidfd, &f_flags);
 	if (IS_ERR(task))
@@ -1236,19 +1239,29 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	}
 
 	mm = p->mm;
-	mmgrab(mm);
+	if (reap_kill && mm_flags_test(MMF_MULTIPROCESS, mm)) {
+		ret = -EINVAL;
+		task_unlock(p);
+		goto put_task;
+	}
 
-	if (task_will_free_mem(p))
-		reap = true;
-	else {
+	reap = reap_kill || task_will_free_mem(p);
+	if (!reap) {
 		/* Error only if the work has not been done already */
 		if (!mm_flags_test(MMF_OOM_SKIP, mm))
 			ret = -EINVAL;
+		task_unlock(p);
+		goto put_task;
 	}
+
+	mmgrab(mm);
 	task_unlock(p);
 
-	if (!reap)
-		goto drop_mm;
+	if (reap_kill) {
+		ret = kill_pid(task_tgid(task), SIGKILL, 0);
+		if (ret)
+			goto drop_mm;
+	}
 
 	if (mmap_read_lock_killable(mm)) {
 		ret = -EINTR;
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
2026-04-29 21:13 [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag Minchan Kim
@ 2026-04-30 9:55 ` Michal Hocko
2026-04-30 14:43 ` Andrew Morton
1 sibling, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2026-04-30 9:55 UTC
To: Minchan Kim
Cc: akpm, hca, linux-s390, david, brauner, linux-mm, linux-kernel,
surenb, timmurray
On Wed 29-04-26 14:13:59, Minchan Kim wrote:
> This policy differs from the global OOM killer, which kills all processes
> sharing the same mm to guarantee memory reclamation at all costs (preventing
> system hangs).
Incorrect, we do the same for the memcg OOM killer as well. This is not
about preventing system hangs. But rather to
> However, process_mrelease() is invoked by userspace policy.
> If it fails due to sharing, userspace can simply adapt and select another
> victim process (such as another background app in Android case) to release
> memory. We do not need to force success or affect processes that were not
> targeted.
This is the wrong justification for the proposed semantics. You seem to be
assuming this is just fine, rather than arguing that following the global
policy would be problematic for reasons a), b) and c). If there are no
strong reasons _against_ following the global policy then we should stick
with it. There are very good reasons why we are doing that on the global
level.
If for no other reason, the proposed semantics severely cripple the
shared MM case. You are left with the racy "kill, then call
process_mrelease" approach. You certainly do not want to allow a simple way
for tasks to evade your LMK, do you? So "just choose something else" is a
very bad approach.

So unless you are aware of specific reasons why a collective kill is
clearly incorrect behavior, I believe the proper way is to kill all
processes sharing the mm (unless you are crossing a security boundary when
doing that).
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
2026-04-29 21:13 [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag Minchan Kim
2026-04-30 9:55 ` Michal Hocko
@ 2026-04-30 14:43 ` Andrew Morton
2026-04-30 15:32 ` Michal Hocko
1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2026-04-30 14:43 UTC
To: Minchan Kim
Cc: hca, linux-s390, david, mhocko, brauner, linux-mm, linux-kernel,
surenb, timmurray
On Wed, 29 Apr 2026 14:13:59 -0700 Minchan Kim <minchan@kernel.org> wrote:
> Currently, process_mrelease() requires userspace to send a SIGKILL signal
> prior to invocation. This separation introduces a scheduling race window
> where the victim task may receive the signal and enter the exit path
> before the reaper can invoke process_mrelease().
Does process_mrelease() have a manpage? My googling was a fail.
* Re: [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
2026-04-30 14:43 ` Andrew Morton
@ 2026-04-30 15:32 ` Michal Hocko
2026-04-30 16:34 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2026-04-30 15:32 UTC
To: Andrew Morton
Cc: Minchan Kim, hca, linux-s390, david, brauner, linux-mm,
linux-kernel, surenb, timmurray
On Thu 30-04-26 07:43:05, Andrew Morton wrote:
> On Wed, 29 Apr 2026 14:13:59 -0700 Minchan Kim <minchan@kernel.org> wrote:
>
> > Currently, process_mrelease() requires userspace to send a SIGKILL signal
> > prior to invocation. This separation introduces a scheduling race window
> > where the victim task may receive the signal and enter the exit path
> > before the reaper can invoke process_mrelease().
>
> Does process_mrelease() have a manpage? My googling was a fail.
It does. Very well hidden in 884a7e5964e06
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
2026-04-30 15:32 ` Michal Hocko
@ 2026-04-30 16:34 ` Andrew Morton
2026-04-30 17:24 ` Suren Baghdasaryan
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2026-04-30 16:34 UTC
To: Michal Hocko
Cc: Minchan Kim, hca, linux-s390, david, brauner, linux-mm,
linux-kernel, surenb, timmurray
On Thu, 30 Apr 2026 17:32:40 +0200 Michal Hocko <mhocko@suse.com> wrote:
> On Thu 30-04-26 07:43:05, Andrew Morton wrote:
> > On Wed, 29 Apr 2026 14:13:59 -0700 Minchan Kim <minchan@kernel.org> wrote:
> >
> > > Currently, process_mrelease() requires userspace to send a SIGKILL signal
> > > prior to invocation. This separation introduces a scheduling race window
> > > where the victim task may receive the signal and enter the exit path
> > > before the reaper can invoke process_mrelease().
> >
> > Does process_mrelease() have a manpage? My googling was a fail.
>
> It does. Very well hidden in 884a7e5964e06
Well, that didn't appear to make it into the manpages project and it
doesn't describe the expected usage: need to kill the process first.
But I guess all the needed info is in
tools/testing/selftests/mm/mrelease_test.c.
https://lwn.net/Articles/864184/ is useful.
* Re: [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
2026-04-30 16:34 ` Andrew Morton
@ 2026-04-30 17:24 ` Suren Baghdasaryan
0 siblings, 0 replies; 6+ messages in thread
From: Suren Baghdasaryan @ 2026-04-30 17:24 UTC
To: Andrew Morton
Cc: Michal Hocko, Minchan Kim, hca, linux-s390, david, brauner,
linux-mm, linux-kernel, timmurray
On Thu, Apr 30, 2026 at 9:34 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 30 Apr 2026 17:32:40 +0200 Michal Hocko <mhocko@suse.com> wrote:
>
> > On Thu 30-04-26 07:43:05, Andrew Morton wrote:
> > > On Wed, 29 Apr 2026 14:13:59 -0700 Minchan Kim <minchan@kernel.org> wrote:
> > >
> > > > Currently, process_mrelease() requires userspace to send a SIGKILL signal
> > > > prior to invocation. This separation introduces a scheduling race window
> > > > where the victim task may receive the signal and enter the exit path
> > > > before the reaper can invoke process_mrelease().
> > >
> > > Does process_mrelease() have a manpage? My googling was a fail.
> >
> > It does. Very well hidden in 884a7e5964e06
>
>
> Well, that didn't appear to make it into the manpages project and it
> doesn't describe the expected usage: need to kill the process first.
> But I guess all the needed info is in
> tools/testing/selftests/mm/mrelease_test.c.
>
> https://lwn.net/Articles/864184/ is useful.
I'll try to carve out some time to post a proper manpage for it.
Thanks for pointing this out!