* + mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch added to mm-new branch
From: Andrew Morton @ 2026-04-30 14:43 UTC
To: mm-commits, timmurray, surenb, shakeel.butt, rientjes, mhocko,
hca, david, brauner, minchan, akpm
The patch titled
Subject: mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
has been added to the -mm mm-new branch. Its filename is
mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
The mm-new branch of mm.git is not included in linux-next
If a few days of testing in mm-new is successful, the patch will be moved
into mm.git's mm-unstable branch, which is included in linux-next
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Minchan Kim <minchan@kernel.org>
Subject: mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Wed, 29 Apr 2026 14:13:59 -0700
Currently, process_mrelease() requires userspace to send a SIGKILL signal
prior to invocation. This separation introduces a scheduling race window
where the victim task may receive the signal and enter the exit path
before the reaper can invoke process_mrelease().
When the victim enters the exit path (do_exit -> exit_mm), it clears its
task->mm immediately. This causes process_mrelease() to fail with -ESRCH,
leaving the actual address space teardown (exit_mmap) to be deferred until
the mm's reference count drops to zero. In the field (e.g., Android),
arbitrary reference counts (reading /proc/<pid>/cmdline, or various other
remote VM accesses) frequently delay this teardown indefinitely, defeating
the purpose of expedited reclamation.
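
For reference, the two-step sequence described above can be sketched from userspace as follows. This is a minimal sketch and not code from the patch: the helper name kill_then_reap() is hypothetical, and the fallback syscall numbers (424 and 448) are assumptions that apply to x86-64 and arm64 when the libc headers do not already define them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed fallback numbers for x86-64/arm64; normally provided by libc. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/*
 * Hypothetical helper showing the existing kill-then-reap sequence.
 * Returns 0 if the reap ran, 1 if the victim had already dropped its mm
 * (process_mrelease() failed with ESRCH), and -1 on any other error.
 */
static int kill_then_reap(int pidfd)
{
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0U) < 0)
		return -1;

	/* Race window: the victim may reach exit_mm() before this call. */
	if (syscall(SYS_process_mrelease, pidfd, 0U) < 0)
		return errno == ESRCH ? 1 : -1;

	return 0;
}
```

A return value of 1 is the case the commit message is concerned with: the signal was delivered, but the caller can no longer expedite the teardown.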
In Android's LMKD scenarios, this delay keeps memory pressure high, forcing
the system to unnecessarily kill additional innocent background apps before
the memory from the first victim is recovered.
This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
an integrated auto-kill mode. When specified, process_mrelease() directly
injects a SIGKILL into the target task after finding its mm.
To solve the race condition, we grab the mm reference via mmgrab() before
sending the SIGKILL. If the user passed PROCESS_MRELEASE_REAP_KILL, we assume
it will free its memory and proceed with reaping, making the logic as simple
as reap = reap_kill || task_will_free_mem(p).
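
From userspace, the single-call mode might be used roughly as below. This is a sketch under stated assumptions: the flag values are copied from the uapi hunk of this patch and are not in released headers, the helper names mrelease_flags_valid() and reap_kill() are hypothetical, and 448 is an assumed fallback syscall number for x86-64 and arm64.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed fallback number for x86-64/arm64; normally provided by libc. */
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/* Values copied from this patch; not present in released uapi headers. */
#ifndef PROCESS_MRELEASE_REAP_KILL
#define PROCESS_MRELEASE_REAP_KILL	(1 << 0)
#define PROCESS_MRELEASE_VALID_FLAGS	(PROCESS_MRELEASE_REAP_KILL)
#endif

/* Mirrors the kernel-side check: any bit outside the valid mask is rejected. */
static bool mrelease_flags_valid(unsigned int flags)
{
	return !(flags & ~PROCESS_MRELEASE_VALID_FLAGS);
}

/* Hypothetical wrapper: kill and reap in one call. Returns 0, or -1 with errno set. */
static int reap_kill(int pidfd)
{
	return (int)syscall(SYS_process_mrelease, pidfd,
			    (unsigned int)PROCESS_MRELEASE_REAP_KILL);
}
```

On a kernel without this patch, any non-zero flags value is rejected with EINVAL, so a caller would likely need to fall back to the separate signal-then-reap sequence.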
To handle shared address spaces safely in the auto-kill mode, we bail out
immediately if the mm is marked with MMF_MULTIPROCESS when
PROCESS_MRELEASE_REAP_KILL is specified. This protects existing users of
process_mrelease() from behavior changes while preventing unsafe reaping of
shared memory.
This policy differs from the global OOM killer, which kills all processes
sharing the same mm to guarantee memory reclamation at all costs (preventing
system hangs). However, process_mrelease() is invoked by userspace policy.
If it fails due to sharing, userspace can simply adapt and select another
victim process (such as another background app in the Android case) to release
memory. We do not need to force success or affect processes that were not
targeted.
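
The "select another victim" policy could be sketched in userspace as a simple loop. This is illustrative only: reap_first_available() is a hypothetical helper, the flag value is copied from this patch, and 448 is an assumed fallback syscall number for x86-64 and arm64. The sketch treats every failure the same way and moves on; a real caller would probably want to inspect errno, since EINVAL is also what an unpatched kernel returns for any non-zero flags.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed fallback number for x86-64/arm64; normally provided by libc. */
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/* Value copied from this patch; not present in released uapi headers. */
#ifndef PROCESS_MRELEASE_REAP_KILL
#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
#endif

/*
 * Hypothetical policy loop: try the candidates in priority order and
 * return the index of the first one that was killed and reaped, or -1
 * if every attempt failed.
 */
static int reap_first_available(const int *pidfds, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (syscall(SYS_process_mrelease, pidfds[i],
			    (unsigned int)PROCESS_MRELEASE_REAP_KILL) == 0)
			return (int)i;
		/* Failure (e.g. EINVAL for a shared mm): try the next candidate. */
	}
	return -1;
}
```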
Fundamentally, this allows process_mrelease() to trigger targeted memory
reclaim (via oom_reaper infrastructure) quickly, even if the victim is
not yet in the exit path, while reusing existing race handling between
reaper and exit_mmap.
Link: https://lore.kernel.org/20260429211359.3829683-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/uapi/linux/mman.h | 4 ++++
mm/oom_kill.c | 27 ++++++++++++++++++++-------
2 files changed, 24 insertions(+), 7 deletions(-)
--- a/include/uapi/linux/mman.h~mm-process_mrelease-introduce-process_mrelease_reap_kill-flag
+++ a/include/uapi/linux/mman.h
@@ -56,4 +56,8 @@ struct cachestat {
__u64 nr_recently_evicted;
};
+/* Flags for process_mrelease */
+#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
+#define PROCESS_MRELEASE_VALID_FLAGS (PROCESS_MRELEASE_REAP_KILL)
+
#endif /* _UAPI_LINUX_MMAN_H */
--- a/mm/oom_kill.c~mm-process_mrelease-introduce-process_mrelease_reap_kill-flag
+++ a/mm/oom_kill.c
@@ -20,6 +20,7 @@
#include <linux/oom.h>
#include <linux/mm.h>
+#include <uapi/linux/mman.h>
#include <linux/err.h>
#include <linux/gfp.h>
#include <linux/sched.h>
@@ -1201,9 +1202,11 @@ SYSCALL_DEFINE2(process_mrelease, int, p
unsigned int f_flags;
bool reap = false;
long ret = 0;
+ bool reap_kill;
- if (flags)
+ if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
return -EINVAL;
+ reap_kill = !!(flags & PROCESS_MRELEASE_REAP_KILL);
task = pidfd_get_task(pidfd, &f_flags);
if (IS_ERR(task))
@@ -1220,19 +1223,29 @@ SYSCALL_DEFINE2(process_mrelease, int, p
}
mm = p->mm;
- mmgrab(mm);
+ if (reap_kill && mm_flags_test(MMF_MULTIPROCESS, mm)) {
+ ret = -EINVAL;
+ task_unlock(p);
+ goto put_task;
+ }
- if (task_will_free_mem(p))
- reap = true;
- else {
+ reap = reap_kill || task_will_free_mem(p);
+ if (!reap) {
/* Error only if the work has not been done already */
if (!mm_flags_test(MMF_OOM_SKIP, mm))
ret = -EINVAL;
+ task_unlock(p);
+ goto put_task;
}
+
+ mmgrab(mm);
task_unlock(p);
- if (!reap)
- goto drop_mm;
+ if (reap_kill) {
+ ret = kill_pid(task_tgid(task), SIGKILL, 0);
+ if (ret)
+ goto drop_mm;
+ }
if (mmap_read_lock_killable(mm)) {
ret = -EINTR;
_
Patches currently in -mm which might be from minchan@kernel.org are
mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch
* + mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch added to mm-new branch
From: Andrew Morton @ 2026-05-12 21:42 UTC
To: mm-commits, timmurray, surenb, mhocko, hca, david, brauner,
minchan, akpm
The patch titled
Subject: mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
has been added to the -mm mm-new branch. Its filename is
mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch
------------------------------------------------------
From: Minchan Kim <minchan@kernel.org>
Subject: mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Mon, 11 May 2026 14:42:26 -0700
Currently, process_mrelease() requires userspace to send a SIGKILL signal
prior to invocation. This separation introduces a scheduling race window
where the victim task may receive the signal and enter the exit path
before the reaper can invoke process_mrelease().
When the victim enters the exit path (do_exit -> exit_mm), it clears its
task->mm immediately. This causes process_mrelease() to fail with -ESRCH,
leaving the actual address space teardown (exit_mmap) to be deferred until
the mm's reference count drops to zero. In the field (e.g., Android),
arbitrary reference counts (reading /proc/<pid>/cmdline, or various other
remote VM accesses) frequently delay this teardown indefinitely, defeating
the purpose of expedited reclamation.
In Android's LMKD scenarios, this delay keeps memory pressure high,
forcing the system to unnecessarily kill additional innocent background
apps before the memory from the first victim is recovered.
This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
an integrated auto-kill mode. When specified, process_mrelease() directly
injects a SIGKILL into the target task after finding its mm.
To solve the race condition, we grab the mm reference via mmgrab() before
sending the SIGKILL. If the user passed PROCESS_MRELEASE_REAP_KILL, we
assume it will free its memory and proceed with reaping, making the logic
as simple as reap = reap_kill || task_will_free_mem(p).
To handle shared address spaces, we deliver SIGKILL to all processes
sharing the same address space using do_pidfd_send_signal_pidns(). This
ensures the target pid resides inside the caller's PID namespace hierarchy
prior to signal delivery. We iterate over all processes sharing the mm
and deliver SIGKILL to each. If delivering the signal to any of the
sharing processes fails, we return an error. Note that this approach may
leave partial side-effects if some processes are killed successfully
before a failure occurs.
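
Because a failed call can leave some sharers already killed, a caller may want to distinguish the error cases. The sketch below assumes only the errno values visible in this thread (EBUSY from kill_all_shared_mm(), ESRCH when the victim has already dropped its mm, EINVAL otherwise); the enum and the function mrelease_next_action() are hypothetical names, and the suggested reactions are one possible policy, not something the patch prescribes.

```c
#include <errno.h>

/* Hypothetical classification of a process_mrelease() result. */
enum mrelease_action {
	MRELEASE_DONE,			/* call succeeded */
	MRELEASE_PARTIAL_KILL,		/* EBUSY: some sharers may already be killed */
	MRELEASE_ALREADY_EXITING,	/* ESRCH: victim already dropped its mm */
	MRELEASE_REJECTED,		/* EINVAL or anything else: pick another victim */
};

/* ret is the syscall return value, err is errno captured right after it. */
static enum mrelease_action mrelease_next_action(int ret, int err)
{
	if (ret == 0)
		return MRELEASE_DONE;

	switch (err) {
	case EBUSY:
		return MRELEASE_PARTIAL_KILL;
	case ESRCH:
		return MRELEASE_ALREADY_EXITING;
	default:
		return MRELEASE_REJECTED;
	}
}
```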
Link: https://lore.kernel.org/20260511214226.937793-1-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Tim Murray <timmurray@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/signal.h | 4 ++
include/uapi/linux/mman.h | 4 ++
kernel/signal.c | 29 ++++++++++++++++--
mm/oom_kill.c | 55 +++++++++++++++++++++++++++++++-----
4 files changed, 81 insertions(+), 11 deletions(-)
--- a/include/linux/signal.h~mm-process_mrelease-introduce-process_mrelease_reap_kill-flag
+++ a/include/linux/signal.h
@@ -276,6 +276,8 @@ static inline int valid_signal(unsigned
struct timespec;
struct pt_regs;
+struct mm_struct;
+struct pid;
enum pid_type;
extern int next_signal(struct sigpending *pending, sigset_t *mask);
@@ -283,6 +285,8 @@ extern int do_send_sig_info(int sig, str
struct task_struct *p, enum pid_type type);
extern int group_send_sig_info(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type);
+extern int do_pidfd_send_signal_pidns(struct pid *pid, int sig, enum pid_type type,
+ siginfo_t __user *info, unsigned int flags);
extern int send_signal_locked(int sig, struct kernel_siginfo *info,
struct task_struct *p, enum pid_type type);
extern int sigprocmask(int, sigset_t *, sigset_t *);
--- a/include/uapi/linux/mman.h~mm-process_mrelease-introduce-process_mrelease_reap_kill-flag
+++ a/include/uapi/linux/mman.h
@@ -56,4 +56,8 @@ struct cachestat {
__u64 nr_recently_evicted;
};
+/* Flags for process_mrelease */
+#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
+#define PROCESS_MRELEASE_VALID_FLAGS (PROCESS_MRELEASE_REAP_KILL)
+
#endif /* _UAPI_LINUX_MMAN_H */
--- a/kernel/signal.c~mm-process_mrelease-introduce-process_mrelease_reap_kill-flag
+++ a/kernel/signal.c
@@ -4050,6 +4050,30 @@ static int do_pidfd_send_signal(struct p
}
/**
+ * do_pidfd_send_signal_pidns - Send a signal to a process via its struct pid
+ * while validating PID namespace hierarchy.
+ * @pid: the struct pid of the target process
+ * @sig: signal to send
+ * @type: scope of the signal (e.g. PIDTYPE_TGID)
+ * @info: signal info payload
+ * @flags: signaling flags
+ *
+ * Verify that the target pid resides inside the caller's PID namespace
+ * hierarchy prior to signal delivery.
+ *
+ * Return: 0 on success, negative errno on failure.
+ */
+int do_pidfd_send_signal_pidns(struct pid *pid, int sig, enum pid_type type,
+ siginfo_t __user *info, unsigned int flags)
+{
+ /* Enforce PID namespace hierarchy boundary */
+ if (!access_pidfd_pidns(pid))
+ return -EINVAL;
+
+ return do_pidfd_send_signal(pid, sig, type, info, flags);
+}
+
+/**
* sys_pidfd_send_signal - Signal a process through a pidfd
* @pidfd: file descriptor of the process
* @sig: signal to send
@@ -4097,16 +4121,13 @@ SYSCALL_DEFINE4(pidfd_send_signal, int,
if (IS_ERR(pid))
return PTR_ERR(pid);
- if (!access_pidfd_pidns(pid))
- return -EINVAL;
-
/* Infer scope from the type of pidfd. */
if (fd_file(f)->f_flags & PIDFD_THREAD)
type = PIDTYPE_PID;
else
type = PIDTYPE_TGID;
- return do_pidfd_send_signal(pid, sig, type, info, flags);
+ return do_pidfd_send_signal_pidns(pid, sig, type, info, flags);
}
}
--- a/mm/oom_kill.c~mm-process_mrelease-introduce-process_mrelease_reap_kill-flag
+++ a/mm/oom_kill.c
@@ -20,6 +20,7 @@
#include <linux/oom.h>
#include <linux/mm.h>
+#include <uapi/linux/mman.h>
#include <linux/err.h>
#include <linux/gfp.h>
#include <linux/sched.h>
@@ -909,6 +910,39 @@ static bool task_will_free_mem(struct ta
return ret;
}
+/*
+ * kill_all_shared_mm - Deliver SIGKILL to all processes sharing the given address space.
+ * @victim: the targeted OOM process group leader
+ * @mm: the virtual memory space being reaped
+ *
+ * Traverse all threads globally and signal any user processes sharing the identical
+ * mm footprints, ensuring no concurrent users pin the memory. Skips the system
+ * global init and kernel worker threads.
+ */
+static int kill_all_shared_mm(struct task_struct *victim, struct mm_struct *mm)
+{
+ struct task_struct *p;
+ bool failed = false;
+
+ rcu_read_lock();
+ for_each_process(p) {
+ if (!process_shares_mm(p, mm))
+ continue;
+ if (is_global_init(p)) {
+ failed = true;
+ continue;
+ }
+ if (unlikely(p->flags & PF_KTHREAD))
+ continue;
+
+ if (do_pidfd_send_signal_pidns(task_pid(p), SIGKILL, PIDTYPE_TGID, NULL, 0))
+ failed = true;
+ }
+ rcu_read_unlock();
+
+ return failed ? -EBUSY : 0;
+}
+
static void __oom_kill_process(struct task_struct *victim, const char *message)
{
struct task_struct *p;
@@ -1201,9 +1235,11 @@ SYSCALL_DEFINE2(process_mrelease, int, p
unsigned int f_flags;
bool reap = false;
long ret = 0;
+ bool reap_kill;
- if (flags)
+ if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
return -EINVAL;
+ reap_kill = !!(flags & PROCESS_MRELEASE_REAP_KILL);
task = pidfd_get_task(pidfd, &f_flags);
if (IS_ERR(task))
@@ -1220,19 +1256,24 @@ SYSCALL_DEFINE2(process_mrelease, int, p
}
mm = p->mm;
- mmgrab(mm);
- if (task_will_free_mem(p))
- reap = true;
- else {
+ reap = reap_kill || task_will_free_mem(p);
+ if (!reap) {
/* Error only if the work has not been done already */
if (!mm_flags_test(MMF_OOM_SKIP, mm))
ret = -EINVAL;
+ task_unlock(p);
+ goto put_task;
}
+
+ mmgrab(mm);
task_unlock(p);
- if (!reap)
- goto drop_mm;
+ if (reap_kill) {
+ ret = kill_all_shared_mm(task, mm);
+ if (ret)
+ goto drop_mm;
+ }
if (mmap_read_lock_killable(mm)) {
ret = -EINTR;
_
Patches currently in -mm which might be from minchan@kernel.org are
mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch
* Re: + mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch added to mm-new branch
From: Minchan Kim @ 2026-05-15 21:02 UTC
To: Andrew Morton; +Cc: mm-commits, timmurray, surenb, mhocko, hca, david, brauner
On Tue, May 12, 2026 at 02:42:48PM -0700, Andrew Morton wrote:
>
> The patch titled
> Subject: mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
> has been added to the -mm mm-new branch. Its filename is
> mm-process_mrelease-introduce-process_mrelease_reap_kill-flag.patch
> From: Minchan Kim <minchan@kernel.org>
> Subject: mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
> Date: Mon, 11 May 2026 14:42:26 -0700
Andrew, can you drop this patch?
Following our discussion, I’ll rework the process_mrelease issue
by building on Jann's patch instead.
https://lore.kernel.org/linux-mm/ageHmE1QIzK4R3nO@google.com/
Thank you.