From: Minchan Kim <minchan@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: akpm@linux-foundation.org, david@kernel.org, mhocko@suse.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
surenb@google.com, timmurray@google.com
Subject: Re: [RFC 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Thu, 16 Apr 2026 23:30:09 -0700
Message-ID: <aeHTcXfCyCED4WSl@google.com>
In-Reply-To: <20260416-planktont-abwinken-b9499483b939@brauner>

On Thu, Apr 16, 2026 at 11:13:35AM +0200, Christian Brauner wrote:
> On Mon, Apr 13, 2026 at 03:39:48PM -0700, Minchan Kim wrote:
> > Currently, process_mrelease() requires userspace to send a SIGKILL signal
> > prior to invocation. This separation introduces a race window where the
> > victim task may receive the signal and enter the exit path before the
> > reaper can invoke process_mrelease().
> >
> > In this case, the victim task frees its memory via the standard, unoptimized
> > exit path, bypassing the expedited clean file folio reclamation optimization
> > introduced in the previous patch (which relies on the MMF_UNSTABLE flag).
> >
> > This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
> > an integrated auto-kill mode. When specified, process_mrelease() directly
> > injects a SIGKILL into the target task.
> >
> > Crucially, this patch utilizes a dedicated signal code (KILL_MRELEASE)
> > during signal injection, belonging to a new SIGKILL si_codes section.
> > This special code ensures that the kernel's signal delivery path reliably
> > intercepts the request and marks the target address space as unstable
> > (MMF_UNSTABLE). This mechanism guarantees that the MMF_UNSTABLE flag is set
> > before either the victim task or the reaper proceeds, ensuring that the
> > expedited reclamation optimization is utilized regardless of scheduling
> > order.
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> > include/uapi/asm-generic/siginfo.h | 6 ++++++
> > include/uapi/linux/mman.h | 4 ++++
> > kernel/signal.c | 4 ++++
> > mm/oom_kill.c | 20 +++++++++++++++++++-
> > 4 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> > index 5a1ca43b5fc6..0f59b791dab4 100644
> > --- a/include/uapi/asm-generic/siginfo.h
> > +++ b/include/uapi/asm-generic/siginfo.h
> > @@ -252,6 +252,12 @@ typedef struct siginfo {
> > #define BUS_MCEERR_AO 5
> > #define NSIGBUS 5
> >
> > +/*
> > + * SIGKILL si_codes
> > + */
> > +#define KILL_MRELEASE 1 /* sent by process_mrelease */
> > +#define NSIGKILL 1
> > +
> > /*
> > * SIGTRAP si_codes
> > */
> > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> > index e89d00528f2f..4266976b45ad 100644
> > --- a/include/uapi/linux/mman.h
> > +++ b/include/uapi/linux/mman.h
> > @@ -56,4 +56,8 @@ struct cachestat {
> > __u64 nr_recently_evicted;
> > };
> >
> > +/* Flags for process_mrelease */
> > +#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
> > +#define PROCESS_MRELEASE_VALID_FLAGS (PROCESS_MRELEASE_REAP_KILL)
> > +
> > #endif /* _UAPI_LINUX_MMAN_H */
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index d65d0fe24bfb..c21b2176dc5e 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -1134,6 +1134,10 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
> >
> > out_set:
> > signalfd_notify(t, sig);
> > +
> > + if (sig == SIGKILL && !is_si_special(info) &&
> > + info->si_code == KILL_MRELEASE && t->mm)
> > + mm_flags_set(MMF_UNSTABLE, t->mm);
> > sigaddset(&pending->signal, sig);
> >
> > /* Let multiprocess signals appear after on-going forks */
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 5c6c95c169ee..0b5da5208707 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -20,6 +20,8 @@
> >
> > #include <linux/oom.h>
> > #include <linux/mm.h>
> > +#include <uapi/linux/mman.h>
> > +#include <linux/capability.h>
> > #include <linux/err.h>
> > #include <linux/gfp.h>
> > #include <linux/sched.h>
> > @@ -1218,13 +1220,29 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
> > bool reap = false;
> > long ret = 0;
> >
> > - if (flags)
> > + if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
> > return -EINVAL;
> >
> > task = pidfd_get_task(pidfd, &f_flags);
> > if (IS_ERR(task))
> > return PTR_ERR(task);
> >
> > + if (flags & PROCESS_MRELEASE_REAP_KILL) {
> > + struct kernel_siginfo info;
> > +
> > + if (!capable(CAP_KILL)) {
>
> Why? Just call a function that uses check_kill_permission() before
> firing the signal? What's the rationale for doing it this way?

Thanks for pointing that out. I wasn't aware of check_kill_permission().

I took a look at it, and check_kill_permission() seems to apply its
permission checks only to signals that originate from userspace
(si_code <= 0). Since we inject the signal from the kernel side with a
positive si_code (KILL_MRELEASE), check_kill_permission() would return 0
early and skip the permission checks entirely.

I am open to better ideas if there is a more standard way to handle
permission checks for kernel-injected signals.
>
> Tbh, I really hate that process_mrelease() now has a kill side effect
> with non-standard permission handling as well.
>
> Seems like bad api design. Why can't you just raise the MMF_UNSTABLE bit
> before the SIGKILL as that's the problem you're trying to solve.

The problem is that process_mrelease() requires the target process to
already have a pending fatal signal, or to be in the exit path, before it
allows any operation. Therefore, we cannot invoke process_mrelease() just
to set the MMF_UNSTABLE flag *before* the SIGKILL is sent.

If I send the SIGKILL first to satisfy that requirement, we immediately
run into the scheduling race: the victim can enter the exit path before
the reaper sets the flag.

This circular dependency is exactly why I integrated the kill operation
into process_mrelease(), so that the two steps become atomic.
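
From the reaper's side, the two sequences look roughly like the sketch
below. kill_then_reap() and reap_kill() are hypothetical helper names;
the fallback syscall numbers are the asm-generic ones and are an
assumption for older headers; PROCESS_MRELEASE_REAP_KILL is the flag
proposed in this patch and exists in no released UAPI header.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback syscall numbers (asm-generic values) for older headers. */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_process_mrelease
#define SYS_process_mrelease 448
#endif

/* Proposed in this patch; not part of any released UAPI header. */
#ifndef PROCESS_MRELEASE_REAP_KILL
#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
#endif

/* Today's sequence: kill first, then reap. Returns 0, or -1 with errno. */
static int kill_then_reap(int pidfd)
{
	if (syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0) < 0)
		return -1;
	/*
	 * Race window: the victim may already be running its exit path
	 * here, before the reaper's process_mrelease() call is made.
	 */
	return (int)syscall(SYS_process_mrelease, pidfd, 0);
}

/* Proposed sequence: a single call that injects SIGKILL and reaps. */
static int reap_kill(int pidfd)
{
	return (int)syscall(SYS_process_mrelease, pidfd,
			    PROCESS_MRELEASE_REAP_KILL);
}
```

On a kernel without this patch, reap_kill() fails with EINVAL because any
non-zero flags value is rejected.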
>
> > + ret = -EPERM;
> > + goto put_task;
> > + }
> > + clear_siginfo(&info);
> > + info.si_signo = SIGKILL;
> > + info.si_code = KILL_MRELEASE;
> > + info.si_pid = task_tgid_vnr(current);
> > + info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>
> This should not be open-coded like this.

Good point. Maybe I can reuse prepare_kill_siginfo().
>
> > +
> > + do_send_sig_info(SIGKILL, &info, task, PIDTYPE_TGID);
> > + }
> > +
> > /*
> > * Make sure to choose a thread which still has a reference to mm
> > * during the group exit
> > --
> > 2.54.0.rc0.605.g598a273b03-goog
> >