Re: [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag

public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed

From: Michal Hocko <mhocko@suse.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>,
	akpm@linux-foundation.org, hca@linux.ibm.com,
	linux-s390@vger.kernel.org, david@kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, surenb@google.com,
	timmurray@google.com
Subject: Re: [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Tue, 5 May 2026 18:03:03 +0200	[thread overview]
Message-ID: <afoUt3te1k2TNao-@tiehlicka> (raw)
In-Reply-To: <20260505-wegbleiben-deshalb-f929089dbdab@brauner>

On Tue 05-05-26 11:30:22, Christian Brauner wrote:
> IIUC, then the OOM kill if invoked from the kernel just takes down
> without permission checking what it wants to take down. That makes a lot
> of sense and is mostly safe - after all it is the kernel that initiates
> the kill.
> 
> However, when userspace initiates the kill we need at least the
> semantics you proposed, Michal. You can only kill processes that you
> have the necessary privileges over otherwise you end up allowing to
> SIGKILL setuid binaries over which you hold no privileged possibly
> generating information leaks or worse.

Agreed!

> The other thing to keep in mind is that currently pidfds explicitly do
> not to allow to signal taks that are outside of their pid namespace
> hierarchy - see pidfd_send_signal()'s permission checking. I don't want
> to break these semantics - it's just very bad api design if signaling
> suddenly behaves differently and pidfd suddenly convey the ability to
> do a very wide signal scope.

Agreed!

> The other thing is that pidfds are handles that can be sent around using
> SCM_RIGHTS which means they could be forwarded to a container or another
> privileged user that then initiates kill semantics.
> 
> The other thing is that the type of pidfd selects the scope of the
> signaling operation:
> 
> * If the pidfd was created via PIDFD_THREAD then the scope of the signal
>   is by default the individual thread - unless the signal itself is
>   thread-group oriented ofc.
> 
> * If the pidfd was created wihout PIDFD_THREAD then the scope of the
>   signal is by default the thread-group.
> 
> * pidfd_send_signal() provides explicitly scope overrides:
> 
>   (1) PIDFD_SIGNAL_THREAD
>   (2) PIDFD_SIGNAL_THREAD_GROUP
>   (3) PIDFD_SIGNAL_PROCESS_GROUP
> 
>   The flags should be mostly self-explanatory.
> 
>   So I really dislike the idea of now letting the pidfd passed to
>   process_mrelease() to have an implicit scope suddenly. The problem is
>   that this is very opaque to userspace and introduces another way to
>   signal a group of processes.

I do see your point. Unfortunately the whole concept of mm shared
across thread (signal) groups is not fitting well into the overall
model. For the most usecases this is not a big problem. But oom handlers
do care. If you do not kill all owners of the mm you are not releasing
any memory.

> IOW, I still dislike the fact that process_mrelease() is suddenly turned
> into a signal sending syscall and I really dislike the fact that it
> implies a "kill everything with that mm and cross other thread-groups".
> 
> I wonder if you couldn't just add PIDFD_SIGNAL_MM_GROUP or something to
> pidfd_send_signal() instead.

That would be a clean interface for sure. The thing we are struggling
here is not just the killing side of things but also grabbing the mm
before it disappears which is the primary reason why process_mrelease is
turning into signal sending syscall (which you seem to be not in favor
of).

So I can see these options on the table
1) keep process_mrelease as is and live with the race. This sucks
because it makes userspace low memory (oom) killers harder to predict.
2) we add the proposed option to kill&release into process_mrelease that
is not aware of shared mm case. This sucks because it creates an easy
way to evade from the said oom killer
3) same as 2 but add PIDFD_SIGNAL_MM_GROUP that would do the right thing
on the signal handling side. You seem to like the idea from the
pidfd_send_signal POV but I am not sure you are OK with that being
implanted into process_mrelease.
-- 
Michal Hocko
SUSE Labs

next prev parent reply	other threads:[~2026-05-05 16:03 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-29 21:13 [PATCH v2] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag Minchan Kim
2026-04-30  9:55 ` Michal Hocko
2026-05-01 21:17   ` Minchan Kim
2026-05-04  7:51     ` Michal Hocko
2026-05-05  5:04       ` Minchan Kim
2026-05-05  9:30       ` Christian Brauner
2026-05-05 16:03         ` Michal Hocko [this message]
2026-05-05 17:59           ` Minchan Kim
2026-04-30 14:43 ` Andrew Morton
2026-04-30 15:32   ` Michal Hocko
2026-04-30 16:34     ` Andrew Morton
2026-04-30 17:24       ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afoUt3te1k2TNao-@tiehlicka \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=david@kernel.org \
    --cc=hca@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=minchan@kernel.org \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox