From: Tycho Andersen <tycho@tycho.pizza>
To: Christian Brauner <brauner@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
"Eric W . Biederman" <ebiederm@xmission.com>,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
Tycho Andersen <tandersen@netflix.com>, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org,
Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [RFC 1/3] pidfd: allow pidfd_open() on non-thread-group leaders
Date: Thu, 7 Dec 2023 10:52:38 -0700
Message-ID: <ZXIGZq18Bb6LK1qt@tycho.pizza>
In-Reply-To: <20231207-netzhaut-wachen-81c34f8ee154@brauner>

On Thu, Dec 07, 2023 at 06:21:18PM +0100, Christian Brauner wrote:
> [Cc fsdevel & Jan because we had some discussions about fanotify
> returning non-thread-group pidfds. That's just for awareness or in case
> this might need special handling.]
>
> On Thu, Nov 30, 2023 at 09:39:44AM -0700, Tycho Andersen wrote:
> > From: Tycho Andersen <tandersen@netflix.com>
> >
> > We are using the pidfd family of syscalls with the seccomp userspace
> > notifier. When some thread triggers a seccomp notification, we want to do
> > some things to its context (munge fd tables via pidfd_getfd(), maybe write
> > to its memory, etc.). However, threads created with ~CLONE_FILES or
> > ~CLONE_VM mean that we can't use the pidfd family of syscalls for this
> > purpose, since their fd table or mm are distinct from the thread group
> > leader's. In this patch, we relax this restriction for pidfd_open().
> >
> > In order to avoid dangling poll() users we need to notify pidfd waiters
> > when individual threads die, but once we do that all the other machinery
> > seems to work ok viz. the tests. But I suppose there are more cases than
> > just this one.
> >
> > Another weirdness is the open-coding of this vs. exporting using
> > do_notify_pidfd(). This particular location is after __exit_signal() is
> > called, which does __unhash_process() which kills ->thread_pid, so we need
> > to use the copy we have locally, vs do_notify_pidfd() which accesses it via
> > task_pid(). Maybe this suggests that the notification should live somewhere
> > in __exit_signal()? I just put it here because I saw we were already
> > testing if this task was the leader.
> >
> > Signed-off-by: Tycho Andersen <tandersen@netflix.com>
> > ---
>
> So we've always said that if there's a use-case for this then we're
> willing to support it. And I think that stance hasn't changed. I know
> that others have expressed interest in this as well.
>
> So currently the series only enables pidfds for threads to be created
> and allows notifications for threads. But all places that currently make
> use of pidfds refuse non-thread-group leaders. We can certainly proceed
> with a patch series that only enables creation and exit notification but
> we should also consider unlocking additional functionality:
>
> * audit of all callers that use pidfd_get_task()
>
> (1) process_madvise()
> (2) process_mrelease()
>
> I expect that both can handle threads just fine but we'd need an Ack
> from mm people.
>
> * pidfd_prepare() is used to create pidfds for:
>
> (1) CLONE_PIDFD via clone() and clone3()
> (2) SCM_PIDFD and SO_PEERPIDFD
> (3) fanotify
>
> (1) is what this series here is about.
>
> For (2) the change doesn't matter because SCM_CREDS always uses the
> thread-group leader. So even if we allowed the creation of pidfds for
> threads it wouldn't matter.
>
> For (3) we need to check whether fanotify would be ok to handle pidfds
> for threads. It might be fine but Jan will probably know more.
>
> * audit all callers of pidfd_pid() whether they could simply be switched
> to handle individual threads:
>
> (1) setns() handles threads just fine so this is safe to allow.
> (2) pidfd_getfd() I would like to keep restricted and essentially
> freeze new features for it.
>
> I'm not happy that we didn't just implement it as an ioctl to
> the seccomp notifier. And I wouldn't oppose a patch that would add
> that functionality to the seccomp notifier itself. But that's a
> separate topic.
> (3) pidfd_send_signal(). I think that one is the most interesting one
> to allow signaling individual threads. I'm not sure that you need
> to do this right now in this patch but we need to think about what
> we want to do there.

This all sounds reasonable to me, I can take a look as time permits.
pidfd_send_signal() at the very least would have been useful while
writing these tests.

Tycho
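
For readers following along, here is a minimal sketch of the two
operations discussed above as they already work for thread-group
leaders: signaling through a pidfd and polling it for exit. It assumes
Python 3.9+ on Linux 5.3+ and uses the os.pidfd_open() and
signal.pidfd_send_signal() wrappers. The function name
signal_and_wait_via_pidfd and the timeout are made up for illustration,
and nothing here exercises the per-thread behavior proposed in this
series.

```python
import os
import select
import signal


def signal_and_wait_via_pidfd(timeout_ms=5000):
    """Fork a child, kill it through a pidfd, and poll the pidfd for exit.

    Returns (readable, exit_code), or None if the pidfd calls are
    unavailable on this platform.
    """
    if not (hasattr(os, "pidfd_open") and hasattr(signal, "pidfd_send_signal")):
        return None  # not Linux, or Python older than 3.9

    pid = os.fork()
    if pid == 0:
        # Child: a single-threaded process, i.e. a thread-group leader.
        try:
            while True:
                signal.pause()  # block until a signal arrives
        finally:
            os._exit(1)

    pidfd = -1
    try:
        pidfd = os.pidfd_open(pid)
        # Signal the child by pidfd rather than by numeric pid.
        signal.pidfd_send_signal(pidfd, signal.SIGKILL)
        # A pidfd becomes readable once the process has terminated.
        poller = select.poll()
        poller.register(pidfd, select.POLLIN)
        events = poller.poll(timeout_ms)
        readable = any(fd == pidfd and ev & select.POLLIN for fd, ev in events)
    except OSError:
        # Kernel too old or syscall filtered: clean up the child and give up.
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
        return None
    finally:
        if pidfd >= 0:
            os.close(pidfd)

    _, status = os.waitpid(pid, 0)  # reap the zombie
    return readable, os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print(signal_and_wait_via_pidfd())
```

The poll() step is the notification path the patch description is
concerned with: if pidfds can refer to individual threads, waiters
need the same wakeup when a non-leader thread dies.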