public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] pid: Allow creation of pidfds to threads
@ 2022-03-31 19:02 Alois Wohlschlager
  2022-04-01  7:09 ` Christian Brauner
  0 siblings, 1 reply; 6+ messages in thread
From: Alois Wohlschlager @ 2022-03-31 19:02 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, Andrew Morton, Peter Zijlstra, Alexey Gladkov,
	Jens Axboe, David Hildenbrand, Rolf Eike Beer, Ran Xiaokai,
	Matthew Bobrowski, Jan Kara, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3392 bytes --]

The pidfd_open() syscall now allows retrieving pidfds to processes which
are not thread group leaders. Like standard pidfds so far, these may be
used to retrieve file descriptors from the target thread using
pidfd_getfd(), as well as for killing the target thread group using
pidfd_send_signal().
However, unlike pidfds referencing thread group leaders, they do not
support polling for process exit. Attempts to do so signal an error
condition instead of blocking indefinitely.

Since the semantics of pidfd_getfd() and pidfd_send_signal() are not
very useful within a thread group, these thread pidfds can only be
created using pidfd_open(), not via clone().

Signed-off-by: Alois Wohlschlager <alois1@gmx-topmail.de>
---
 kernel/fork.c |  3 +++
 kernel/pid.c  | 15 +++------------
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index f1e89007f228..f98230630a57 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1836,6 +1836,9 @@ static __poll_t pidfd_poll(struct file *file, struct
poll_table_struct *pts)
        struct pid *pid = file->private_data;
        __poll_t poll_flags = 0;

+       if (!pid_has_task(pid, PIDTYPE_TGID))
+               return EPOLLERR;
+
        poll_wait(file, &pid->wait_pidfd, pts);

        /*
diff --git a/kernel/pid.c b/kernel/pid.c
index 2fc0a16ec77b..6be745c7399c 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -548,11 +548,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int
*flags)
  * Return the task associated with @pidfd. The function takes a reference on
  * the returned task. The caller is responsible for releasing that reference.
  *
- * Currently, the process identified by @pidfd is always a thread-group
leader.
- * This restriction currently exists for all aspects of pidfds including
pidfd
- * creation (CLONE_PIDFD cannot be used with CLONE_THREAD) and pidfd polling
- * (only supports thread group leaders).
- *
  * Return: On success, the task_struct associated with the pidfd.
  *        On error, a negative errno number will be returned.
  */
@@ -566,7 +561,7 @@ struct task_struct *pidfd_get_task(int pidfd, unsigned int
*flags)
        if (IS_ERR(pid))
                return ERR_CAST(pid);

-       task = get_pid_task(pid, PIDTYPE_TGID);
+       task = get_pid_task(pid, PIDTYPE_PID);
        put_pid(pid);
        if (!task)
                return ERR_PTR(-ESRCH);
@@ -595,7 +590,7 @@ int pidfd_create(struct pid *pid, unsigned int flags)
 {
        int fd;

-       if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+       if (!pid)
                return -EINVAL;

        if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC))
@@ -616,11 +611,7 @@ int pidfd_create(struct pid *pid, unsigned int flags)
  * @flags: flags to pass
  *
  * This creates a new pid file descriptor with the O_CLOEXEC flag set for
- * the process identified by @pid. Currently, the process identified by
- * @pid must be a thread-group leader. This restriction currently exists
- * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
- * be used with CLONE_THREAD) and pidfd polling (only supports thread group
- * leaders).
+ * the process identified by @pid.
  *
  * Return: On success, a cloexec pidfd is returned.
  *         On error, a negative errno number will be returned.
--
2.35.1

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pid: Allow creation of pidfds to threads
  2022-03-31 19:02 [PATCH] pid: Allow creation of pidfds to threads Alois Wohlschlager
@ 2022-04-01  7:09 ` Christian Brauner
  2022-04-01  9:00   ` Alois Wohlschlager
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Brauner @ 2022-04-01  7:09 UTC (permalink / raw)
  To: Alois Wohlschlager
  Cc: Eric W. Biederman, Andrew Morton, Peter Zijlstra, Alexey Gladkov,
	Jens Axboe, David Hildenbrand, Rolf Eike Beer, Ran Xiaokai,
	Matthew Bobrowski, Jan Kara, linux-kernel

On Thu, Mar 31, 2022 at 09:02:32PM +0200, Alois Wohlschlager wrote:
> The pidfd_open() syscall now allows retrieving pidfds to processes which
> are not thread group leaders. Like standard pidfds so far, these may be
> used to retrieve file descriptors from the target thread using
> pidfd_getfd(), as well as for killing the target thread group using
> pidfd_send_signal().
> However, unlike pidfds referencing thread group leaders, they do not
> support polling for process exit. Attempts to do so signal an error
> condition instead of blocking indefinitely.
> 
> Since the semantics of pidfd_getfd() and pidfd_send_signal() are not
> very useful within a thread group, these thread pidfds can only be
> created using pidfd_open(), not via clone().
> 
> Signed-off-by: Alois Wohlschlager <alois1@gmx-topmail.de>
> ---

Hey Alois,

We originally blocked this because it is not as easy as simply allowing
pidfds to be created for non-thread-group leaders.
For a start, pidfd_poll() currently doens't work if pidfd_task() isn't a
thread-group leader and you'll just hang for CLONE_PIDFD | CLONE_THREAD.
So at least that needs to be adapated as well and there's likely a bunch
of other corner-cases I'm forgetting about.
Do you have a concrete use-case you want this for?

Christian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pid: Allow creation of pidfds to threads
  2022-04-01  7:09 ` Christian Brauner
@ 2022-04-01  9:00   ` Alois Wohlschlager
  2022-04-01  9:42     ` Christian Brauner
  0 siblings, 1 reply; 6+ messages in thread
From: Alois Wohlschlager @ 2022-04-01  9:00 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Eric W. Biederman, Andrew Morton, Peter Zijlstra, Alexey Gladkov,
	Jens Axboe, David Hildenbrand, Rolf Eike Beer, Ran Xiaokai,
	Matthew Bobrowski, Jan Kara, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1346 bytes --]

Hello Christian,

> We originally blocked this because it is not as easy as simply allowing
> pidfds to be created for non-thread-group leaders.
> For a start, pidfd_poll() currently doens't work if pidfd_task() isn't a
> thread-group leader

I did notice the hang there, that's why my patch changes pidfd_poll to return
error on tasks which are not thread-group leaders. IIRC, waiting on specific
threads is not supported by Linux at all, so I don't see a problem with not
supporting it here either.

> and you'll just hang for CLONE_PIDFD | CLONE_THREAD.

No, CLONE_PIDFD | CLONE_THREAD behavior is unchanged, it will still fail with
EINVAL. I actually confirmed this by double-checking right now.

> So at least that needs to be adapated as well and there's likely a bunch
> of other corner-cases I'm forgetting about.

I'd be happy to hear about other corner-cases so I can fix them.

> Do you have a concrete use-case you want this for?

My use-case is basically making pidfd_getfd actually useful for its intended
purpose: there is a seccomp_unotify-based supervisor that wants to obtain a
file descriptor from its guest. This currently does not work if the action to
be forwarded to the supervisor is performed in a secondary thread, since there
is no way to obtain the required pidfd.

> Christian

Alois

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pid: Allow creation of pidfds to threads
  2022-04-01  9:00   ` Alois Wohlschlager
@ 2022-04-01  9:42     ` Christian Brauner
  2022-04-13  4:46       ` Matthew Bobrowski
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Brauner @ 2022-04-01  9:42 UTC (permalink / raw)
  To: Alois Wohlschlager
  Cc: Eric W. Biederman, Andrew Morton, Peter Zijlstra, Alexey Gladkov,
	Jens Axboe, David Hildenbrand, Rolf Eike Beer, Ran Xiaokai,
	Matthew Bobrowski, Jan Kara, linux-kernel

On Fri, Apr 01, 2022 at 11:00:27AM +0200, Alois Wohlschlager wrote:
> Hello Christian,
> 
> > We originally blocked this because it is not as easy as simply allowing
> > pidfds to be created for non-thread-group leaders.
> > For a start, pidfd_poll() currently doens't work if pidfd_task() isn't a
> > thread-group leader
> 
> I did notice the hang there, that's why my patch changes pidfd_poll to return
> error on tasks which are not thread-group leaders. IIRC, waiting on specific
> threads is not supported by Linux at all, so I don't see a problem with not
> supporting it here either.

In general, it would be quite neat if we could get notified about thread
exit through poll though. That'd be pretty useful. But maybe it's indeed
ok to just not support this (for now at least).

I know that systemd is using pidfds in their event loop so I'd need to
see whether they'd want support for this behavior.

> 
> > and you'll just hang for CLONE_PIDFD | CLONE_THREAD.
> 
> No, CLONE_PIDFD | CLONE_THREAD behavior is unchanged, it will still fail with
> EINVAL. I actually confirmed this by double-checking right now.

I just used the two flags as a shorthand for pidfds referring to
threads. That might've been misleading here.

> 
> > So at least that needs to be adapated as well and there's likely a bunch
> > of other corner-cases I'm forgetting about.
> 
> I'd be happy to hear about other corner-cases so I can fix them.

I need to play with this patch a little and see what current
expectations we do have in the code.

There are various consumers of pidfds and they all have been added with
the assumption that a pidfd refers to a thread-group leader. We should
go through them and see whether changing them to operate on threads is
sane before we can just switch the generic helper.

Bot process_madvise() and process_mrelease() should be fine to operate
on threads afaict from the discussion when they were added.

For pidfd_send_signal() we likely want to at least consider adding the
ability to send a thread-specific signal, i.e. supporting tgkill()
behavior. As it stands it currently only supports kill()-like behavior
where the signal that gets sent is thread-group directed.

I roughly had originally envisioned this to be supportable through the
addition of a new flag to pidfd_send_signal() so callers would be able
to select whether to send a thread-specific signal or not. What do
people think of that?

> 
> > Do you have a concrete use-case you want this for?
> 
> My use-case is basically making pidfd_getfd actually useful for its intended
> purpose: there is a seccomp_unotify-based supervisor that wants to obtain a
> file descriptor from its guest. This currently does not work if the action to
> be forwarded to the supervisor is performed in a secondary thread, since there
> is no way to obtain the required pidfd.

Yeah, I'm well aware of that. I've been working around this limitation
in our implementation for the seccomp notifier for quite a long time
when intercepting the bpf()-syscall.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pid: Allow creation of pidfds to threads
  2022-04-01  9:42     ` Christian Brauner
@ 2022-04-13  4:46       ` Matthew Bobrowski
  2022-04-14  8:11         ` Alois Wohlschlager
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Bobrowski @ 2022-04-13  4:46 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Alois Wohlschlager, Eric W. Biederman, Andrew Morton,
	Peter Zijlstra, Alexey Gladkov, Jens Axboe, David Hildenbrand,
	Rolf Eike Beer, Ran Xiaokai, Jan Kara, linux-kernel

On Fri, Apr 01, 2022 at 11:42:25AM +0200, Christian Brauner wrote:
> On Fri, Apr 01, 2022 at 11:00:27AM +0200, Alois Wohlschlager wrote:
> > Hello Christian,
> > 
> > > We originally blocked this because it is not as easy as simply allowing
> > > pidfds to be created for non-thread-group leaders.
> > > For a start, pidfd_poll() currently doens't work if pidfd_task() isn't a
> > > thread-group leader
> > 
> > I did notice the hang there, that's why my patch changes pidfd_poll to return
> > error on tasks which are not thread-group leaders. IIRC, waiting on specific
> > threads is not supported by Linux at all, so I don't see a problem with not
> > supporting it here either.
> 
> In general, it would be quite neat if we could get notified about thread
> exit through poll though. That'd be pretty useful. But maybe it's indeed
> ok to just not support this (for now at least).
> 
> I know that systemd is using pidfds in their event loop so I'd need to
> see whether they'd want support for this behavior.
> 
> > 
> > > and you'll just hang for CLONE_PIDFD | CLONE_THREAD.
> > 
> > No, CLONE_PIDFD | CLONE_THREAD behavior is unchanged, it will still fail with
> > EINVAL. I actually confirmed this by double-checking right now.
> 
> I just used the two flags as a shorthand for pidfds referring to
> threads. That might've been misleading here.
> 
> > 
> > > So at least that needs to be adapated as well and there's likely a bunch
> > > of other corner-cases I'm forgetting about.
> > 
> > I'd be happy to hear about other corner-cases so I can fix them.
> 
> I need to play with this patch a little and see what current
> expectations we do have in the code.
> 
> There are various consumers of pidfds and they all have been added with
> the assumption that a pidfd refers to a thread-group leader. We should
> go through them and see whether changing them to operate on threads is
> sane before we can just switch the generic helper.
> 
> Bot process_madvise() and process_mrelease() should be fine to operate
> on threads afaict from the discussion when they were added.
> 
> For pidfd_send_signal() we likely want to at least consider adding the
> ability to send a thread-specific signal, i.e. supporting tgkill()
> behavior. As it stands it currently only supports kill()-like behavior
> where the signal that gets sent is thread-group directed.
> 
> I roughly had originally envisioned this to be supportable through the
> addition of a new flag to pidfd_send_signal() so callers would be able
> to select whether to send a thread-specific signal or not. What do
> people think of that?

Sorry, I've been on parental leave for the last couple of months and
I'm now playing catch-up.

For the fanotify API i.e. FAN_REPORT_PIDFD, I don't see there being
any issues with supporting/returning pidfds which belong to
non-thread-group leaders. In saying that, for this to be useful from
the fanotify API POV, I definitely do think we should consider
supporting the ability to send thread-specific signals via
pidfd_send_signal(). Adding this extension through the optional flag
parameter makes sense to me.

> > > Do you have a concrete use-case you want this for?
> > 
> > My use-case is basically making pidfd_getfd actually useful for its intended
> > purpose: there is a seccomp_unotify-based supervisor that wants to obtain a
> > file descriptor from its guest. This currently does not work if the action to
> > be forwarded to the supervisor is performed in a secondary thread, since there
> > is no way to obtain the required pidfd.
> 
> Yeah, I'm well aware of that. I've been working around this limitation
> in our implementation for the seccomp notifier for quite a long time
> when intercepting the bpf()-syscall.

/M

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pid: Allow creation of pidfds to threads
  2022-04-13  4:46       ` Matthew Bobrowski
@ 2022-04-14  8:11         ` Alois Wohlschlager
  0 siblings, 0 replies; 6+ messages in thread
From: Alois Wohlschlager @ 2022-04-14  8:11 UTC (permalink / raw)
  To: Christian Brauner, Matthew Bobrowski
  Cc: Eric W. Biederman, Andrew Morton, Peter Zijlstra, Alexey Gladkov,
	Jens Axboe, David Hildenbrand, Rolf Eike Beer, Ran Xiaokai,
	Jan Kara, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 677 bytes --]

Am Mittwoch, 13. April 2022, 06:46:20 CEST schrieb Matthew Bobrowski:
> For the fanotify API i.e. FAN_REPORT_PIDFD, I don't see there being
> any issues with supporting/returning pidfds which belong to
> non-thread-group leaders. In saying that, for this to be useful from
> the fanotify API POV, I definitely do think we should consider
> supporting the ability to send thread-specific signals via
> pidfd_send_signal(). Adding this extension through the optional flag
> parameter makes sense to me.

I actually started implementing this, but then took it out again because I
couldn't think of a use case. Now that one has been found, I can bring it back
in a v2.

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-14  8:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-03-31 19:02 [PATCH] pid: Allow creation of pidfds to threads Alois Wohlschlager
2022-04-01  7:09 ` Christian Brauner
2022-04-01  9:00   ` Alois Wohlschlager
2022-04-01  9:42     ` Christian Brauner
2022-04-13  4:46       ` Matthew Bobrowski
2022-04-14  8:11         ` Alois Wohlschlager

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox