* PIDFD_THREAD behavior for thread-group leaders
@ 2025-03-06 11:41 Christian Brauner
2025-03-06 12:17 ` Oleg Nesterov
0 siblings, 1 reply; 3+ messages in thread
From: Christian Brauner @ 2025-03-06 11:41 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-fsdevel, Jeff Layton, Lennart Poettering, Daan De Meyer,
Mike Yuan
Oleg,
I've been thinking about the multi-threaded exec case where a
non-thread-group leader task execs and assumes the thread-group leaders
struct pid.
Back when we implemented support for PIDFD_THREAD we ended up with the
decision that if userspace holds:
pidfd_leader_thread = pidfd_open(<thread-group-leader-pid>, PIDFD_THREAD)
that exit notification is not strictly defined if a non-thread-group
leader thread execs: If poll is called before the exec happened, then an
exit notification may be observed and if aftewards no exit notification
is generated for the old thread-group leader. Of if exit for the old
thread-group leader was observed but poll is called again then it would
block again.
I was wondering why the following snippet wouldn't work to ensure that
PIDFD_THREAD pidfds for thread-group leaders wouldn't be woken with
spurious exits:
diff --git a/kernel/exit.c b/kernel/exit.c
index 9916305e34d3..b79ded1b3bf5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -745,8 +745,11 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
/*
* sub-thread or delay_group_leader(), wake up the
* PIDFD_THREAD waiters.
+ *
+ * The thread-group leader will be taken over by the execing
+ * task so don't cause spurious wakeups.
*/
- if (!thread_group_empty(tsk))
+ if (!thread_group_empty(tsk) && (tsk->signal->notify_count >= 0))
do_notify_pidfd(tsk);
if (unlikely(tsk->ptrace)) {
Because that would seem more consistent to me. The downside would be
that if userspace performed a series of multi-threaded exec for
non-thread-group leader threads then waiters wouldn't get woken. But I
think that's probably ok.
To handle this case we could later think about whether we can instead
start generating a separate poll (POLLPRI?) event when exec happens.
I'm probably missing something very obvious why that won't work.
Christian
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: PIDFD_THREAD behavior for thread-group leaders
2025-03-06 11:41 PIDFD_THREAD behavior for thread-group leaders Christian Brauner
@ 2025-03-06 12:17 ` Oleg Nesterov
2025-03-06 12:48 ` Christian Brauner
0 siblings, 1 reply; 3+ messages in thread
From: Oleg Nesterov @ 2025-03-06 12:17 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, Jeff Layton, Lennart Poettering, Daan De Meyer,
Mike Yuan
On 03/06, Christian Brauner wrote:
>
> Back when we implemented support for PIDFD_THREAD we ended up with the
> decision that if userspace holds:
>
> pidfd_leader_thread = pidfd_open(<thread-group-leader-pid>, PIDFD_THREAD)
>
> that exit notification is not strictly defined if a non-thread-group
> leader thread execs:
Yes, this was even documented in commit 64bef697d33b ...
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -745,8 +745,11 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> /*
> * sub-thread or delay_group_leader(), wake up the
> * PIDFD_THREAD waiters.
> + *
> + * The thread-group leader will be taken over by the execing
> + * task so don't cause spurious wakeups.
> */
> - if (!thread_group_empty(tsk))
> + if (!thread_group_empty(tsk) && (tsk->signal->notify_count >= 0))
> do_notify_pidfd(tsk);
>
> if (unlikely(tsk->ptrace)) {
perhaps... but this won't help if the leader exits and that another
thread does exec?
From the changelog
Perhaps we can improve this behaviour later, pidfd_poll()
can probably take sig->group_exec_task into account. But
this doesn't really differ from the case when the leader
exits before other threads (so pidfd_poll() succeeds) and
then another thread execs and pidfd_poll() will block again.
so I am not sure what can we do.
I'll try to think more later, but I can't promise much :(
Oleg.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: PIDFD_THREAD behavior for thread-group leaders
2025-03-06 12:17 ` Oleg Nesterov
@ 2025-03-06 12:48 ` Christian Brauner
0 siblings, 0 replies; 3+ messages in thread
From: Christian Brauner @ 2025-03-06 12:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-fsdevel, Jeff Layton, Lennart Poettering, Daan De Meyer,
Mike Yuan
On Thu, Mar 06, 2025 at 01:17:13PM +0100, Oleg Nesterov wrote:
> On 03/06, Christian Brauner wrote:
> >
> > Back when we implemented support for PIDFD_THREAD we ended up with the
> > decision that if userspace holds:
> >
> > pidfd_leader_thread = pidfd_open(<thread-group-leader-pid>, PIDFD_THREAD)
> >
> > that exit notification is not strictly defined if a non-thread-group
> > leader thread execs:
>
> Yes, this was even documented in commit 64bef697d33b ...
Yeah, I'm aware I was just revisiting this decision.
>
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -745,8 +745,11 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
> > /*
> > * sub-thread or delay_group_leader(), wake up the
> > * PIDFD_THREAD waiters.
> > + *
> > + * The thread-group leader will be taken over by the execing
> > + * task so don't cause spurious wakeups.
> > */
> > - if (!thread_group_empty(tsk))
> > + if (!thread_group_empty(tsk) && (tsk->signal->notify_count >= 0))
> > do_notify_pidfd(tsk);
> >
> > if (unlikely(tsk->ptrace)) {
>
> perhaps... but this won't help if the leader exits and that another
> thread does exec?
>
> >From the changelog
>
> Perhaps we can improve this behaviour later, pidfd_poll()
> can probably take sig->group_exec_task into account. But
> this doesn't really differ from the case when the leader
> exits before other threads (so pidfd_poll() succeeds) and
> then another thread execs and pidfd_poll() will block again.
>
> so I am not sure what can we do.
I think early thread-group leader exit is a bug in the userspace
program whereas multi-threaded exec is well-defined (if ugly).
To detect early-thread-group leader exec userspace could do:
pidfd_leader_thread = pidfd_open(<thread-group-leader-pid>, 0)
pidfd_thread = pidfd_open(<thread-group-leader-pid>, PIDFD_THREAD)
and then they need to add both file descriptors to the poll instance.
If proper multi-threaded exec happens no notification will be generated
on either file descriptor.
However, if the thread-group leader exits prematurely then a
notification will be generated on pidfd_thread allowing detection of
malformed behavior.
So the premature thread-group leader exec thing is probably something we
can ignore as a real use-case.
>
> I'll try to think more later, but I can't promise much :(
>
> Oleg.
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-03-06 12:48 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-06 11:41 PIDFD_THREAD behavior for thread-group leaders Christian Brauner
2025-03-06 12:17 ` Oleg Nesterov
2025-03-06 12:48 ` Christian Brauner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.