* [PATCH v4 0/4] pidfd: add CLONE_AUTOREAP and CLONE_PIDFD_AUTOKILL
@ 2026-02-23 10:44 Christian Brauner
From: Christian Brauner @ 2026-02-23 10:44 UTC
  To: Oleg Nesterov, Jann Horn
  Cc: Linus Torvalds, Ingo Molnar, Peter Zijlstra, linux-kernel,
	linux-fsdevel, Christian Brauner

Add two new clone3() flags for pidfd-based process lifecycle management.

CLONE_AUTOREAP makes a child process auto-reap on exit without ever
becoming a zombie. This is a per-process property in contrast to the
existing auto-reap mechanism via SA_NOCLDWAIT or SIG_IGN for SIGCHLD
which applies to all children of a given parent.

Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property
affecting all children which makes it unsuitable for libraries or
applications that need selective auto-reaping of specific children while
still being able to wait() on others.

CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct.
When the child exits, do_notify_parent() checks this flag and causes
exit_notify() to transition the task directly to EXIT_DEAD. Since the
flag lives on the child it survives reparenting: if the original parent
exits and the child is reparented to a subreaper or init the child still
auto-reaps when it eventually exits. This is cleaner than forcing the
subreaper to get SIGCHLD and then reap the child. If the parent doesn't
care, the subreaper won't care. If there's a subreaper that would care,
it would be easy enough to add either a prctl() that turns SIGCHLD back
on and turns off auto-reaping, or a prctl() that notifies the subreaper
whenever a child is reparented to it.

CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to
monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern. No exit signal is delivered so exit_signal must be zero.
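
The poll()-based monitoring can be sketched with APIs that exist today.
The numeric value of CLONE_AUTOREAP is not given in this cover letter,
so the sketch below deliberately does not use it: it obtains a pidfd
with pidfd_open(2) after fork() and still has to reap the child by
hand. The function name pidfd_poll_demo(), the fallback syscall number
and the 5 second timeout are assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434	/* assumption: unified syscall number */
#endif

/*
 * Returns 1 if the child's exit was observed via poll() on a pidfd,
 * 0 if pidfd_open() is unavailable here (ENOSYS, or EPERM from a
 * seccomp filter), and -1 on any other error.
 */
int pidfd_poll_demo(void)
{
	int gate[2], pidfd, ret = -1;
	struct pollfd pfd;
	pid_t pid;
	char c;

	if (pipe(gate) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(gate[0]);
		close(gate[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: stay alive until the parent closes the write end. */
		close(gate[1]);
		if (read(gate[0], &c, 1) < 0)
			_exit(1);
		_exit(0);
	}
	assert(pid > 0);	/* only the parent gets here */
	close(gate[0]);

	pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		ret = (errno == ENOSYS || errno == EPERM) ? 0 : -1;

	close(gate[1]);		/* let the child exit */

	if (pidfd >= 0) {
		pfd.fd = pidfd;
		pfd.events = POLLIN;
		pfd.revents = 0;
		/* A pidfd polls readable once the process has exited. */
		if (poll(&pfd, 1, 5000) == 1 && (pfd.revents & POLLIN))
			ret = 1;
		close(pidfd);
	}

	/* Without CLONE_AUTOREAP the parent still has to reap by hand. */
	waitpid(pid, NULL, 0);
	return ret;
}
```

With the proposed flags, the pidfd would come straight from clone3()
and the final waitpid() would go away. The exit status would then be
read through PIDFD_GET_INFO, as described above.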

The flag is not inherited by the autoreap process's own children. Each
child that should be autoreaped must be explicitly created with
CLONE_AUTOREAP.

CLONE_PIDFD_AUTOKILL ties a child's lifetime to the pidfd returned from
clone3(). When the last reference to the struct file created by clone3()
is closed the kernel sends SIGKILL to the child. A pidfd obtained via
pidfd_open() for the same process does not keep the child alive and does
not trigger autokill - only the specific struct file from clone3() has
this property. This is useful for container runtimes, service managers,
and sandboxed subprocess execution - any scenario where the child must
die if the parent crashes or abandons the pidfd, or where the parent
just wants a throwaway helper process.

CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD and CLONE_AUTOREAP. It
requires CLONE_PIDFD because the whole point is tying the child's
lifetime to the pidfd. It requires CLONE_AUTOREAP because a killed child
with no one to reap it would become a zombie - the primary use case is
the parent crashing or abandoning the pidfd so no one is around to call
waitpid().
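
The flag dependencies described in this letter (including the
CLONE_PARENT rejection from the v4 changelog) can be summarised as a
small userspace model. This is not the code in kernel/fork.c. The
DEMO_* bit values are placeholders rather than the real UAPI values,
and returning -EINVAL is an assumption about how a rejected combination
would be reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder bits for illustration only; NOT the real UAPI values. */
enum {
	DEMO_PIDFD          = 1u << 0,	/* stands in for CLONE_PIDFD */
	DEMO_PARENT         = 1u << 1,	/* stands in for CLONE_PARENT */
	DEMO_AUTOREAP       = 1u << 2,	/* stands in for CLONE_AUTOREAP */
	DEMO_PIDFD_AUTOKILL = 1u << 3,	/* stands in for CLONE_PIDFD_AUTOKILL */
};

/* Returns 0 if the combination is allowed, -EINVAL (assumed) if not. */
int demo_check_flags(uint64_t flags, int exit_signal)
{
	assert(exit_signal >= 0);	/* signal numbers are never negative */

	/* Autokill needs a pidfd to tie the lifetime to, and autoreap so
	 * that the killed child does not linger as a zombie. */
	if ((flags & DEMO_PIDFD_AUTOKILL) &&
	    (!(flags & DEMO_PIDFD) || !(flags & DEMO_AUTOREAP)))
		return -EINVAL;

	if (flags & DEMO_AUTOREAP) {
		/* No exit signal is delivered, so none may be requested. */
		if (exit_signal != 0)
			return -EINVAL;
		/* Rejected since v4 to avoid silent zombies. */
		if (flags & DEMO_PARENT)
			return -EINVAL;
	}
	return 0;
}
```

For example, demo_check_flags(DEMO_AUTOREAP, 0) is accepted as the
fire-and-forget case, while DEMO_PIDFD_AUTOKILL on its own or combined
with only DEMO_PIDFD is rejected.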

CLONE_PIDFD_AUTOKILL automatically sets no_new_privs on the child
process. This ensures the child cannot escalate privileges beyond the
parent's credential level via setuid/setgid exec. Because the child can
never gain more privileges than the parent, the autokill SIGKILL is
always within the parent's authority. This avoids the pdeath_signal trap
where the kernel resets the property during secureexec and
commit_creds(), making it useless for container runtimes and service
managers that deprivilege themselves. The no_new_privs restriction only
affects the child. The parent retains full privileges.
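
The child-only scope of no_new_privs can be checked today with
prctl(2). In the sketch below the child sets the bit itself, which
approximates what CLONE_PIDFD_AUTOKILL is described as doing
automatically. The function name nnp_demo() is made up for this
illustration.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the child ended up with no_new_privs set while the
 * parent's own value stayed the same, 0 if not, -1 on error.
 */
int nnp_demo(void)
{
	int before, after, status;
	pid_t pid;

	before = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
	if (before < 0)
		return -1;
	assert(before == 0 || before == 1);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the bit is one-way and cannot be cleared again. */
		if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
			_exit(2);
		_exit(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) != pid)
		return -1;

	/* The parent's value must not have been touched by the child. */
	after = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
	if (after < 0)
		return -1;

	return WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
	       after == before;
}
```

A caller would expect nnp_demo() to return 1 on any kernel that
supports PR_SET_NO_NEW_PRIVS.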

The clone3 pidfd is identified by the PIDFD_AUTOKILL file flag set on
the struct file at clone3() time. The pidfs .release handler checks this
flag and sends SIGKILL only when it is set. dup()/fork() share the same
struct file so they extend the child's lifetime until the last reference
drops.
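
The "last reference" rule is ordinary struct file behaviour and can be
observed with any file type. As an analogy only, the sketch below uses
a pipe instead of a pidfd: the reader sees EOF (the pipe's release-time
effect) only once every descriptor referring to the write end has been
closed, and not when the first of two dup()ed descriptors goes away.
The function name last_ref_demo() is made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the reader saw no EOF while a dup()ed writer was still
 * open and EOF after the last writer reference was closed, 0 if the
 * behaviour differed, -1 on setup failure.
 */
int last_ref_demo(void)
{
	int p[2], w2, fl, still_open, released;
	ssize_t n;
	char c;

	if (pipe(p) < 0)
		return -1;

	/* Non-blocking reads let us tell "no data yet" apart from EOF. */
	fl = fcntl(p[0], F_GETFL);
	if (fl < 0 || fcntl(p[0], F_SETFL, fl | O_NONBLOCK) < 0)
		goto fail;

	w2 = dup(p[1]);		/* second fd, same struct file */
	if (w2 < 0)
		goto fail;
	assert(w2 != p[1]);	/* new descriptor number, shared file */

	close(p[1]);		/* one reference gone, one left */
	n = read(p[0], &c, 1);
	still_open = (n < 0 && errno == EAGAIN);

	close(w2);		/* last reference gone: file is released */
	n = read(p[0], &c, 1);
	released = (n == 0);

	close(p[0]);
	return still_open && released;

fail:
	close(p[0]);
	close(p[1]);
	return -1;
}
```

Per the description above, the autokill pidfd behaves analogously:
closing one of several dup()ed or fork()-inherited descriptors does
nothing, and SIGKILL is sent only when the final one is closed.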

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v4:
- Set no_new_privs on child when CLONE_PIDFD_AUTOKILL is used. This
  prevents the child from escalating privileges via setuid/setgid exec
  and eliminates the need for magical resets during credential changes.
  The parent retains full privileges.
- Replace autokill_pidfd pointer with PIDFD_AUTOKILL file flag checked
  in pidfs_file_release(). This eliminates the need for pointer
  comparison, stale pointer concerns, and WRITE_ONCE/READ_ONCE pairing
  (Oleg, Jann).
- Reject CLONE_AUTOREAP | CLONE_PARENT to prevent a CLONE_AUTOREAP
  child from creating silent zombies via clone(CLONE_PARENT) (Oleg).
- Link to v3: https://patch.msgid.link/20260217-work-pidfs-autoreap-v3-0-33a403c20111@kernel.org

Changes in v2:
- Add CLONE_PIDFD_AUTOKILL flag
- Decouple CLONE_AUTOREAP from CLONE_PIDFD: the autoreap mechanism has
  no dependency on pidfds. This allows fire-and-forget patterns where
  the parent does not need exit status.
- Link to v1: https://patch.msgid.link/20260216-work-pidfs-autoreap-v1-0-e63f663008f2@kernel.org

---
Christian Brauner (4):
      clone: add CLONE_AUTOREAP
      pidfd: add CLONE_PIDFD_AUTOKILL
      selftests/pidfd: add CLONE_AUTOREAP tests
      selftests/pidfd: add CLONE_PIDFD_AUTOKILL tests

 fs/pidfs.c                                         |  38 +-
 include/linux/sched/signal.h                       |   1 +
 include/uapi/linux/pidfd.h                         |   1 +
 include/uapi/linux/sched.h                         |   2 +
 kernel/fork.c                                      |  34 +-
 kernel/ptrace.c                                    |   3 +-
 kernel/signal.c                                    |   4 +
 tools/testing/selftests/pidfd/.gitignore           |   1 +
 tools/testing/selftests/pidfd/Makefile             |   2 +-
 .../testing/selftests/pidfd/pidfd_autoreap_test.c  | 793 +++++++++++++++++++++
 10 files changed, 868 insertions(+), 11 deletions(-)
---
base-commit: 6de23f81a5e08be8fbf5e8d7e9febc72a5b5f27f
change-id: 20260214-work-pidfs-autoreap-3ee677e240a8



end of thread, other threads:[~2026-02-24 16:47 UTC | newest]

Thread overview: 14+ messages
2026-02-23 10:44 [PATCH v4 0/4] pidfd: add CLONE_AUTOREAP and CLONE_PIDFD_AUTOKILL Christian Brauner
2026-02-23 10:44 ` [PATCH v4 1/4] clone: add CLONE_AUTOREAP Christian Brauner
2026-02-23 10:44 ` [PATCH v4 2/4] pidfd: add CLONE_PIDFD_AUTOKILL Christian Brauner
2026-02-23 15:47   ` Oleg Nesterov
2026-02-23 15:51     ` Oleg Nesterov
2026-02-23 17:05       ` pidfd && O_RDWR Oleg Nesterov
2026-02-23 18:14         ` David Laight
2026-02-23 19:21         ` Oleg Nesterov
2026-02-23 21:39           ` Christian Brauner
2026-02-24  9:43             ` David Laight
2026-02-24 10:17             ` Oleg Nesterov
2026-02-24 16:47               ` Christian Brauner
2026-02-23 10:45 ` [PATCH v4 3/4] selftests/pidfd: add CLONE_AUTOREAP tests Christian Brauner
2026-02-23 10:45 ` [PATCH v4 4/4] selftests/pidfd: add CLONE_PIDFD_AUTOKILL tests Christian Brauner
