From: Christian Brauner <brauner@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christian Brauner <brauner@kernel.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [GIT PULL 06/16 for v7.2] kernel task_exec_state
Date: Fri, 12 Jun 2026 17:13:03 +0200
Message-ID: <20260612-kernel-task_exec_state-v72-c39ca82510c0@brauner>
In-Reply-To: <20260612-vfs-v72-20facee87e19@brauner>

Hey Linus,

/* Summary */

This introduces a new per-task task_exec_state structure and relocates
the dumpable mode and the user namespace captured at execve() from
mm_struct onto it. The structure stays attached to the task for the
task's full lifetime.

__ptrace_may_access() and several /proc owner and visibility checks
need to consult two pieces of state for any observable task, including
zombies that have already gone through exit_mm(): the dumpable mode
and the user namespace captured at execve(). Both live on mm_struct
today, which exit_mm() clears from the task long before the task is
reaped. A reader that races with do_exit() observes task->mm == NULL
and either fails the check or falls back to init_user_ns - which
denies legitimate access to non-dumpable zombies that were running in
a nested user namespace.
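
To make the race concrete, here is a minimal userspace sketch of the
two lookup schemes. Every type and function name in it (task_model,
old_lookup_ns(), new_lookup_ns(), ...) is a hypothetical stand-in made
up for illustration, not the kernel's definitions, and the fallback
shown is only the init_user_ns variant of the failure described above.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; none of these are the kernel's real definitions. */
struct user_ns { int level; };
static struct user_ns init_ns = { 0 };	/* models init_user_ns */

struct mm_model { struct user_ns *user_ns; };
struct exec_state_model { struct user_ns *user_ns; };

struct task_model {
	struct mm_model *mm;			/* NULL once exit_mm() has run */
	struct exec_state_model *exec_state;	/* valid until the task is reaped */
};

/* Old scheme: the namespace is reachable only through the mm. */
static struct user_ns *old_lookup_ns(const struct task_model *t)
{
	return t->mm ? t->mm->user_ns : &init_ns;
}

/* New scheme: the namespace lives on the per-task exec state. */
static struct user_ns *new_lookup_ns(const struct task_model *t)
{
	return t->exec_state->user_ns;
}
```

Once the model's mm pointer is cleared, old_lookup_ns() reports the
initial namespace while new_lookup_ns() still reports the nested one.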

mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so the MMF_DUMP_FILTER_* layout exposed
via /proc/<pid>/coredump_filter stays stable. task->user_dumpable and
its exit_mm() snapshot are removed.
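
As a rough illustration of why reserving the low bits keeps the filter
bits where they were, consider the following sketch. The MODEL_*
constants (a 2-bit reserved field followed by 9 filter bits) are
assumptions modelled on how the flags word was laid out before this
series, and the helper names are hypothetical; the series itself may
define things differently.

```c
#include <assert.h>

/* Assumed layout: the low bits stay reserved, the filter bits follow. */
#define MODEL_DUMPABLE_BITS	2	/* assumed width of the reserved field */
#define MODEL_FILTER_SHIFT	MODEL_DUMPABLE_BITS
#define MODEL_FILTER_BITS	9	/* assumed number of filter bits */
#define MODEL_FILTER_MASK \
	(((1UL << MODEL_FILTER_BITS) - 1) << MODEL_FILTER_SHIFT)

/* Store a filter value into the flags word, leaving other bits alone. */
static unsigned long flags_set_filter(unsigned long flags, unsigned long filter)
{
	flags &= ~MODEL_FILTER_MASK;
	return flags | ((filter << MODEL_FILTER_SHIFT) & MODEL_FILTER_MASK);
}

/* Read the filter value back out of the flags word. */
static unsigned long flags_get_filter(unsigned long flags)
{
	return (flags & MODEL_FILTER_MASK) >> MODEL_FILTER_SHIFT;
}
```

Because the shift is still derived from the reserved width, the filter
bits keep their positions in the flags word even though the reserved
bits no longer carry any dumpable state.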

task_exec_state is the privilege domain established by an execve().
Within a thread group it is shared via refcount; across thread groups
each task has its own:

- CLONE_VM siblings (thread-group members, io_uring workers)
  refcount-share the parent's exec_state.

- Non-CLONE_VM clones (fork(), clone() with CLONE_VFORK but without
  CLONE_VM) allocate a fresh exec_state inheriting the parent's
  dumpable mode and user_ns.

- execve() in the child allocates a fresh instance and installs it
  under task_lock + exec_update_lock via task_exec_state_replace().

- Credential changes (setresuid, capset, ...) and
  prctl(PR_SET_DUMPABLE) update dumpability on the current task's
  exec_state, i.e., on the thread group's shared instance.
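
The sharing rules above can be sketched as a small single-threaded
userspace model. All names here (exec_state_model, task_clone(),
task_exec(), the DUMP_* values) are hypothetical, the user namespace
is reduced to an integer id, and the locking that the real
task_exec_state_replace() does under task_lock + exec_update_lock is
left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dumpable modes for the model only. */
enum dumpable_model { DUMP_DISABLE = 0, DUMP_USER = 1, DUMP_ROOT = 2 };

struct exec_state_model {
	int refs;		/* plain counter; the model is not thread-safe */
	int dumpable;
	int user_ns_id;		/* stand-in for a user namespace pointer */
};

struct task_model { struct exec_state_model *es; };

static struct exec_state_model *es_alloc(int dumpable, int user_ns_id)
{
	struct exec_state_model *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	es->refs = 1;
	es->dumpable = dumpable;
	es->user_ns_id = user_ns_id;
	return es;
}

static void es_put(struct exec_state_model *es)
{
	if (es && --es->refs == 0)
		free(es);
}

/* share_vm != 0 models CLONE_VM: take a reference instead of copying. */
static int task_clone(const struct task_model *parent,
		      struct task_model *child, int share_vm)
{
	if (share_vm) {
		parent->es->refs++;
		child->es = parent->es;
		return 0;
	}
	child->es = es_alloc(parent->es->dumpable, parent->es->user_ns_id);
	return child->es ? 0 : -1;
}

/* execve(): install a fresh instance, then drop the old reference. */
static int task_exec(struct task_model *t, int dumpable, int user_ns_id)
{
	struct exec_state_model *fresh = es_alloc(dumpable, user_ns_id);

	if (!fresh)
		return -1;
	es_put(t->es);
	t->es = fresh;
	return 0;
}

/* Models prctl(PR_SET_DUMPABLE): writes land on the shared instance. */
static void task_set_dumpable(struct task_model *t, int dumpable)
{
	t->es->dumpable = dumpable;
}
```

In this model a dumpable change made through one thread is visible
through every task sharing the instance, while a forked child keeps
its own copy until it calls task_exec().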

On top of this, exec_mmap() no longer tears down the old mm while
holding exec_update_lock for writing and cred_guard_mutex. Neither
lock is needed for that: exec_update_lock only exists to make the mm
swap atomic with the later commit_creds(), and all its readers operate
on the new mm; none looks at the detached old mm. The cost was real:
__mmput() runs exit_mmap() over the entire old address space and can
block in exit_aio() waiting for in-flight AIO, so execve() of a large
process blocked ptrace_attach() and every exec_update_lock reader for
the duration of the teardown. The old mm is now stashed in
bprm->old_mm and released from setup_new_exec() after both locks are
dropped, with a backstop in free_bprm() for the error paths.
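
The "stash under the lock, release after unlock" pattern can be
sketched in userspace as follows. This is a simplified model under
stated assumptions: both locks are collapsed into a single boolean
flag, and exec_begin(), exec_finish(), bprm_free() and mm_teardown()
are hypothetical names, not the functions touched by the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mm_model { int id; };

struct task_model {
	bool exec_lock_held;	/* stand-in for both exec locks */
	struct mm_model *mm;
};

struct bprm_model {
	struct mm_model *new_mm;
	struct mm_model *old_mm;	/* detached mm awaiting teardown */
};

static int teardowns;			/* total teardowns performed */
static int teardowns_under_lock;	/* teardowns that ran with the lock held */

/* Models the expensive release of an address space. */
static void mm_teardown(const struct task_model *t, struct mm_model *mm)
{
	if (!mm)
		return;
	if (t->exec_lock_held)
		teardowns_under_lock++;
	teardowns++;
	free(mm);
}

/* Swap the mm under the lock, but only stash the old one. */
static void exec_begin(struct task_model *t, struct bprm_model *bprm)
{
	t->exec_lock_held = true;
	bprm->old_mm = t->mm;
	t->mm = bprm->new_mm;
	bprm->new_mm = NULL;
}

/* Drop the lock first, then pay for the teardown. */
static void exec_finish(struct task_model *t, struct bprm_model *bprm)
{
	t->exec_lock_held = false;
	mm_teardown(t, bprm->old_mm);
	bprm->old_mm = NULL;
}

/* Backstop: release whatever is still stashed, e.g. on an error path. */
static void bprm_free(const struct task_model *t, struct bprm_model *bprm)
{
	mm_teardown(t, bprm->old_mm);
	bprm->old_mm = NULL;
	mm_teardown(t, bprm->new_mm);
	bprm->new_mm = NULL;
}
```

Clearing old_mm after the teardown is what makes the backstop safe to
call unconditionally: on the success path it finds nothing left to
release.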

/* Testing */

gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3+b1)

No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

No known conflicts.

Merge conflicts with other trees
================================

The following changes since commit 5200f5f493f79f14bbdc349e402a40dfb32f23c8:

  Linux 7.1-rc4 (2026-05-17 13:59:58 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/kernel-7.2-rc1.task_exec_state

for you to fetch changes up to 38205ecbe6b6dc47968ad4e9c978e2117720969e:

  exec: free the old mm outside the exec locks (2026-05-26 11:02:02 +0200)

----------------------------------------------------------------
kernel-7.2-rc1.task_exec_state

Please consider pulling these changes from the signed kernel-7.2-rc1.task_exec_state tag.

Thanks!
Christian

----------------------------------------------------------------
Christian Brauner (2):
      Merge patch series "exec: introduce task_exec_state for exec-time metadata"
      exec: free the old mm outside the exec locks

Christian Brauner (Amutable) (4):
      sched/coredump: introduce enum task_dumpable
      exec: introduce struct task_exec_state
      ptrace: add ptracer_access_allowed()
      exec_state: relocate dumpable information

 arch/arm64/kernel/mte.c          |   6 +-
 drivers/firmware/efi/efi.c       |   1 -
 fs/coredump.c                    |  22 +++-----
 fs/exec.c                        |  65 +++++++++++++--------
 fs/pidfs.c                       |  23 +++-----
 fs/proc/base.c                   |  39 ++++++-------
 include/linux/binfmts.h          |   3 +
 include/linux/coredump.h         |   4 ++
 include/linux/mm_types.h         |   9 ++-
 include/linux/ptrace.h           |   1 +
 include/linux/sched.h            |   6 +-
 include/linux/sched/coredump.h   |  47 ++++------------
 include/linux/sched/exec_state.h |  31 ++++++++++
 init/init_task.c                 |  10 ++++
 kernel/Makefile                  |   2 +-
 kernel/cred.c                    |   3 +-
 kernel/exec_state.c              | 119 +++++++++++++++++++++++++++++++++++++++
 kernel/exit.c                    |   1 -
 kernel/fork.c                    |  33 +++++++++--
 kernel/kthread.c                 |   1 -
 kernel/ptrace.c                  |  51 +++++++++++------
 kernel/sys.c                     |   6 +-
 mm/init-mm.c                     |   1 -
 23 files changed, 329 insertions(+), 155 deletions(-)
 create mode 100644 include/linux/sched/exec_state.h
 create mode 100644 kernel/exec_state.c

