From: Christian Brauner <brauner@kernel.org>
To: Jann Horn <jannh@google.com>,
Linus Torvalds <torvalds@linuxfoundation.org>,
Oleg Nesterov <oleg@redhat.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Qualys Security Advisory <qsa@qualys.com>,
Kees Cook <kees@kernel.org>, Minchan Kim <minchan@kernel.org>,
linux-mm@kvack.org, Suren Baghdasaryan <surenb@google.com>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>, Michal Hocko <mhocko@suse.com>,
"Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH RFC v2 0/5] ptrace: keep mm metadata accessible past exit_mm()
Date: Wed, 20 May 2026 16:42:53 +0200
Message-ID: <20260520-work-task_exec_state-v2-0-9ea88ceb09e6@kernel.org>
This series relocates the dumpable mode and the user_namespace
captured at execve() from mm_struct onto a new per-task
task_exec_state structure that stays attached to the task for its
full lifetime.
__ptrace_may_access() and several /proc owner / visibility checks
need to consult two pieces of state for any observable task,
including zombies that have already gone through exit_mm(): the
dumpable mode and the user namespace captured at execve(). Both
live on mm_struct today, which exit_mm() clears from the task long
before the task is reaped.
A reader that races with do_exit() observes task->mm == NULL and
either fails the check or falls back to init_user_ns - which denies
legitimate access to non-dumpable zombies that were running in a
nested user namespace.
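The fallback described above can be modelled in a few lines of userspace C. This is only a sketch: the struct layouts, the exec_state field name and the helper names are assumptions made for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; these are not the kernel's definitions. */
struct user_ns { int level; };
struct mm { struct user_ns *user_ns; };
struct exec_state { struct user_ns *user_ns; };	/* hypothetical layout */

struct task {
	struct mm *mm;			/* cleared by exit_mm() */
	struct exec_state *exec_state;	/* stays attached until the task is freed */
};

static struct user_ns init_user_ns = { .level = 0 };

/* Pre-series style lookup: a task without an mm falls back to init_user_ns. */
static struct user_ns *ns_via_mm(const struct task *t)
{
	return t->mm ? t->mm->user_ns : &init_user_ns;
}

/* Post-series style lookup: the namespace is read from per-task state. */
static struct user_ns *ns_via_exec_state(const struct task *t)
{
	assert(t->exec_state != NULL);
	return t->exec_state->user_ns;
}

/* Model of exit_mm(): detach the mm, leave exec_state in place. */
static void model_exit_mm(struct task *t)
{
	t->mm = NULL;
}
```

For a task that ran in a nested namespace, ns_via_mm() starts returning init_user_ns as soon as model_exit_mm() has run, while ns_via_exec_state() keeps returning the nested namespace.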
task_exec_state is RCU-protected, refcounted, freed via call_rcu()
from free_task(). init_task uses a static instance with refcount 2
so it is never freed.
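A rough userspace model of that lifetime rule, using C11 atomics in place of the kernel's refcount_t and an immediate free() in place of call_rcu(). All names below (exec_state_model, es_get, es_put, init_es) are hypothetical and the real structure certainly differs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct exec_state_model {
	atomic_int refcount;
	int dumpable;
	bool is_static;		/* models the init_task instance */
};

/* Static instance: one reference for init_task plus one that is never dropped. */
static struct exec_state_model init_es = {
	.refcount = 2, .dumpable = 1, .is_static = true,
};

static int freed_count;		/* bookkeeping for the model only */

static struct exec_state_model *es_alloc(int dumpable)
{
	struct exec_state_model *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	atomic_init(&es->refcount, 1);
	es->dumpable = dumpable;
	es->is_static = false;
	return es;
}

static struct exec_state_model *es_get(struct exec_state_model *es)
{
	int old = atomic_fetch_add(&es->refcount, 1);

	assert(old > 0);	/* never resurrect a dead object */
	return es;
}

/* Returns true if this call dropped the last reference. */
static bool es_put(struct exec_state_model *es)
{
	int old = atomic_fetch_sub(&es->refcount, 1);

	assert(old > 0);
	if (old != 1)
		return false;
	/* The kernel would defer this past an RCU grace period. */
	assert(!es->is_static);
	free(es);
	freed_count++;
	return true;
}
```

Because the static instance starts at 2, dropping init_task's own reference still leaves the count at 1, so the free path is never reached for it.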
mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so MMF_DUMP_FILTER_* layout exposed via
/proc/<pid>/coredump_filter stays stable. task->user_dumpable and its
exit_mm() snapshot are removed.
task_exec_state is the privilege domain established by an execve(), not
a property of the address space. Following the model Linus sketched in
[1]:
  - Every clone() variant - thread, process, vfork(), io_uring
    worker - refcount-shares the parent's exec_state. No
    dup-on-fork.
  - Only execve() in the child allocates a fresh instance.
  - Credential changes (setresuid, capset, ...) and
    prctl(PR_SET_DUMPABLE) update dumpability on the shared
    exec_state.
The entire fork subtree of one execve shares one exec_state; a
child enters a new privilege domain only by execve()ing into one.
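The share-on-clone, allocate-on-execve rule can be sketched as a single-threaded userspace model. The types and helpers (es_model, model_clone, model_execve) are invented for illustration, the plain int counter stands in for a real refcount, and the dumpable argument to model_execve() is a placeholder for whatever execve() would compute.

```c
#include <assert.h>
#include <stdlib.h>

struct es_model { int refs; int dumpable; };
struct task_model { struct es_model *exec_state; };

static struct es_model *es_model_new(int dumpable)
{
	struct es_model *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	es->refs = 1;
	es->dumpable = dumpable;
	return es;
}

static void es_model_put(struct es_model *es)
{
	assert(es->refs > 0);
	if (--es->refs == 0)
		free(es);
}

/* Any clone() variant: the child takes a reference on the parent's state. */
static void model_clone(struct task_model *child, const struct task_model *parent)
{
	child->exec_state = parent->exec_state;
	child->exec_state->refs++;
}

/* execve(): install a fresh instance and drop the reference on the old one. */
static int model_execve(struct task_model *t, int dumpable)
{
	struct es_model *fresh = es_model_new(dumpable);

	if (!fresh)
		return -1;
	es_model_put(t->exec_state);
	t->exec_state = fresh;
	return 0;
}
```

After two model_clone() calls the original instance carries three references; a model_execve() in one child drops that to two and leaves the child pointing at its own instance.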
Behavioral changes:
(1) Dumpability lowering on credential changes now propagates
across the fork subtree.
Pre-series, set_dumpable() on commit_creds() targeted
mm->flags, which was per-mm: shared by CLONE_VM threads but
private to fork()-without-CLONE_VM children. Under the new
model the write targets the shared task_exec_state, so a
privilege drop in any task in the subtree lowers dumpability
for the entire subtree, including non-CLONE_VM siblings.
Same-uid ptrace shedding and /proc visibility for the
"root-launched daemon drops to a service uid" pattern (sshd,
polkitd, dbus-daemon, NetworkManager, ...) is preserved.
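To make the difference in (1) concrete, the following userspace sketch contrasts where the write lands before and after the series for two fork()-without-CLONE_VM siblings. The struct and function names are hypothetical; the constants only mirror the values of SUID_DUMP_DISABLE and SUID_DUMP_USER.

```c
#include <assert.h>
#include <stddef.h>

enum { DUMP_DISABLE = 0, DUMP_USER = 1 };	/* illustrative names */

struct mm_flags_model { int dumpable; };	/* pre-series: lives in the mm */
struct shared_es_model { int dumpable; };	/* post-series: lives in exec_state */

struct sibling_model {
	struct mm_flags_model *mm;		/* private per non-CLONE_VM child */
	struct shared_es_model *exec_state;	/* shared across the fork subtree */
};

/* Pre-series: lowering dumpability only affects tasks sharing this mm. */
static void lower_dumpable_pre(struct sibling_model *t)
{
	assert(t->mm != NULL);
	t->mm->dumpable = DUMP_DISABLE;
}

/* Post-series: lowering dumpability is seen by every task sharing exec_state. */
static void lower_dumpable_post(struct sibling_model *t)
{
	assert(t->exec_state != NULL);
	t->exec_state->dumpable = DUMP_DISABLE;
}
```

With two siblings that have distinct mm instances but one shared exec_state, lower_dumpable_pre() on one leaves the other's mm untouched, whereas lower_dumpable_post() on one is visible through the other.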
(3) Kernel threads that briefly use a user mm via
kthread_use_mm() no longer inherit dumpability from the
borrowed mm. Kthreads are not ptraceable (PF_KTHREAD
short-circuits __ptrace_may_access), so this is observable
only via /proc surfaces that a sufficiently privileged reader
can reach.
[1] https://lore.kernel.org/r/CAHk-=wj+NgoDH3GSicJ140SV8OoDd71pLmL3fgFEsTcgoMC6Og@mail.gmail.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Changes in v2:
- Drop dup-on-fork for non-CLONE_VM clones: every clone() variant
refcount-shares the parent's task_exec_state; only execve()
allocates a fresh one. See "Behavioral changes" in the cover
letter for the implications.
- Switch commit_creds() to update dumpability on the new
task_exec_state (instead of dropping the set_dumpable() call
entirely as in v1). Drops the explicit smp_wmb()/smp_rmb() pair:
RCU acquire/release on the cred pointer provides the ordering.
- Link to v1: https://patch.msgid.link/20260516-work-exit_mm-v1-1-76bcc7c2439d@kernel.org
---
Christian Brauner (5):
sched/coredump: introduce enum task_dumpable
exec: introduce struct task_exec_state and relocate dumpable
ptrace: add ptracer_access_allowed()
exec_state: relocate dumpable information
cred: switch dumpability lowering to task_exec_state
arch/arm64/kernel/mte.c | 6 +--
drivers/firmware/efi/efi.c | 1 -
fs/coredump.c | 22 +++-----
fs/exec.c | 39 +++++++-------
fs/pidfs.c | 22 ++++----
fs/proc/base.c | 39 ++++++--------
include/linux/binfmts.h | 2 +
include/linux/coredump.h | 4 ++
include/linux/mm_types.h | 9 ++--
include/linux/ptrace.h | 1 +
include/linux/sched.h | 7 +--
include/linux/sched/coredump.h | 47 ++++-------------
include/linux/sched/exec_state.h | 31 +++++++++++
init/init_task.c | 10 ++++
kernel/Makefile | 2 +-
kernel/cred.c | 25 +++++----
kernel/exec_state.c | 108 +++++++++++++++++++++++++++++++++++++++
kernel/exit.c | 1 -
kernel/fork.c | 15 +++---
kernel/kthread.c | 1 -
kernel/ptrace.c | 62 ++++++++++++----------
kernel/sys.c | 6 +--
mm/init-mm.c | 1 -
23 files changed, 289 insertions(+), 172 deletions(-)
---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260520-work-task_exec_state-83209d8b3e53