From: Christian Brauner <brauner@kernel.org>
To: Jann Horn <jannh@google.com>,
Linus Torvalds <torvalds@linuxfoundation.org>,
Oleg Nesterov <oleg@redhat.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Qualys Security Advisory <qsa@qualys.com>,
Kees Cook <kees@kernel.org>, Minchan Kim <minchan@kernel.org>,
linux-mm@kvack.org, Suren Baghdasaryan <surenb@google.com>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
Vlastimil Babka <vbabka@kernel.org>,
Mike Rapoport <rppt@kernel.org>, Michal Hocko <mhocko@suse.com>,
"Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH RFC v2 0/5] ptrace: keep mm metadata accessible past exit_mm()
Date: Wed, 20 May 2026 16:42:53 +0200
Message-ID: <20260520-work-task_exec_state-v2-0-9ea88ceb09e6@kernel.org>
This series relocates the dumpable mode and the user_namespace
captured at execve() from mm_struct onto a new per-task
task_exec_state structure that stays attached to the task for its
full lifetime.
__ptrace_may_access() and several /proc owner / visibility checks
need to consult two pieces of state for any observable task,
including zombies that have already gone through exit_mm(): the
dumpable mode and the user namespace captured at execve(). Both
live on mm_struct today, which exit_mm() clears from the task long
before the task is reaped.
A reader that races with do_exit() observes task->mm == NULL and
either fails the check or falls back to init_user_ns - which denies
legitimate access to non-dumpable zombies that were running in a
nested user namespace.
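
The failure mode can be modelled in a few lines of userspace C. This is
only a sketch: every name below (sim_task, old_lookup_user_ns(),
new_lookup_user_ns(), ...) is hypothetical and does not mirror the actual
kernel code. It shows only why a lookup that goes through ->mm changes
its answer after exit_mm(), while one that goes through a per-task
structure does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct sim_user_ns { int level; };              /* level 0 models init_user_ns */
struct sim_mm { struct sim_user_ns *user_ns; };
struct sim_exec_state { struct sim_user_ns *user_ns; };

struct sim_task {
	struct sim_mm *mm;                  /* cleared by sim_exit_mm() */
	struct sim_exec_state *exec_state;  /* stays attached until the task is freed */
};

static struct sim_user_ns sim_init_user_ns = { .level = 0 };

/* Models exit_mm(): the task drops its mm long before it is reaped. */
static void sim_exit_mm(struct sim_task *t)
{
	t->mm = NULL;
}

/* Pre-series style lookup: falls back to init_user_ns once the mm is gone. */
static struct sim_user_ns *old_lookup_user_ns(const struct sim_task *t)
{
	return t->mm ? t->mm->user_ns : &sim_init_user_ns;
}

/* Series-style lookup: the answer does not depend on t->mm at all. */
static struct sim_user_ns *new_lookup_user_ns(const struct sim_task *t)
{
	return t->exec_state->user_ns;
}
```

For a task that ran in a nested namespace, old_lookup_user_ns() returns
the nested namespace before sim_exit_mm() and the init namespace after
it; new_lookup_user_ns() returns the nested namespace in both cases.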
task_exec_state is RCU-protected and refcounted, and is freed via
call_rcu() from free_task(). init_task uses a static instance with a
refcount of 2 so that it is never freed.
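
A minimal userspace sketch of the lifetime rule, under stated
assumptions: the sim_* names are hypothetical, a plain int stands in for
the kernel's refcount_t, and the free happens immediately instead of
being deferred through call_rcu(). Reading the static instance's initial
count of 2 as "one reference for the owner plus one that is never
dropped" is one plausible interpretation, not something the series is
quoted as saying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a refcounted exec state. */
struct sim_exec_state {
	int refcount;   /* plain int here; the kernel would use refcount_t */
	int dumpable;
};

/*
 * Static instance: starts at 2, so dropping the owner's reference
 * leaves 1 and the free path is never taken for it.
 */
static struct sim_exec_state sim_init_exec_state = { .refcount = 2 };

static int sim_frees;   /* counts simulated frees, for inspection */

static struct sim_exec_state *sim_exec_state_alloc(void)
{
	struct sim_exec_state *es = calloc(1, sizeof(*es));

	if (es)
		es->refcount = 1;
	return es;
}

static struct sim_exec_state *sim_exec_state_get(struct sim_exec_state *es)
{
	assert(es->refcount > 0);
	es->refcount++;
	return es;
}

/* Returns true if this call dropped the last reference and freed the object. */
static bool sim_exec_state_put(struct sim_exec_state *es)
{
	assert(es->refcount > 0);
	if (--es->refcount > 0)
		return false;
	free(es);       /* the kernel would defer this past an RCU grace period */
	sim_frees++;
	return true;
}
```

A heap instance is freed on its final put; a put on sim_init_exec_state
only takes the count from 2 to 1.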
mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so that the MMF_DUMP_FILTER_* layout
exposed via /proc/<pid>/coredump_filter stays stable. task->user_dumpable
and its exit_mm() snapshot are removed.
task_exec_state is the privilege domain established by an execve(), not
a property of the address space. Following the model Linus sketched in
[1]:
  - Every clone() variant - thread, process, vfork(), io_uring
    worker - refcount-shares the parent's exec_state. No
    dup-on-fork.
  - Only execve() in the child allocates a fresh instance.
  - Credential changes (setresuid, capset, ...) and
    prctl(PR_SET_DUMPABLE) update dumpability on the shared
    exec_state.
The entire fork subtree of one execve shares one exec_state; a
child enters a new privilege domain only by execve()ing into one.
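
The sharing rules above can be sketched in userspace C. Everything here
is an illustrative assumption rather than the series' code: sim_clone()
and sim_execve() are hypothetical helpers, the refcount is a plain int,
and there is no locking or RCU.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: one exec_state per execve(), shared by its fork subtree. */
struct sim_exec_state {
	int refcount;
	int dumpable;
};

struct sim_task {
	struct sim_exec_state *exec_state;
};

/* Any clone() variant: take a reference on the parent's exec_state, no copy. */
static int sim_clone(const struct sim_task *parent, struct sim_task *child)
{
	parent->exec_state->refcount++;
	child->exec_state = parent->exec_state;
	return 0;
}

/* execve(): leave the old privilege domain and enter a freshly allocated one. */
static int sim_execve(struct sim_task *t, int dumpable)
{
	struct sim_exec_state *fresh = calloc(1, sizeof(*fresh));

	if (!fresh)
		return -1;
	fresh->refcount = 1;
	fresh->dumpable = dumpable;

	/* Drop the reference on the previous domain, freeing it if unused. */
	if (t->exec_state && --t->exec_state->refcount == 0)
		free(t->exec_state);
	t->exec_state = fresh;
	return 0;
}
```

After sim_clone() the parent and child point at the same object; only a
later sim_execve() in the child gives it a distinct one and drops the
shared count back by one.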
Behavioral changes:
(1) Dumpability lowering on credential changes now propagates
    across the fork subtree.

    Pre-series, set_dumpable() on commit_creds() targeted
    mm->flags, which was per-mm: shared by CLONE_VM threads but
    private to fork()-without-CLONE_VM children. Under the new
    model the write targets the shared task_exec_state, so a
    privilege drop in any task in the subtree lowers dumpability
    for the entire subtree, including non-CLONE_VM siblings.

    Same-uid ptrace shedding and /proc visibility for the
    "root-launched daemon drops to a service uid" pattern (sshd,
    polkitd, dbus-daemon, NetworkManager, ...) are preserved.

(3) Kernel threads that briefly use a user mm via
    kthread_use_mm() no longer inherit dumpability from the
    borrowed mm. Kthreads are not ptraceable (PF_KTHREAD
    short-circuits __ptrace_may_access), so this is observable
    only via /proc surfaces that a sufficiently privileged reader
    can reach.
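
The difference described in (1) can be shown with a small userspace
model. It is a sketch under loud assumptions: the enum values and the
old_drop_privs()/new_drop_privs() helpers are hypothetical, and a
privilege drop is simplified to "set dumpability to disabled", whereas
the kernel picks the lowered value from policy.

```c
#include <assert.h>

/* Hypothetical dumpable modes; not the series' enum task_dumpable. */
enum sim_dumpable { SIM_DUMP_DISABLE = 0, SIM_DUMP_USER = 1 };

struct sim_mm { enum sim_dumpable dumpable; };          /* pre-series home */
struct sim_exec_state { enum sim_dumpable dumpable; };  /* home under the series */

struct sim_task {
	struct sim_mm mm;                   /* private copy, as after fork() without CLONE_VM */
	struct sim_exec_state *exec_state;  /* shared across the fork subtree */
};

/* Pre-series: the write lands on the caller's own mm only. */
static void old_drop_privs(struct sim_task *t)
{
	t->mm.dumpable = SIM_DUMP_DISABLE;
}

/* With the series: the write lands on the shared exec_state. */
static void new_drop_privs(struct sim_task *t)
{
	t->exec_state->dumpable = SIM_DUMP_DISABLE;
}
```

With two sibling tasks pointing at one sim_exec_state, old_drop_privs()
on the first leaves the second's mm.dumpable untouched, while
new_drop_privs() on the first is visible through the second's
exec_state.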
[1] https://lore.kernel.org/r/CAHk-=wj+NgoDH3GSicJ140SV8OoDd71pLmL3fgFEsTcgoMC6Og@mail.gmail.com
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Changes in v2:
- Drop dup-on-fork for non-CLONE_VM clones: every clone() variant
  refcount-shares the parent's task_exec_state; only execve()
  allocates a fresh one. See "Behavioral changes" in the cover
  letter for the implications.
- Switch commit_creds() to update dumpability on the new
  task_exec_state (instead of dropping the set_dumpable() call
  entirely as in v1). Drops the explicit smp_wmb()/smp_rmb() pair:
  RCU acquire/release on the cred pointer provides the ordering.
- Link to v1: https://patch.msgid.link/20260516-work-exit_mm-v1-1-76bcc7c2439d@kernel.org
---
Christian Brauner (5):
      sched/coredump: introduce enum task_dumpable
      exec: introduce struct task_exec_state and relocate dumpable
      ptrace: add ptracer_access_allowed()
      exec_state: relocate dumpable information
      cred: switch dumpability lowering to task_exec_state

 arch/arm64/kernel/mte.c          |   6 +--
 drivers/firmware/efi/efi.c       |   1 -
 fs/coredump.c                    |  22 +++-----
 fs/exec.c                        |  39 +++++++-------
 fs/pidfs.c                       |  22 ++++----
 fs/proc/base.c                   |  39 ++++++--------
 include/linux/binfmts.h          |   2 +
 include/linux/coredump.h         |   4 ++
 include/linux/mm_types.h         |   9 ++--
 include/linux/ptrace.h           |   1 +
 include/linux/sched.h            |   7 +--
 include/linux/sched/coredump.h   |  47 ++++-------------
 include/linux/sched/exec_state.h |  31 +++++++++++
 init/init_task.c                 |  10 ++++
 kernel/Makefile                  |   2 +-
 kernel/cred.c                    |  25 +++++----
 kernel/exec_state.c              | 108 +++++++++++++++++++++++++++++++++++++++
 kernel/exit.c                    |   1 -
 kernel/fork.c                    |  15 +++---
 kernel/kthread.c                 |   1 -
 kernel/ptrace.c                  |  62 ++++++++++++----------
 kernel/sys.c                     |   6 +--
 mm/init-mm.c                     |   1 -
 23 files changed, 289 insertions(+), 172 deletions(-)
---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260520-work-task_exec_state-83209d8b3e53