Linux-mm Archive on lore.kernel.org
* [PATCH RFC v2 0/5] ptrace: keep mm metadata accessible past exit_mm()
@ 2026-05-20 14:42 Christian Brauner
From: Christian Brauner @ 2026-05-20 14:42 UTC
  To: Jann Horn, Linus Torvalds, Oleg Nesterov
  Cc: David Hildenbrand (Arm), Andrew Morton, Qualys Security Advisory,
	Kees Cook, Minchan Kim, linux-mm, Suren Baghdasaryan,
	Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
	Michal Hocko, Christian Brauner (Amutable)

This series relocates the dumpable mode and the user_namespace
captured at execve() from mm_struct onto a new per-task
task_exec_state structure that stays attached to the task for its
full lifetime.

__ptrace_may_access() and several /proc owner / visibility checks
need to consult two pieces of state for any observable task,
including zombies that have already gone through exit_mm(): the
dumpable mode and the user namespace captured at execve(). Both
live on mm_struct today, which exit_mm() clears from the task long
before the task is reaped.

A reader that races with do_exit() observes task->mm == NULL and
either fails the check or falls back to init_user_ns - which denies
legitimate access to non-dumpable zombies that were running in a
nested user namespace.
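
To make the failure mode concrete, here is a toy userspace model in C.
It is not kernel code: every type and helper (struct user_ns_model,
may_access_via_mm(), ...) is hypothetical, and the capability check is
reduced to "the target's namespace is the tracer's namespace or a
descendant of it". Under those assumptions, a tracer that is privileged
only in a nested namespace loses access the moment the target's mm
pointer is cleared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; not the kernel's definitions. */
struct user_ns_model { struct user_ns_model *parent; };
struct mm_model { struct user_ns_model *user_ns; bool dumpable; };
struct task_model { struct mm_model *mm; };

static struct user_ns_model init_ns_model = { .parent = NULL };

/* True if ns is owner itself or one of its descendants. */
static bool ns_within(const struct user_ns_model *ns,
		      const struct user_ns_model *owner)
{
	for (; ns; ns = ns->parent)
		if (ns == owner)
			return true;
	return false;
}

/*
 * Simplified access check that reads everything through task->mm.
 * cap_ns is the namespace in which the tracer holds the ptrace capability.
 */
static bool may_access_via_mm(const struct task_model *t,
			      const struct user_ns_model *cap_ns)
{
	/* Past exit_mm() the mm is gone: assume non-dumpable, init namespace. */
	bool dumpable = t->mm ? t->mm->dumpable : false;
	const struct user_ns_model *target_ns =
		t->mm ? t->mm->user_ns : &init_ns_model;

	return dumpable || ns_within(target_ns, cap_ns);
}

static void demo_zombie_denied(void)
{
	struct user_ns_model nested = { .parent = &init_ns_model };
	struct mm_model mm = { .user_ns = &nested, .dumpable = false };
	struct task_model target = { .mm = &mm };

	/* Live task: a tracer privileged in the nested namespace is allowed. */
	assert(may_access_via_mm(&target, &nested));

	/* exit_mm() clears task->mm while the task is still a zombie. */
	target.mm = NULL;
	assert(!may_access_via_mm(&target, &nested));
}
```

Calling demo_zombie_denied() from a main() exercises both cases. The
real check also involves uid matching and LSM hooks, which this model
leaves out.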

task_exec_state is RCU-protected and refcounted, and is freed via
call_rcu() from free_task(). init_task uses a static instance with
refcount 2 so it is never freed.
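
The lifetime rule can be sketched in userspace C with C11 atomics. This
is only a model: the struct layout and the exec_state_alloc() /
exec_state_get() / exec_state_put() names are assumptions, not the
series' API, and the RCU grace period is replaced by an immediate
free(). It shows why a static instance that starts at refcount 2 is
never released when its one user drops its reference:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical userspace model; field and function names are illustrative. */
struct exec_state_model {
	atomic_int refs;
	int dumpable;
};

static int freed_count;	/* counts releases, for the demo only */

/* Static instance: the extra reference keeps the count from reaching 0. */
static struct exec_state_model init_exec_state_model = {
	.refs = 2,
	.dumpable = 1,
};

static struct exec_state_model *exec_state_alloc(int dumpable)
{
	struct exec_state_model *es = malloc(sizeof(*es));

	if (!es)
		abort();
	atomic_init(&es->refs, 1);
	es->dumpable = dumpable;
	return es;
}

static struct exec_state_model *exec_state_get(struct exec_state_model *es)
{
	atomic_fetch_add(&es->refs, 1);
	return es;
}

static void exec_state_put(struct exec_state_model *es)
{
	/* Last reference gone: the kernel would defer this with call_rcu(). */
	if (atomic_fetch_sub(&es->refs, 1) == 1) {
		free(es);
		freed_count++;
	}
}

static void demo_exec_state_lifetime(void)
{
	struct exec_state_model *es = exec_state_alloc(1);

	exec_state_get(es);		/* a second task shares the state */
	exec_state_put(es);		/* first task is freed */
	assert(freed_count == 0);	/* still referenced */
	exec_state_put(es);		/* last task is freed */
	assert(freed_count == 1);

	/* Static instance: its one user going away leaves refs at 1. */
	exec_state_put(&init_exec_state_model);
	assert(atomic_load(&init_exec_state_model.refs) == 1);
	assert(freed_count == 1);
}
```

The model has no concurrent readers, so it says nothing about the RCU
read side; it only illustrates the reference counting.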

mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so the MMF_DUMP_FILTER_* layout exposed
via /proc/<pid>/coredump_filter stays stable.  task->user_dumpable and
its exit_mm() snapshot are removed.

task_exec_state is the privilege domain established by an execve(), not
a property of the address space. Following the model Linus sketched in
[1]:

  - Every clone() variant - thread, process, vfork(), io_uring
    worker - refcount-shares the parent's exec_state.  No
    dup-on-fork.
  - Only execve() in the child allocates a fresh instance.
  - Credential changes (setresuid, capset, ...) and
    prctl(PR_SET_DUMPABLE) update dumpability on the shared
    exec_state.

The entire fork subtree of one execve shares one exec_state; a
child enters a new privilege domain only by execve()ing into one.
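
The three rules above can be modelled in a few lines of single-threaded
userspace C. All names here (task_clone(), task_execve(),
task_set_dumpable(), the enum values) are hypothetical, and the model
assumes a fresh exec_state starts out user-dumpable, which the cover
letter does not specify:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; names mirror the cover letter only. */
enum dumpable_model { DUMP_DISABLE_MODEL, DUMP_USER_MODEL };

struct exec_state_model {
	int refs;			/* plain int: model is single-threaded */
	enum dumpable_model dumpable;
};

struct task_model {
	struct exec_state_model *es;
};

static struct exec_state_model *exec_state_new(enum dumpable_model mode)
{
	struct exec_state_model *es = malloc(sizeof(*es));

	if (!es)
		abort();
	es->refs = 1;
	es->dumpable = mode;
	return es;
}

static void exec_state_put(struct exec_state_model *es)
{
	if (--es->refs == 0)
		free(es);
}

/* Any clone() variant: take a reference on the parent's state, no copy. */
static void task_clone(struct task_model *child, const struct task_model *parent)
{
	child->es = parent->es;
	child->es->refs++;
}

/* execve(): the only operation that gives a task a fresh state. */
static void task_execve(struct task_model *t)
{
	struct exec_state_model *old = t->es;

	t->es = exec_state_new(DUMP_USER_MODEL);	/* assumed default */
	exec_state_put(old);
}

/* Credential change or prctl(PR_SET_DUMPABLE): write to the shared state. */
static void task_set_dumpable(struct task_model *t, enum dumpable_model mode)
{
	t->es->dumpable = mode;
}

static void task_exit(struct task_model *t)
{
	exec_state_put(t->es);
	t->es = NULL;
}

static void demo_exec_state_sharing(void)
{
	struct task_model parent = { exec_state_new(DUMP_USER_MODEL) };
	struct task_model thread, child;

	task_clone(&thread, &parent);
	task_clone(&child, &parent);
	assert(thread.es == parent.es && child.es == parent.es);

	task_execve(&child);			/* child leaves the domain */
	assert(child.es != parent.es);

	task_set_dumpable(&thread, DUMP_DISABLE_MODEL);
	assert(parent.es->dumpable == DUMP_DISABLE_MODEL);	/* shared */
	assert(child.es->dumpable == DUMP_USER_MODEL);		/* separate */

	task_exit(&thread);
	task_exit(&child);
	task_exit(&parent);
}
```

The write through thread.es is visible through parent.es because both
point at the same object; the child that called task_execve() is
unaffected.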

Behavioral changes:

(1) Dumpability lowering on credential changes now propagates
    across the fork subtree.

    Pre-series, set_dumpable() on commit_creds() targeted
    mm->flags, which was per-mm: shared by CLONE_VM threads but
    private to fork()-without-CLONE_VM children. Under the new
    model the write targets the shared task_exec_state, so a
    privilege drop in any task in the subtree lowers dumpability
    for the entire subtree, including non-CLONE_VM siblings.

    Same-uid ptrace shedding and /proc visibility for the
    "root-launched daemon drops to a service uid" pattern (sshd,
    polkitd, dbus-daemon, NetworkManager, ...) are preserved.

(2) Kernel threads that briefly use a user mm via
    kthread_use_mm() no longer inherit dumpability from the
    borrowed mm. Kthreads are not ptraceable (PF_KTHREAD
    short-circuits __ptrace_may_access), so this is observable
    only via /proc surfaces that a sufficiently privileged reader
    can reach.
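
For behavioral change (1), the difference between the two storage
locations can be shown side by side. This is a toy model with
hypothetical names (drop_privs_old(), drop_privs_new()), and it
simplifies a privilege drop to "set dumpability to disabled". It is not
taken from the patches:

```c
#include <assert.h>

enum dumpable_model { DUMP_DISABLE_MODEL, DUMP_USER_MODEL };	/* hypothetical */

struct mm_model { enum dumpable_model dumpable; };		/* pre-series home */
struct exec_state_model { enum dumpable_model dumpable; };	/* post-series home */

struct task_model {
	struct mm_model *mm;
	struct exec_state_model *es;
};

/* Pre-series: a privilege drop writes to the caller's mm. */
static void drop_privs_old(struct task_model *t)
{
	t->mm->dumpable = DUMP_DISABLE_MODEL;
}

/* Post-series: the same drop writes to the shared exec state. */
static void drop_privs_new(struct task_model *t)
{
	t->es->dumpable = DUMP_DISABLE_MODEL;
}

static void demo_subtree_propagation(void)
{
	struct mm_model parent_mm = { DUMP_USER_MODEL };
	struct mm_model child_mm = { DUMP_USER_MODEL };	/* fork() copy */
	struct exec_state_model es = { DUMP_USER_MODEL };

	struct task_model parent = { &parent_mm, &es };
	struct task_model thread = { &parent_mm, &es };	/* CLONE_VM */
	struct task_model child = { &child_mm, &es };	/* no CLONE_VM */

	/* Old model: the thread follows the parent, the fork child does not. */
	drop_privs_old(&parent);
	assert(thread.mm->dumpable == DUMP_DISABLE_MODEL);
	assert(child.mm->dumpable == DUMP_USER_MODEL);

	/* New model: every task in the fork subtree sees the lowering. */
	drop_privs_new(&parent);
	assert(thread.es->dumpable == DUMP_DISABLE_MODEL);
	assert(child.es->dumpable == DUMP_DISABLE_MODEL);
}
```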

[1] https://lore.kernel.org/r/CAHk-=wj+NgoDH3GSicJ140SV8OoDd71pLmL3fgFEsTcgoMC6Og@mail.gmail.com

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Changes in v2:
- Drop dup-on-fork for non-CLONE_VM clones: every clone() variant
  refcount-shares the parent's task_exec_state; only execve()
  allocates a fresh one.  See "Behavioral changes" in the cover
  letter for the implications.
- Switch commit_creds() to update dumpability on the new
  task_exec_state (instead of dropping the set_dumpable() call
  entirely as in v1).  Drop the explicit smp_wmb()/smp_rmb() pair:
  RCU acquire/release on the cred pointer provides the ordering.
- Link to v1: https://patch.msgid.link/20260516-work-exit_mm-v1-1-76bcc7c2439d@kernel.org

---
Christian Brauner (5):
      sched/coredump: introduce enum task_dumpable
      exec: introduce struct task_exec_state and relocate dumpable
      ptrace: add ptracer_access_allowed()
      exec_state: relocate dumpable information
      cred: switch dumpability lowering to task_exec_state

 arch/arm64/kernel/mte.c          |   6 +--
 drivers/firmware/efi/efi.c       |   1 -
 fs/coredump.c                    |  22 +++-----
 fs/exec.c                        |  39 +++++++-------
 fs/pidfs.c                       |  22 ++++----
 fs/proc/base.c                   |  39 ++++++--------
 include/linux/binfmts.h          |   2 +
 include/linux/coredump.h         |   4 ++
 include/linux/mm_types.h         |   9 ++--
 include/linux/ptrace.h           |   1 +
 include/linux/sched.h            |   7 +--
 include/linux/sched/coredump.h   |  47 ++++-------------
 include/linux/sched/exec_state.h |  31 +++++++++++
 init/init_task.c                 |  10 ++++
 kernel/Makefile                  |   2 +-
 kernel/cred.c                    |  25 +++++----
 kernel/exec_state.c              | 108 +++++++++++++++++++++++++++++++++++++++
 kernel/exit.c                    |   1 -
 kernel/fork.c                    |  15 +++---
 kernel/kthread.c                 |   1 -
 kernel/ptrace.c                  |  62 ++++++++++++----------
 kernel/sys.c                     |   6 +--
 mm/init-mm.c                     |   1 -
 23 files changed, 289 insertions(+), 172 deletions(-)
---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260520-work-task_exec_state-83209d8b3e53




Thread overview: 24+ messages
2026-05-20 14:42 [PATCH RFC v2 0/5] ptrace: keep mm metadata accessible past exit_mm() Christian Brauner
2026-05-20 14:42 ` [PATCH RFC v2 1/5] sched/coredump: introduce enum task_dumpable Christian Brauner
2026-05-20 16:27   ` Jann Horn
2026-05-20 14:42 ` [PATCH RFC v2 2/5] exec: introduce struct task_exec_state and relocate dumpable Christian Brauner
2026-05-20 15:14   ` Linus Torvalds
2026-05-20 15:24     ` Christian Brauner
2026-05-20 16:27   ` Jann Horn
2026-05-20 19:47     ` Christian Brauner
2026-05-20 14:42 ` [PATCH RFC v2 3/5] ptrace: add ptracer_access_allowed() Christian Brauner
2026-05-20 16:28   ` Jann Horn
2026-05-20 14:42 ` [PATCH RFC v2 4/5] exec_state: relocate dumpable information Christian Brauner
2026-05-20 19:21   ` Jann Horn
2026-05-20 19:47     ` Christian Brauner
2026-05-20 14:42 ` [PATCH RFC v2 5/5] cred: switch dumpability lowering to task_exec_state Christian Brauner
2026-05-20 18:44   ` Jann Horn
2026-05-20 15:08 ` [PATCH RFC v2 0/5] ptrace: keep mm metadata accessible past exit_mm() Christian Brauner
2026-05-20 16:27 ` Jann Horn
2026-05-20 16:52   ` Linus Torvalds
2026-05-20 16:55     ` Linus Torvalds
2026-05-20 18:09       ` Jann Horn
2026-05-20 18:12         ` Linus Torvalds
2026-05-20 19:46           ` Christian Brauner
2026-05-20 17:29     ` Jann Horn
2026-05-20 18:11       ` Linus Torvalds
