From: Christian Brauner <brauner@kernel.org>
To: Jann Horn <jannh@google.com>,
	 Linus Torvalds <torvalds@linuxfoundation.org>,
	 Oleg Nesterov <oleg@redhat.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Qualys Security Advisory <qsa@qualys.com>,
	Kees Cook <kees@kernel.org>,  Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org,  Suren Baghdasaryan <surenb@google.com>,
	Lorenzo Stoakes <ljs@kernel.org>,
	 "Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	 Mike Rapoport <rppt@kernel.org>, Michal Hocko <mhocko@suse.com>,
	 "Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH RFC v2 0/5] ptrace: keep mm metadata accessible past exit_mm()
Date: Wed, 20 May 2026 16:42:53 +0200
Message-ID: <20260520-work-task_exec_state-v2-0-9ea88ceb09e6@kernel.org>

This series relocates the dumpable mode and the user_namespace
captured at execve() from mm_struct onto a new per-task
task_exec_state structure that stays attached to the task for its
full lifetime.

__ptrace_may_access() and several /proc owner / visibility checks
need to consult two pieces of state for any observable task,
including zombies that have already gone through exit_mm(): the
dumpable mode and the user namespace captured at execve(). Both
live on mm_struct today, which exit_mm() clears from the task long
before the task is reaped.

A reader that races with do_exit() observes task->mm == NULL and
either fails the check or falls back to init_user_ns - which denies
legitimate access to non-dumpable zombies that were running in a
nested user namespace.

task_exec_state is RCU-protected and refcounted, and is freed via
call_rcu() from free_task(). init_task uses a static instance with
refcount 2 so it is never freed.
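
The lifetime rule can be modelled in userspace roughly as follows. This
is a sketch under assumptions, not code from the series:
exec_state_model, exec_state_alloc(), exec_state_get(),
exec_state_put() and exec_state_frees are hypothetical names, C11
atomics stand in for the kernel's refcount type, and the call_rcu()
deferral is reduced to a direct free().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for task_exec_state. */
struct exec_state_model {
	atomic_int refcount;
	int dumpable;
};

/* Counts releases so a caller can observe when an instance goes away. */
static int exec_state_frees;

/*
 * Static instance in the style of init_task's: it starts at 2, so a
 * single put by its owning task can never bring the count to zero.
 */
static struct exec_state_model init_exec_state_model = {
	.refcount = 2,
	.dumpable = 1,
};

/* Allocate a fresh instance holding one reference for the caller. */
static struct exec_state_model *exec_state_alloc(int dumpable)
{
	struct exec_state_model *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	atomic_init(&es->refcount, 1);
	es->dumpable = dumpable;
	return es;
}

/* Take an additional reference on an instance the caller already holds. */
static struct exec_state_model *exec_state_get(struct exec_state_model *es)
{
	atomic_fetch_add(&es->refcount, 1);
	return es;
}

/* Drop one reference; returns true if this was the last one. */
static bool exec_state_put(struct exec_state_model *es)
{
	int old = atomic_fetch_sub(&es->refcount, 1);

	assert(old > 0);
	if (old != 1)
		return false;
	/* The series defers this step via call_rcu(); the model frees directly. */
	exec_state_frees++;
	free(es);
	return true;
}
```

In this model a heap instance is released only by the put that drops
the last reference, while one put on the static instance leaves its
count at 1 and never reaches the free path.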

mm_struct loses ->user_ns and the dumpability bits in ->flags.
The MMF_DUMPABLE_BITS positions stay reserved so that the
MMF_DUMP_FILTER_* layout exposed via /proc/<pid>/coredump_filter stays
stable.  task->user_dumpable and its exit_mm() snapshot are removed.

task_exec_state is the privilege domain established by an execve(), not
a property of the address space. Following the model Linus sketched in
[1]:

  - Every clone() variant - thread, process, vfork(), io_uring
    worker - refcount-shares the parent's exec_state.  No
    dup-on-fork.
  - Only execve() in the child allocates a fresh instance.
  - Credential changes (setresuid, capset, ...) and
    prctl(PR_SET_DUMPABLE) update dumpability on the shared
    exec_state.

The entire fork subtree of one execve shares one exec_state; a
child enters a new privilege domain only by execve()ing into one.
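
The sharing rules above can be sketched as a small userspace model.
Everything here is illustrative: task_model, exec_state_model,
model_clone() and model_execve() are hypothetical names, a plain int
replaces the kernel's refcount, and there is no locking or RCU.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for task_exec_state and task_struct. */
struct exec_state_model {
	int refcount;
	int dumpable;
};

struct task_model {
	struct exec_state_model *exec_state;
};

/* Allocate a fresh privilege domain with one reference. */
static struct exec_state_model *exec_state_new(int dumpable)
{
	struct exec_state_model *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	es->refcount = 1;
	es->dumpable = dumpable;
	return es;
}

/* Drop one reference and free the instance when it was the last. */
static void exec_state_put(struct exec_state_model *es)
{
	assert(es->refcount > 0);
	if (--es->refcount == 0)
		free(es);
}

/* Any clone() variant: share the parent's instance, no copy is made. */
static void model_clone(const struct task_model *parent,
			struct task_model *child)
{
	child->exec_state = parent->exec_state;
	child->exec_state->refcount++;
}

/*
 * execve(): leave the shared domain and attach a fresh one.
 * Returns 0 on success, -1 if the allocation fails (the task then
 * keeps its current exec_state).
 */
static int model_execve(struct task_model *task, int dumpable)
{
	struct exec_state_model *fresh = exec_state_new(dumpable);

	if (!fresh)
		return -1;
	exec_state_put(task->exec_state);
	task->exec_state = fresh;
	return 0;
}
```

In the model, two tasks point at the same exec_state_model exactly when
neither has called model_execve() since their common ancestor did,
which is the "fork subtree of one execve" notion.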

Behavioral changes:

(1) Dumpability lowering on credential changes now propagates
    across the fork subtree.

    Pre-series, set_dumpable() on commit_creds() targeted
    mm->flags, which was per-mm: shared by CLONE_VM threads but
    private to fork()-without-CLONE_VM children. Under the new
    model the write targets the shared task_exec_state, so a
    privilege drop in any task in the subtree lowers dumpability
    for the entire subtree, including non-CLONE_VM siblings.

    Same-uid ptrace shedding and /proc visibility for the
    "root-launched daemon drops to a service uid" pattern (sshd,
    polkitd, dbus-daemon, NetworkManager, ...) are preserved.

(2) Kernel threads that briefly use a user mm via
    kthread_use_mm() no longer inherit dumpability from the
    borrowed mm. Kthreads are not ptraceable (PF_KTHREAD
    short-circuits __ptrace_may_access), so this is observable
    only via /proc surfaces that a sufficiently privileged reader
    can reach.
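
The difference described for credential changes, per-mm before the
series versus per-exec_state after it, can be illustrated with a toy
model. This is a sketch under assumptions: the struct and function
names are hypothetical, the new dumpable mode is passed in as a
parameter rather than derived from any sysctl, and no locking is
modelled.

```c
#include <assert.h>

enum { MODEL_DUMP_DISABLE = 0, MODEL_DUMP_USER = 1 };

/* Hypothetical stand-ins: where the dumpable mode lived, and where it moves. */
struct mm_model {
	int dumpable;
};

struct exec_state_model {
	int dumpable;
};

struct task_model {
	struct mm_model *mm;			/* shared only by CLONE_VM tasks */
	struct exec_state_model *exec_state;	/* shared by the fork subtree */
};

/* Pre-series model: a credential change lowers the caller's mm only. */
static void lower_dumpable_old(struct task_model *t, int mode)
{
	assert(t->mm);
	t->mm->dumpable = mode;
}

/* Model of the series: the write targets the shared exec_state. */
static void lower_dumpable_new(struct task_model *t, int mode)
{
	assert(t->exec_state);
	t->exec_state->dumpable = mode;
}
```

With a parent and a fork()ed sibling that have separate mm_model
instances but one exec_state_model, lower_dumpable_old() on the parent
leaves the sibling's mode untouched, whereas lower_dumpable_new()
changes what both tasks observe.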

[1] https://lore.kernel.org/r/CAHk-=wj+NgoDH3GSicJ140SV8OoDd71pLmL3fgFEsTcgoMC6Og@mail.gmail.com

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Changes in v2:
- Drop dup-on-fork for non-CLONE_VM clones: every clone() variant
  refcount-shares the parent's task_exec_state; only execve()
  allocates a fresh one.  See "Behavioral changes" in the cover
  letter for the implications.
- Switch commit_creds() to update dumpability on the new
  task_exec_state (instead of dropping the set_dumpable() call
  entirely as in v1).  Drop the explicit smp_wmb()/smp_rmb() pair;
  RCU acquire/release on the cred pointer provides the ordering.
- Link to v1: https://patch.msgid.link/20260516-work-exit_mm-v1-1-76bcc7c2439d@kernel.org

---
Christian Brauner (5):
      sched/coredump: introduce enum task_dumpable
      exec: introduce struct task_exec_state and relocate dumpable
      ptrace: add ptracer_access_allowed()
      exec_state: relocate dumpable information
      cred: switch dumpability lowering to task_exec_state

 arch/arm64/kernel/mte.c          |   6 +--
 drivers/firmware/efi/efi.c       |   1 -
 fs/coredump.c                    |  22 +++-----
 fs/exec.c                        |  39 +++++++-------
 fs/pidfs.c                       |  22 ++++----
 fs/proc/base.c                   |  39 ++++++--------
 include/linux/binfmts.h          |   2 +
 include/linux/coredump.h         |   4 ++
 include/linux/mm_types.h         |   9 ++--
 include/linux/ptrace.h           |   1 +
 include/linux/sched.h            |   7 +--
 include/linux/sched/coredump.h   |  47 ++++-------------
 include/linux/sched/exec_state.h |  31 +++++++++++
 init/init_task.c                 |  10 ++++
 kernel/Makefile                  |   2 +-
 kernel/cred.c                    |  25 +++++----
 kernel/exec_state.c              | 108 +++++++++++++++++++++++++++++++++++++++
 kernel/exit.c                    |   1 -
 kernel/fork.c                    |  15 +++---
 kernel/kthread.c                 |   1 -
 kernel/ptrace.c                  |  62 ++++++++++++----------
 kernel/sys.c                     |   6 +--
 mm/init-mm.c                     |   1 -
 23 files changed, 289 insertions(+), 172 deletions(-)
---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260520-work-task_exec_state-83209d8b3e53


