Linux-mm Archive on lore.kernel.org
From: "Christian Brauner (Amutable)" <brauner@kernel.org>
To: Jann Horn <jannh@google.com>,
	 Linus Torvalds <torvalds@linuxfoundation.org>,
	 Oleg Nesterov <oleg@redhat.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Qualys Security Advisory <qsa@qualys.com>,
	Kees Cook <kees@kernel.org>,  Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org,  Suren Baghdasaryan <surenb@google.com>,
	Lorenzo Stoakes <ljs@kernel.org>,
	 "Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	 Mike Rapoport <rppt@kernel.org>, Michal Hocko <mhocko@suse.com>,
	 "Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH RFC v3 0/4] exec: introduce task_exec_state for exec-time metadata
Date: Wed, 20 May 2026 23:48:51 +0200
Message-ID: <20260520-work-task_exec_state-v3-0-69f895bc1385@kernel.org>

This series relocates the dumpable mode and the user_namespace
captured at execve() from mm_struct onto a new per-task
task_exec_state structure that stays attached to the task for its
full lifetime.

__ptrace_may_access() and several /proc owner / visibility checks
need to consult two pieces of state for any observable task,
including zombies that have already gone through exit_mm(): the
dumpable mode and the user namespace captured at execve(). Both
live on mm_struct today, which exit_mm() clears from the task long
before the task is reaped.

A reader that races with do_exit() observes task->mm == NULL and
either fails the check outright or falls back to init_user_ns. The
fallback denies legitimate access to non-dumpable zombies that were
running in a nested user namespace, because a reader privileged only
in that namespace is not privileged in init_user_ns.
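To make the race concrete, here is a toy userspace model in C. Every
name in it (struct model_task, ns_within(), old_check(), new_check()
and so on) is hypothetical and heavily simplified, and the access rule
is an assumption modelled loosely on the description above: a
non-dumpable target is accessible only to a reader privileged in the
target's captured user namespace or one of its ancestors. It contrasts
reading that state through task->mm (with the init_user_ns fallback
once exit_mm() has run) against reading it from a per-task exec state
that survives until the task is reaped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel definitions. */
struct user_ns { struct user_ns *parent; };
static struct user_ns init_user_ns = { NULL };

struct model_mm { bool dumpable; struct user_ns *user_ns; };
struct model_exec_state { bool dumpable; struct user_ns *user_ns; };

struct model_task {
	struct model_mm *mm;                 /* NULL once exit_mm() has run */
	bool user_dumpable;                  /* old-style exit_mm() snapshot */
	struct model_exec_state *exec_state; /* stays attached until reaped */
};

/* True if ns is ancestor itself or nested somewhere below it. */
static bool ns_within(const struct user_ns *ns, const struct user_ns *ancestor)
{
	for (; ns; ns = ns->parent)
		if (ns == ancestor)
			return true;
	return false;
}

/* Assumed rule: non-dumpable targets need privilege in their namespace. */
static bool may_access(bool dumpable, const struct user_ns *target_ns,
		       const struct user_ns *caller_ns)
{
	return dumpable || ns_within(target_ns, caller_ns);
}

/* Old layout: state is read through task->mm. */
static bool old_check(const struct model_task *t,
		      const struct user_ns *caller_ns)
{
	if (t->mm)
		return may_access(t->mm->dumpable, t->mm->user_ns, caller_ns);
	/* Zombie: dumpability from the snapshot, namespace falls back. */
	return may_access(t->user_dumpable, &init_user_ns, caller_ns);
}

/* New layout: state is read from the per-task exec state. */
static bool new_check(const struct model_task *t,
		      const struct user_ns *caller_ns)
{
	return may_access(t->exec_state->dumpable, t->exec_state->user_ns,
			  caller_ns);
}
```

With a non-dumpable task running in a child namespace, a reader
privileged in that child namespace passes both checks while the mm is
attached, but only new_check() still passes after task->mm is cleared.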

mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so MMF_DUMP_FILTER_* layout exposed via
/proc/<pid>/coredump_filter stays stable. task->user_dumpable and its
exit_mm() snapshot are removed.
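Why MMF_DUMPABLE_BITS stays reserved follows from how the coredump
filter is packed into mm->flags. The sketch below assumes the kernel's
long-standing layout (dumpable mode in bits 0-1, a 9-bit filter
starting at bit 2, so MMF_DUMP_ANON_PRIVATE is bit 2); the macro values
reflect that layout as I understand it rather than anything quoted from
this series, and coredump_filter_show()/coredump_filter_store() are
hypothetical helpers. Keeping bits 0-1 reserved lets
MMF_DUMP_FILTER_SHIFT and every MMF_DUMP_* bit number stay where they
were, so the translation between mm->flags and the value read from or
written to /proc/<pid>/coredump_filter is untouched.

```c
#include <assert.h>

/* Illustrative values; bits 0-1 formerly held the dumpable mode and are
 * now only reserved, but the filter still starts right above them. */
#define MMF_DUMPABLE_BITS	2
#define MMF_DUMP_ANON_PRIVATE	2	/* first filter bit */
#define MMF_DUMP_FILTER_SHIFT	MMF_DUMPABLE_BITS
#define MMF_DUMP_FILTER_BITS	9
#define MMF_DUMP_FILTER_MASK \
	(((1UL << MMF_DUMP_FILTER_BITS) - 1) << MMF_DUMP_FILTER_SHIFT)

/* Filter bit 0 as seen by userspace must remain MMF_DUMP_ANON_PRIVATE. */
_Static_assert(MMF_DUMP_FILTER_SHIFT == MMF_DUMP_ANON_PRIVATE,
	       "coredump_filter bit 0 must map to MMF_DUMP_ANON_PRIVATE");

/* Hypothetical: the value a read of /proc/<pid>/coredump_filter shows. */
static unsigned long coredump_filter_show(unsigned long mm_flags)
{
	return (mm_flags & MMF_DUMP_FILTER_MASK) >> MMF_DUMP_FILTER_SHIFT;
}

/* Hypothetical: store a user value, leaving bits outside the filter
 * alone and ignoring bits beyond MMF_DUMP_FILTER_BITS. */
static unsigned long coredump_filter_store(unsigned long mm_flags,
					   unsigned long val)
{
	mm_flags &= ~MMF_DUMP_FILTER_MASK;
	return mm_flags |
	       ((val << MMF_DUMP_FILTER_SHIFT) & MMF_DUMP_FILTER_MASK);
}
```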

task_exec_state is the privilege domain established by an execve()
[1]. Tasks that share an mm via CLONE_VM, which includes every
thread-group member, share it via refcount; every other task gets
its own:

  - CLONE_VM siblings (thread-group members, io_uring workers)
    refcount-share the parent's exec_state.
  - Non-CLONE_VM clones (fork(), clone(CLONE_VFORK) without CLONE_VM)
    allocate a fresh exec_state inheriting the parent's dumpable
    mode and user_ns.
  - execve() in the child allocates a fresh instance and installs
    it under task_lock + exec_update_lock via
    task_exec_state_replace().
  - Credential changes (setresuid, capset, ...) and
    prctl(PR_SET_DUMPABLE) update dumpability on the current
    task's exec_state, i.e. on the thread group's shared instance.
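The sharing rules above can be sketched as a small refcounting model.
This is an illustration under stated assumptions, not the series' code:
MODEL_CLONE_VM stands in for CLONE_VM, the refcount is a plain int with
no locking, and model_copy_exec_state(), model_exec_replace() and
model_set_dumpable() are hypothetical stand-ins for the clone-time
copy, task_exec_state_replace() and the credential-change/prctl() path.
The real execve() swap happens under task_lock + exec_update_lock,
which the model omits.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_CLONE_VM 0x1UL	/* hypothetical stand-in for CLONE_VM */

enum model_dumpable { MODEL_DUMP_DISABLE, MODEL_DUMP_USER, MODEL_DUMP_ROOT };

struct user_ns;			/* opaque in this sketch */

struct model_exec_state {
	int refcount;
	enum model_dumpable dumpable;
	struct user_ns *user_ns;
};

struct model_task { struct model_exec_state *exec_state; };

static struct model_exec_state *exec_state_alloc(enum model_dumpable d,
						 struct user_ns *ns)
{
	struct model_exec_state *es = malloc(sizeof(*es));

	if (!es)
		return NULL;
	es->refcount = 1;
	es->dumpable = d;
	es->user_ns = ns;
	return es;
}

static void exec_state_put(struct model_exec_state *es)
{
	assert(es->refcount > 0);
	if (--es->refcount == 0)
		free(es);
}

/* CLONE_VM: share by refcount; otherwise allocate and inherit. */
static int model_copy_exec_state(unsigned long flags,
				 const struct model_task *parent,
				 struct model_task *child)
{
	struct model_exec_state *pes = parent->exec_state;

	if (flags & MODEL_CLONE_VM) {
		pes->refcount++;
		child->exec_state = pes;
		return 0;
	}
	child->exec_state = exec_state_alloc(pes->dumpable, pes->user_ns);
	return child->exec_state ? 0 : -1;
}

/* execve(): install a fresh instance and drop the old reference. */
static int model_exec_replace(struct model_task *t, enum model_dumpable d,
			      struct user_ns *ns)
{
	struct model_exec_state *fresh = exec_state_alloc(d, ns);
	struct model_exec_state *old = t->exec_state;

	if (!fresh)
		return -1;
	t->exec_state = fresh;
	exec_state_put(old);
	return 0;
}

/* Credential change / prctl(PR_SET_DUMPABLE): updates the shared instance. */
static void model_set_dumpable(struct model_task *t, enum model_dumpable d)
{
	t->exec_state->dumpable = d;
}
```

In this model a CLONE_VM sibling that clears dumpability changes it for
everyone sharing the instance, while a fork()ed child keeps the value it
inherited; that isolation is what the v3 changelog restores for the
zygote64 case.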

Behavioral change:

Kernel threads that briefly use a user mm via kthread_use_mm() no
longer inherit dumpability from the borrowed mm. Kthreads are not
ptraceable (PF_KTHREAD short-circuits __ptrace_may_access), so this
is observable only via /proc surfaces that a sufficiently privileged
reader can reach.

[1] https://lore.kernel.org/r/CAHk-=wj+NgoDH3GSicJ140SV8OoDd71pLmL3fgFEsTcgoMC6Og@mail.gmail.com

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Changes in v3:
- Restore alloc-fresh-and-inherit semantics for non-CLONE_VM clones.
  CLONE_VM siblings still refcount-share; fork() and other
  non-CLONE_VM clones get a fresh exec_state that inherits the
  parent's dumpable mode and user_ns. The v2 "every clone
  refcount-shares" model would have let any forked process in an
  Android zygote64 subtree influence dumpability of its siblings
  via prctl(PR_SET_DUMPABLE).
- Link to v2: https://patch.msgid.link/20260520-work-task_exec_state-v2-0-9ea88ceb09e6@kernel.org

Changes in v2:
- Drop dup-on-fork for non-CLONE_VM clones: every clone() variant
  refcount-shares the parent's task_exec_state; only execve()
  allocates a fresh one.  See "Behavioral changes" in the v2 cover
  letter for the implications.
- Switch commit_creds() to update dumpability on the new
  task_exec_state (instead of dropping the set_dumpable() call
  entirely as in v1).  Drops the explicit smp_wmb()/smp_rmb() pair;
  RCU acquire/release on the cred pointer provides the ordering.
- Link to v1: https://patch.msgid.link/20260516-work-exit_mm-v1-1-76bcc7c2439d@kernel.org
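The ordering argument in the v2 changelog is the usual publish/subscribe
pattern. The sketch below models it with C11 release/acquire atomics as
a simplified stand-in for rcu_assign_pointer()/rcu_dereference();
struct model_cred, struct model_shared, writer_commit() and
reader_observe() are hypothetical, and the model assumes a reader that
loads the cred pointer before reading the dumpable mode. A reader that
observes the new cred pointer is guaranteed to see the dumpable value
written before it was published, which is the ordering the changelog
says the dropped smp_wmb()/smp_rmb() pair used to provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_cred { int euid; };

struct model_shared {
	int dumpable;			   /* plain field, written before publish */
	_Atomic(struct model_cred *) cred; /* published pointer */
};

/* Writer: update dumpability, then publish the new cred (one-shot). */
static void writer_commit(struct model_shared *s, struct model_cred *new_cred,
			  int dumpable)
{
	s->dumpable = dumpable;
	/* Release: the store above is visible to any thread whose acquire
	 * load returns new_cred. */
	atomic_store_explicit(&s->cred, new_cred, memory_order_release);
}

/* Reader: if the expected cred is visible, report the dumpable mode. */
static bool reader_observe(struct model_shared *s,
			  const struct model_cred *expected, int *dumpable_out)
{
	struct model_cred *c =
		atomic_load_explicit(&s->cred, memory_order_acquire);

	if (c != expected)
		return false;
	*dumpable_out = s->dumpable;	/* ordered after the acquire load */
	return true;
}
```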

---
Christian Brauner (Amutable) (4):
      sched/coredump: introduce enum task_dumpable
      exec: introduce struct task_exec_state
      ptrace: add ptracer_access_allowed()
      exec_state: relocate dumpable information

 arch/arm64/kernel/mte.c          |   6 +-
 drivers/firmware/efi/efi.c       |   1 -
 fs/coredump.c                    |  22 +++-----
 fs/exec.c                        |  39 ++++++-------
 fs/pidfs.c                       |  23 +++-----
 fs/proc/base.c                   |  39 ++++++-------
 include/linux/binfmts.h          |   2 +
 include/linux/coredump.h         |   4 ++
 include/linux/mm_types.h         |   9 ++-
 include/linux/ptrace.h           |   1 +
 include/linux/sched.h            |   6 +-
 include/linux/sched/coredump.h   |  47 ++++------------
 include/linux/sched/exec_state.h |  29 ++++++++++
 init/init_task.c                 |  10 ++++
 kernel/Makefile                  |   2 +-
 kernel/cred.c                    |   3 +-
 kernel/exec_state.c              | 116 +++++++++++++++++++++++++++++++++++++++
 kernel/exit.c                    |   1 -
 kernel/fork.c                    |  32 +++++++++--
 kernel/kthread.c                 |   1 -
 kernel/ptrace.c                  |  53 ++++++++++++------
 kernel/sys.c                     |   6 +-
 mm/init-mm.c                     |   1 -
 23 files changed, 301 insertions(+), 152 deletions(-)
---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260520-work-task_exec_state-83209d8b3e53



Thread overview: 8+ messages
2026-05-20 21:48 Christian Brauner (Amutable) [this message]
2026-05-20 21:48 ` [PATCH RFC v3 1/4] sched/coredump: introduce enum task_dumpable Christian Brauner (Amutable)
2026-05-20 21:48 ` [PATCH RFC v3 2/4] exec: introduce struct task_exec_state Christian Brauner (Amutable)
2026-05-20 21:48 ` [PATCH RFC v3 3/4] ptrace: add ptracer_access_allowed() Christian Brauner (Amutable)
2026-05-20 21:48 ` [PATCH RFC v3 4/4] exec_state: relocate dumpable information Christian Brauner (Amutable)
2026-05-21 10:05   ` Christian Brauner
2026-05-21 11:16   ` Jann Horn
2026-05-21 13:08     ` Christian Brauner
