All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian Brauner (Amutable)" <brauner@kernel.org>
To: Jann Horn <jannh@google.com>,
	 Linus Torvalds <torvalds@linuxfoundation.org>,
	 Oleg Nesterov <oleg@redhat.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Qualys Security Advisory <qsa@qualys.com>,
	Kees Cook <kees@kernel.org>,  Minchan Kim <minchan@kernel.org>,
	linux-mm@kvack.org,  Suren Baghdasaryan <surenb@google.com>,
	Lorenzo Stoakes <ljs@kernel.org>,
	 "Liam R. Howlett" <liam@infradead.org>,
	Vlastimil Babka <vbabka@kernel.org>,
	 Mike Rapoport <rppt@kernel.org>, Michal Hocko <mhocko@suse.com>,
	 "Christian Brauner (Amutable)" <brauner@kernel.org>
Subject: [PATCH RFC v3 0/4] exec: introduce task_exec_state for exec-time metadata
Date: Wed, 20 May 2026 23:48:51 +0200	[thread overview]
Message-ID: <20260520-work-task_exec_state-v3-0-69f895bc1385@kernel.org> (raw)

This series relocates the dumpable mode and the user_namespace
captured at execve() from mm_struct onto a new per-task
task_exec_state structure that stays attached to the task for its
full lifetime.

__ptrace_may_access() and several /proc owner / visibility checks
need to consult two pieces of state for any observable task,
including zombies that have already gone through exit_mm(): the
dumpable mode and the user namespace captured at execve(). Both
live on mm_struct today, which exit_mm() clears from the task long
before the task is reaped.

A reader that races with do_exit() observes task->mm == NULL and
either fails the check or falls back to init_user_ns - which denies
legitimate access to non-dumpable zombies that were running in a
nested user namespace.

mm_struct loses ->user_ns and the dumpability bits in ->flags.
MMF_DUMPABLE_BITS is reserved so MMF_DUMP_FILTER_* layout exposed via
/proc/<pid>/coredump_filter stays stable. task->user_dumpable and its
exit_mm() snapshot are removed.

task_exec_state is the privilege domain established by an execve()
[1]. Within a thread group it is shared via refcount; across thread
groups each task has its own:

  - CLONE_VM siblings (thread-group members, io_uring workers)
    refcount-share the parent's exec_state.
  - Non-CLONE_VM clones (fork(), vfork() without CLONE_VM)
    allocate a fresh exec_state inheriting the parent's dumpable
    mode and user_ns.
  - execve() in the child allocates a fresh instance and installs
    it under task_lock + exec_update_lock via
    task_exec_state_replace().
  - Credential changes (setresuid, capset, ...) and
    prctl(PR_SET_DUMPABLE) update dumpability on the current
    task's exec_state, i.e. on the thread group's shared instance.

Behavioral change:

Kernel threads that briefly use a user mm via kthread_use_mm() no
longer inherit dumpability from the borrowed mm. Kthreads are not
ptraceable (PF_KTHREAD short-circuits __ptrace_may_access), so this
is observable only via /proc surfaces that a sufficiently privileged
reader can reach.

[1] https://lore.kernel.org/r/CAHk-=wj+NgoDH3GSicJ140SV8OoDd71pLmL3fgFEsTcgoMC6Og@mail.gmail.com

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Changes in v3:
- Restore alloc-fresh-and-inherit semantics for non-CLONE_VM clones.
  CLONE_VM siblings still refcount-share; fork() and other
  non-CLONE_VM clones get a fresh exec_state that inherits the
  parent's dumpable mode and user_ns. The v2 "every clone
  refcount-shares" model would have let any forked process in an
  Android zygote64 subtree influence dumpability of its siblings
  via prctl(PR_SET_DUMPABLE).
- Link to v2: https://patch.msgid.link/20260520-work-task_exec_state-v2-0-9ea88ceb09e6@kernel.org

Changes in v2:
- Drop dup-on-fork for non-CLONE_VM clones: every clone() variant
  refcount-shares the parent's task_exec_state; only execve()
  allocates a fresh one.  See "Behavioral changes" in the cover
  letter for the implications.
- Switch commit_creds() to update dumpability on the new
  task_exec_state (instead of dropping the set_dumpable() call
  entirely as in v1).  Drops the explicit smp_wmb()/smp_rmb() pair
  - RCU acquire/release on the cred pointer provides the ordering.
- Link to v1: https://patch.msgid.link/20260516-work-exit_mm-v1-1-76bcc7c2439d@kernel.org

---
Christian Brauner (Amutable) (4):
      sched/coredump: introduce enum task_dumpable
      exec: introduce struct task_exec_state
      ptrace: add ptracer_access_allowed()
      exec_state: relocate dumpable information

 arch/arm64/kernel/mte.c          |   6 +-
 drivers/firmware/efi/efi.c       |   1 -
 fs/coredump.c                    |  22 +++-----
 fs/exec.c                        |  39 ++++++-------
 fs/pidfs.c                       |  23 +++-----
 fs/proc/base.c                   |  39 ++++++-------
 include/linux/binfmts.h          |   2 +
 include/linux/coredump.h         |   4 ++
 include/linux/mm_types.h         |   9 ++-
 include/linux/ptrace.h           |   1 +
 include/linux/sched.h            |   6 +-
 include/linux/sched/coredump.h   |  47 ++++------------
 include/linux/sched/exec_state.h |  29 ++++++++++
 init/init_task.c                 |  10 ++++
 kernel/Makefile                  |   2 +-
 kernel/cred.c                    |   3 +-
 kernel/exec_state.c              | 116 +++++++++++++++++++++++++++++++++++++++
 kernel/exit.c                    |   1 -
 kernel/fork.c                    |  32 +++++++++--
 kernel/kthread.c                 |   1 -
 kernel/ptrace.c                  |  53 ++++++++++++------
 kernel/sys.c                     |   6 +-
 mm/init-mm.c                     |   1 -
 23 files changed, 301 insertions(+), 152 deletions(-)
---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260520-work-task_exec_state-83209d8b3e53



             reply	other threads:[~2026-05-20 21:49 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-20 21:48 Christian Brauner (Amutable) [this message]
2026-05-20 21:48 ` [PATCH RFC v3 1/4] sched/coredump: introduce enum task_dumpable Christian Brauner (Amutable)
2026-05-22 22:14   ` David Hildenbrand (Arm)
2026-05-20 21:48 ` [PATCH RFC v3 2/4] exec: introduce struct task_exec_state Christian Brauner (Amutable)
2026-05-22 15:00   ` Oleg Nesterov
2026-05-26  7:16     ` Christian Brauner
2026-05-26  8:17       ` Oleg Nesterov
2026-05-22 22:21   ` David Hildenbrand (Arm)
2026-05-20 21:48 ` [PATCH RFC v3 3/4] ptrace: add ptracer_access_allowed() Christian Brauner (Amutable)
2026-05-22 15:08   ` Oleg Nesterov
2026-05-22 22:32   ` David Hildenbrand (Arm)
2026-05-20 21:48 ` [PATCH RFC v3 4/4] exec_state: relocate dumpable information Christian Brauner (Amutable)
2026-05-21 10:05   ` Christian Brauner
2026-05-21 11:16   ` Jann Horn
2026-05-21 13:08     ` Christian Brauner
2026-05-26 13:07   ` David Hildenbrand (Arm)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260520-work-task_exec_state-v3-0-69f895bc1385@kernel.org \
    --to=brauner@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=jannh@google.com \
    --cc=kees@kernel.org \
    --cc=liam@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=qsa@qualys.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=torvalds@linuxfoundation.org \
    --cc=vbabka@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.