Linux kernel -stable discussions
 help / color / mirror / Atom feed
* [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock
@ 2026-05-18 16:35 Jann Horn
  2026-05-18 16:35 ` [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1) Jann Horn
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jann Horn @ 2026-05-18 16:35 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Arjan van de Ven, Eric W. Biederman, Jake Edge,
	linux-kernel, linux-fsdevel, Jann Horn, stable

My understanding is that procfs is effectively maintained by the VFS
maintainers (though scripts/get_maintainer.pl claims that there are
no maintainers for procfs because the VFS entry only claims files
directly in fs/, and the procfs entry has no maintainers listed on
it).

In procfs, most uses of ptrace_may_access() should use
exec_update_lock to avoid TOCTOU issues with concurrent privileged
execve() (like setuid binary execution).

This series doesn't fix all the remaining issues in procfs, but it fixes
the easy cases for now; I will probably follow up with fixes for the
gnarlier cases later unless someone else wants to do that.

I have checked that procfs files still work with these changes and that
CONFIG_PROVE_LOCKING=y doesn't generate any warnings.

(checkpatch complains about missing argument names in
proc_op::proc_get_link, but that was already the case before my patch.)

Signed-off-by: Jann Horn <jannh@google.com>
---
Jann Horn (2):
      proc: protect ptrace_may_access() with exec_update_lock (part 1)
      proc: protect ptrace_may_access() with exec_update_lock (FD links)

 fs/proc/array.c      |   6 ++
 fs/proc/base.c       | 159 ++++++++++++++++++++++-----------------------------
 fs/proc/fd.c         |  27 ++++-----
 fs/proc/internal.h   |   2 +-
 fs/proc/namespaces.c |  12 ++++
 5 files changed, 97 insertions(+), 109 deletions(-)
---
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
change-id: 20260518-procfs-lockfix-part1-5dca2d95bc12

--  
Jann Horn <jannh@google.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1)
  2026-05-18 16:35 [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Jann Horn
@ 2026-05-18 16:35 ` Jann Horn
  2026-05-26  8:48   ` Oleg Nesterov
  2026-05-18 16:35 ` [PATCH 2/2] proc: protect ptrace_may_access() with exec_update_lock (FD links) Jann Horn
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Jann Horn @ 2026-05-18 16:35 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Arjan van de Ven, Eric W. Biederman, Jake Edge,
	linux-kernel, linux-fsdevel, Jann Horn, stable

Fix the easy cases where procfs currently calls ptrace_may_access() without
exec_update_lock protection, where the fix is to simply add the extra lock
or use mm_access():

 - do_task_stat(): grab exec_update_lock
 - proc_pid_wchan(): grab exec_update_lock
 - proc_map_files_lookup(): use mm_access() instead of get_task_mm()
 - proc_map_files_readdir(): use mm_access() instead of get_task_mm()
 - proc_ns_get_link(): grab exec_update_lock
 - proc_ns_readlink(): grab exec_update_lock

Fixes: f83ce3e6b02d ("proc: avoid information leaks to non-privileged processes")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
 fs/proc/array.c      |  6 ++++++
 fs/proc/base.c       | 40 ++++++++++++++++++++--------------------
 fs/proc/namespaces.c | 12 ++++++++++++
 3 files changed, 38 insertions(+), 20 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 90fb0c6b5f99..479ea8cb4ef4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -482,6 +482,11 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long flags;
 	int exit_code = task->exit_code;
 	struct signal_struct *sig = task->signal;
+	int ret;
+
+	ret = down_read_killable(&task->signal->exec_update_lock);
+	if (ret)
+		return ret;
 
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
@@ -657,6 +662,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 		seq_puts(m, " 0");
 
 	seq_putc(m, '\n');
+	up_read(&task->signal->exec_update_lock);
 	if (mm)
 		mmput(mm);
 	return 0;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d9acfa89c894..09b02d1621e5 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -423,18 +423,24 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 {
 	unsigned long wchan;
 	char symname[KSYM_NAME_LEN];
+	int err;
 
+	err = down_read_killable(&task->signal->exec_update_lock);
+	if (err)
+		return err;
 	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
 		goto print0;
 
 	wchan = get_wchan(task);
 	if (wchan && !lookup_symbol_name(wchan, symname)) {
 		seq_puts(m, symname);
+		up_read(&task->signal->exec_update_lock);
 		return 0;
 	}
 
 print0:
 	seq_putc(m, '0');
+	up_read(&task->signal->exec_update_lock);
 	return 0;
 }
 #endif /* CONFIG_KALLSYMS */
@@ -2360,17 +2366,15 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
 	if (!task)
 		goto out;
 
-	result = ERR_PTR(-EACCES);
-	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
-		goto out_put_task;
-
 	result = ERR_PTR(-ENOENT);
 	if (dname_to_vma_addr(dentry, &vm_start, &vm_end))
 		goto out_put_task;
 
-	mm = get_task_mm(task);
-	if (!mm)
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
+	if (IS_ERR(mm)) {
+		result = ERR_CAST(mm);
 		goto out_put_task;
+	}
 
 	result = ERR_PTR(-EINTR);
 	if (mmap_read_lock_killable(mm))
@@ -2420,23 +2424,19 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
 	if (!task)
 		goto out;
 
-	ret = -EACCES;
-	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
+	if (IS_ERR(mm)) {
+		ret = PTR_ERR(mm);
 		goto out_put_task;
+	}
 
 	ret = 0;
 	if (!dir_emit_dots(file, ctx))
-		goto out_put_task;
-
-	mm = get_task_mm(task);
-	if (!mm)
-		goto out_put_task;
+		goto out_put_mm;
 
 	ret = mmap_read_lock_killable(mm);
-	if (ret) {
-		mmput(mm);
-		goto out_put_task;
-	}
+	if (ret)
+		goto out_put_mm;
 
 	nr_files = 0;
 
@@ -2462,8 +2462,7 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
 		if (!p) {
 			ret = -ENOMEM;
 			mmap_read_unlock(mm);
-			mmput(mm);
-			goto out_put_task;
+			goto out_put_mm;
 		}
 
 		p->start = vma->vm_start;
@@ -2471,7 +2470,6 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
 		p->mode = vma->vm_file->f_mode;
 	}
 	mmap_read_unlock(mm);
-	mmput(mm);
 
 	for (i = 0; i < nr_files; i++) {
 		char buf[4 * sizeof(long) + 2];	/* max: %lx-%lx\0 */
@@ -2488,6 +2486,8 @@ proc_map_files_readdir(struct file *file, struct dir_context *ctx)
 		ctx->pos++;
 	}
 
+out_put_mm:
+	mmput(mm);
 out_put_task:
 	put_task_struct(task);
 out:
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 39f4169f669f..2f46f1396744 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -55,6 +55,10 @@ static const char *proc_ns_get_link(struct dentry *dentry,
 	if (!task)
 		return ERR_PTR(-EACCES);
 
+	error = down_read_killable(&task->signal->exec_update_lock);
+	if (error)
+		goto out_put_task;
+
 	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
 		goto out;
 
@@ -64,6 +68,8 @@ static const char *proc_ns_get_link(struct dentry *dentry,
 
 	error = nd_jump_link(&ns_path);
 out:
+	up_read(&task->signal->exec_update_lock);
+out_put_task:
 	put_task_struct(task);
 	return ERR_PTR(error);
 }
@@ -80,11 +86,17 @@ static int proc_ns_readlink(struct dentry *dentry, char __user *buffer, int bufl
 	if (!task)
 		return res;
 
+	res = down_read_killable(&task->signal->exec_update_lock);
+	if (res)
+		goto out_put_task;
+
 	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
 		res = ns_get_name(name, sizeof(name), task, ns_ops);
 		if (res >= 0)
 			res = readlink_copy(buffer, buflen, name, strlen(name));
 	}
+	up_read(&task->signal->exec_update_lock);
+out_put_task:
 	put_task_struct(task);
 	return res;
 }

-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] proc: protect ptrace_may_access() with exec_update_lock (FD links)
  2026-05-18 16:35 [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Jann Horn
  2026-05-18 16:35 ` [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1) Jann Horn
@ 2026-05-18 16:35 ` Jann Horn
  2026-05-22 11:47 ` [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Christian Brauner
  2026-05-25 19:56 ` Eric W. Biederman
  3 siblings, 0 replies; 13+ messages in thread
From: Jann Horn @ 2026-05-18 16:35 UTC (permalink / raw)
  To: Alexander Viro, Christian Brauner
  Cc: Jan Kara, Arjan van de Ven, Eric W. Biederman, Jake Edge,
	linux-kernel, linux-fsdevel, Jann Horn, stable

proc_pid_get_link() and proc_pid_readlink() currently look up the task from
the pid once, then do the ptrace access check on that task, then look up
the task from the pid a second time to do the actual access.
That's racy in several ways.

To fix it, pass the task to the ->proc_get_link() handler, and instead of
proc_fd_access_allowed(), introduce a new helper call_proc_get_link() that
looks up and locks the task, does the access check, and calls
->proc_get_link().

Fixes: 778c1144771f ("[PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
 fs/proc/base.c     | 119 +++++++++++++++++++++--------------------------------
 fs/proc/fd.c       |  27 +++++-------
 fs/proc/internal.h |   2 +-
 3 files changed, 59 insertions(+), 89 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 09b02d1621e5..ef2f59461374 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -218,33 +218,24 @@ static int get_task_root(struct task_struct *task, struct path *root)
 	return result;
 }
 
-static int proc_cwd_link(struct dentry *dentry, struct path *path)
+static int proc_cwd_link(struct dentry *dentry, struct path *path,
+			 struct task_struct *task)
 {
-	struct task_struct *task = get_proc_task(d_inode(dentry));
 	int result = -ENOENT;
 
-	if (task) {
-		task_lock(task);
-		if (task->fs) {
-			get_fs_pwd(task->fs, path);
-			result = 0;
-		}
-		task_unlock(task);
-		put_task_struct(task);
+	task_lock(task);
+	if (task->fs) {
+		get_fs_pwd(task->fs, path);
+		result = 0;
 	}
+	task_unlock(task);
 	return result;
 }
 
-static int proc_root_link(struct dentry *dentry, struct path *path)
+static int proc_root_link(struct dentry *dentry, struct path *path,
+			  struct task_struct *task)
 {
-	struct task_struct *task = get_proc_task(d_inode(dentry));
-	int result = -ENOENT;
-
-	if (task) {
-		result = get_task_root(task, path);
-		put_task_struct(task);
-	}
-	return result;
+	return get_task_root(task, path);
 }
 
 /*
@@ -710,23 +701,6 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
 /*                       Here the fs part begins                        */
 /************************************************************************/
 
-/* permission checks */
-static bool proc_fd_access_allowed(struct inode *inode)
-{
-	struct task_struct *task;
-	bool allowed = false;
-	/* Allow access to a task's file descriptors if it is us or we
-	 * may use ptrace attach to the process and find out that
-	 * information.
-	 */
-	task = get_proc_task(inode);
-	if (task) {
-		allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
-		put_task_struct(task);
-	}
-	return allowed;
-}
-
 int proc_nochmod_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 		 struct iattr *attr)
 {
@@ -1783,16 +1757,12 @@ static const struct file_operations proc_pid_set_comm_operations = {
 	.release	= single_release,
 };
 
-static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
+static int proc_exe_link(struct dentry *dentry, struct path *exe_path,
+			 struct task_struct *task)
 {
-	struct task_struct *task;
 	struct file *exe_file;
 
-	task = get_proc_task(d_inode(dentry));
-	if (!task)
-		return -ENOENT;
 	exe_file = get_task_exe_file(task);
-	put_task_struct(task);
 	if (exe_file) {
 		*exe_path = exe_file->f_path;
 		path_get(&exe_file->f_path);
@@ -1802,26 +1772,42 @@ static int proc_exe_link(struct dentry *dentry, struct path *exe_path)
 		return -ENOENT;
 }
 
+static int call_proc_get_link(struct dentry *dentry, struct inode *inode, struct path *path_out)
+{
+	struct task_struct *task;
+	int ret;
+
+	task = get_proc_task(inode);
+	if (!task)
+		return -ENOENT;
+	ret = down_read_killable(&task->signal->exec_update_lock);
+	if (ret)
+		goto out_put_task;
+	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
+		ret = -EACCES;
+		goto out;
+	}
+	ret = PROC_I(inode)->op.proc_get_link(dentry, path_out, task);
+
+out:
+	up_read(&task->signal->exec_update_lock);
+out_put_task:
+	put_task_struct(task);
+	return ret;
+}
+
 static const char *proc_pid_get_link(struct dentry *dentry,
 				     struct inode *inode,
 				     struct delayed_call *done)
 {
 	struct path path;
-	int error = -EACCES;
+	int error;
 
 	if (!dentry)
 		return ERR_PTR(-ECHILD);
-
-	/* Are we allowed to snoop on the tasks file descriptors? */
-	if (!proc_fd_access_allowed(inode))
-		goto out;
-
-	error = PROC_I(inode)->op.proc_get_link(dentry, &path);
-	if (error)
-		goto out;
-
-	error = nd_jump_link(&path);
-out:
+	error = call_proc_get_link(dentry, inode, &path);
+	if (!error)
+		error = nd_jump_link(&path);
 	return ERR_PTR(error);
 }
 
@@ -1855,17 +1841,11 @@ static int proc_pid_readlink(struct dentry * dentry, char __user * buffer, int b
 	struct inode *inode = d_inode(dentry);
 	struct path path;
 
-	/* Are we allowed to snoop on the tasks file descriptors? */
-	if (!proc_fd_access_allowed(inode))
-		goto out;
-
-	error = PROC_I(inode)->op.proc_get_link(dentry, &path);
-	if (error)
-		goto out;
-
-	error = do_proc_readlink(&path, buffer, buflen);
-	path_put(&path);
-out:
+	error = call_proc_get_link(dentry, inode, &path);
+	if (!error) {
+		error = do_proc_readlink(&path, buffer, buflen);
+		path_put(&path);
+	}
 	return error;
 }
 
@@ -2256,21 +2236,16 @@ static const struct dentry_operations tid_map_files_dentry_operations = {
 	.d_delete	= pid_delete_dentry,
 };
 
-static int map_files_get_link(struct dentry *dentry, struct path *path)
+static int map_files_get_link(struct dentry *dentry, struct path *path,
+			      struct task_struct *task)
 {
 	unsigned long vm_start, vm_end;
 	struct vm_area_struct *vma;
-	struct task_struct *task;
 	struct mm_struct *mm;
 	int rc;
 
 	rc = -ENOENT;
-	task = get_proc_task(d_inode(dentry));
-	if (!task)
-		goto out;
-
 	mm = get_task_mm(task);
-	put_task_struct(task);
 	if (!mm)
 		goto out;
 
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 05c7513e77c7..0f9a1556f2a3 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -171,24 +171,19 @@ static const struct dentry_operations tid_fd_dentry_operations = {
 	.d_delete	= pid_delete_dentry,
 };
 
-static int proc_fd_link(struct dentry *dentry, struct path *path)
+static int proc_fd_link(struct dentry *dentry, struct path *path,
+			struct task_struct *task)
 {
-	struct task_struct *task;
 	int ret = -ENOENT;
-
-	task = get_proc_task(d_inode(dentry));
-	if (task) {
-		unsigned int fd = proc_fd(d_inode(dentry));
-		struct file *fd_file;
-
-		fd_file = fget_task(task, fd);
-		if (fd_file) {
-			*path = fd_file->f_path;
-			path_get(&fd_file->f_path);
-			ret = 0;
-			fput(fd_file);
-		}
-		put_task_struct(task);
+	unsigned int fd = proc_fd(d_inode(dentry));
+	struct file *fd_file;
+
+	fd_file = fget_task(task, fd);
+	if (fd_file) {
+		*path = fd_file->f_path;
+		path_get(&fd_file->f_path);
+		ret = 0;
+		fput(fd_file);
 	}
 
 	return ret;
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 64dc44832808..d31984c3c797 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -107,7 +107,7 @@ extern struct kmem_cache *proc_dir_entry_cache;
 void pde_free(struct proc_dir_entry *pde);
 
 union proc_op {
-	int (*proc_get_link)(struct dentry *, struct path *);
+	int (*proc_get_link)(struct dentry *, struct path *, struct task_struct *);
 	int (*proc_show)(struct seq_file *m,
 		struct pid_namespace *ns, struct pid *pid,
 		struct task_struct *task);

-- 
2.54.0.563.g4f69b47b94-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock
  2026-05-18 16:35 [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Jann Horn
  2026-05-18 16:35 ` [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1) Jann Horn
  2026-05-18 16:35 ` [PATCH 2/2] proc: protect ptrace_may_access() with exec_update_lock (FD links) Jann Horn
@ 2026-05-22 11:47 ` Christian Brauner
  2026-05-25 19:56 ` Eric W. Biederman
  3 siblings, 0 replies; 13+ messages in thread
From: Christian Brauner @ 2026-05-22 11:47 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Eric W. Biederman, Jake Edge, linux-kernel, linux-fsdevel, stable

On Mon, 18 May 2026 18:35:14 +0200, Jann Horn <jannh@google.com> wrote:
> [...]
> 
> (checkpatch complains about missing argument names in
> proc_op::proc_get_link, but that was already the case before my patch.)
> 
> Signed-off-by: Jann Horn <jannh@google.com>
> ---

Hm, not super nice as this may cause performance regressions but I think
you're right otherwise. While mostly info leaks - as you mentioned
elsewhere - it would still be nice to try and fix them. So if we can do
it without anyone noticing perf regressions it's probably worth it.

-- 
Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock
  2026-05-18 16:35 [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Jann Horn
                   ` (2 preceding siblings ...)
  2026-05-22 11:47 ` [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Christian Brauner
@ 2026-05-25 19:56 ` Eric W. Biederman
  2026-05-26 11:10   ` Oleg Nesterov
  2026-05-26 18:22   ` Jann Horn
  3 siblings, 2 replies; 13+ messages in thread
From: Eric W. Biederman @ 2026-05-25 19:56 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Jake Edge, linux-kernel, linux-fsdevel, stable, Kees Cook,
	Oleg Nesterov


I have added a couple more people who might be interested.

Kees Cook because as you have structured this it is an exec problem.

Oleg Nesterov as he is knowledgable about ptrace.

Jann Horn <jannh@google.com> writes:

> My understanding is that procfs is effectively maintained by the VFS
> maintainers (though scripts/get_maintainer.pl claims that there are
> no maintainers for procfs because the VFS entry only claims files
> directly in fs/, and the procfs entry has no maintainers listed on
> it).
>
> In procfs, most uses of ptrace_may_access() should use
> exec_update_lock to avoid TOCTOU issues with concurrent privileged
> execve() (like setuid binary execution).
>
> This series doesn't fix all the remaining issues in procfs, but it fixes
> the easy cases for now; I will probably follow up with fixes for the
> gnarlier cases later unless someone else wants to do that.
>
> I have checked that procfs files still work with these changes and that
> CONFIG_PROVE_LOCKING=y doesn't generate any warnings.
>
> (checkpatch complains about missing argument names in
> proc_op::proc_get_link, but that was already the case before my
> patch.)


I think I finally have my context paged back in so I can intelligently
say something about this series.

The scenario you are worried about is when exec gains privileges,
and we read through proc and authenticate with the old credentials
instead of the new credentials.

Question 1.

Assuming the executable is world readable (which they generally are)
is there anything that becomes accessible in that race that was
not already accessible?

Question 2.

How does this race compare to racing with setresuid?
Do we need to fix the setresuid case as well?

Question 3.
Do we care about the case when a privileged process calls a setuid
process and drops privileges?

Question 4.
Is it possible to use a seq_lock instead of reader writer semaphore?
Or is that only for non-sleeping readers?

There have been a number of nasty cases lurking in the background
involving seccomp filters, PTRACE_EVENT_EXIT, de_thread and the like.

Blocking locks, especially ones that get widely used, just scare me in
this area.  Being able to see that something happened between start and
finish and say -EAGAIN or retrying internally seems like it would be
much less prone to weirdness.

The ugly with PTRACE_EVENT_EXIT as I recall is that if ptrace stops one
of the threads (not the one calling exec) at PTRACE_EVENT_EXIT it can
block de_thread, which blocks the rest of exec.  But there is something
in there where the ptracer hangs waiting for the exec to complete.  So
everything just stalls.  The ptracer waiting for exec the exec waiting
for the ptracer.  SIGKILL can get you out of that mess last I looked.
Still it is an ugly mess.

Getting everything away from that mess is why we have exec_update_lock
instead of just cred_guard_mutex.


I would really appreciate hearing the scenarios you are aiming to fix
and how this fixes them.  There are enough races and special cases
I don't feel comfortable reading that we just need exec_update_lock
around ptrace_may_access.  It is not clear to me that is sufficient
to close the small races we are worried about here.

If I could trace through someone else's logic I could be convinced
and the next people to deal with the code could look at it and see
ah.  That is the detail that was missed when it has to be fixed again.

Eric


> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> Jann Horn (2):
>       proc: protect ptrace_may_access() with exec_update_lock (part 1)
>       proc: protect ptrace_may_access() with exec_update_lock (FD links)
>
>  fs/proc/array.c      |   6 ++
>  fs/proc/base.c       | 159 ++++++++++++++++++++++-----------------------------
>  fs/proc/fd.c         |  27 ++++-----
>  fs/proc/internal.h   |   2 +-
>  fs/proc/namespaces.c |  12 ++++
>  5 files changed, 97 insertions(+), 109 deletions(-)
> ---
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
> change-id: 20260518-procfs-lockfix-part1-5dca2d95bc12
>
> --  
> Jann Horn <jannh@google.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1)
  2026-05-18 16:35 ` [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1) Jann Horn
@ 2026-05-26  8:48   ` Oleg Nesterov
  2026-05-26  9:44     ` Oleg Nesterov
  2026-05-26 14:16     ` Jann Horn
  0 siblings, 2 replies; 13+ messages in thread
From: Oleg Nesterov @ 2026-05-26  8:48 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Eric W. Biederman, Jake Edge, linux-kernel, linux-fsdevel, stable,
	Kees Cook

On 05/18, Jann Horn wrote:
>
> Fix the easy cases where procfs currently calls ptrace_may_access() without
> exec_update_lock protection, where the fix is to simply add the extra lock
> or use mm_access():

I thought about this too, but I do not know if it is fine performance wise...

And what about proc_coredump_filter_write() which doesn't use ptrace_may_access() ?

AFAICS, we can't rely on the open-time checks. /proc/$pid/coredump_filter can
be opened for writing, the task can do suid exec after that, the file remains
writable.

Not a big deal, but still.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1)
  2026-05-26  8:48   ` Oleg Nesterov
@ 2026-05-26  9:44     ` Oleg Nesterov
  2026-05-26 14:19       ` Jann Horn
  2026-05-26 14:16     ` Jann Horn
  1 sibling, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2026-05-26  9:44 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Eric W. Biederman, Jake Edge, linux-kernel, linux-fsdevel, stable,
	Kees Cook

Perhaps proc_pid_make_inode() can record task->self_exec_id in
proc_inode ? At least this can help to fix the
"if (ptrace_may_access(task)) mm = get_task_mm(task)" pattern...

On 05/26, Oleg Nesterov wrote:
>
> On 05/18, Jann Horn wrote:
> >
> > Fix the easy cases where procfs currently calls ptrace_may_access() without
> > exec_update_lock protection, where the fix is to simply add the extra lock
> > or use mm_access():
>
> I thought about this too, but I do not know if it is fine performance wise...
>
> And what about proc_coredump_filter_write() which doesn't use ptrace_may_access() ?
>
> AFAICS, we can't rely on the open-time checks. /proc/$pid/coredump_filter can
> be opened for writing, the task can do suid exec after that, the file remains
> writable.
>
> Not a big deal, but still.
>
> Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock
  2026-05-25 19:56 ` Eric W. Biederman
@ 2026-05-26 11:10   ` Oleg Nesterov
  2026-05-26 18:22   ` Jann Horn
  1 sibling, 0 replies; 13+ messages in thread
From: Oleg Nesterov @ 2026-05-26 11:10 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Jann Horn, Alexander Viro, Christian Brauner, Jan Kara,
	Arjan van de Ven, Jake Edge, linux-kernel, linux-fsdevel, stable,
	Kees Cook

On 05/25, Eric W. Biederman wrote:
>
> The ugly with PTRACE_EVENT_EXIT as I recall is that if ptrace stops one
> of the threads (not the one calling exec) at PTRACE_EVENT_EXIT it can
> block de_thread, which blocks the rest of exec.  But there is something
> in there where the ptracer hangs waiting for the exec to complete.  So
> everything just stalls.  The ptracer waiting for exec the exec waiting
> for the ptracer.  SIGKILL can get you out of that mess last I looked.
> Still it is an ugly mess.

Yes... note that even without PTRACE_EVENT_EXIT a traced sub-thread won't
autoreap, so de_thread which waits for --sig->notify_count in __exit_signal()
will block anyway.

Perhaps we can change ptrace_attach() to detect this case somehow and return
-EWOULDBLOCK... Yes this can confuse strace/gdb, but this is better than
the deadlock, even if it is killable.

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1)
  2026-05-26  8:48   ` Oleg Nesterov
  2026-05-26  9:44     ` Oleg Nesterov
@ 2026-05-26 14:16     ` Jann Horn
  2026-05-26 18:22       ` Oleg Nesterov
  1 sibling, 1 reply; 13+ messages in thread
From: Jann Horn @ 2026-05-26 14:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Eric W. Biederman, Jake Edge, linux-kernel, linux-fsdevel, stable,
	Kees Cook

On Tue, May 26, 2026 at 10:48 AM Oleg Nesterov <oleg@redhat.com> wrote:
> On 05/18, Jann Horn wrote:
> >
> > Fix the easy cases where procfs currently calls ptrace_may_access() without
> > exec_update_lock protection, where the fix is to simply add the extra lock
> > or use mm_access():
>
> I thought about this too, but I do not know if it is fine performance wise...
>
> And what about proc_coredump_filter_write() which doesn't use ptrace_may_access() ?

Yeah, this series doesn't fix everything, but I figured it would be
better to at least start fixing some of this stuff rather than leaving
this code as-is...

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1)
  2026-05-26  9:44     ` Oleg Nesterov
@ 2026-05-26 14:19       ` Jann Horn
  0 siblings, 0 replies; 13+ messages in thread
From: Jann Horn @ 2026-05-26 14:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Eric W. Biederman, Jake Edge, linux-kernel, linux-fsdevel, stable,
	Kees Cook

On Tue, May 26, 2026 at 11:44 AM Oleg Nesterov <oleg@redhat.com> wrote:
> Perhaps proc_pid_make_inode() can record task->self_exec_id in
> proc_inode ? At least this can help to fix the
> "if (ptrace_may_access(task)) mm = get_task_mm(task)" pattern...

Yes, I think something like that might be a good idea for files that
access the process in read/write handlers, though I think recording it
somewhere in file->private_data would be better than putting it in the
proc_inode.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1)
  2026-05-26 14:16     ` Jann Horn
@ 2026-05-26 18:22       ` Oleg Nesterov
  2026-05-26 18:30         ` Jann Horn
  0 siblings, 1 reply; 13+ messages in thread
From: Oleg Nesterov @ 2026-05-26 18:22 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Eric W. Biederman, Jake Edge, linux-kernel, linux-fsdevel, stable,
	Kees Cook

On 05/26, Jann Horn wrote:
>
> On Tue, May 26, 2026 at 10:48 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > On 05/18, Jann Horn wrote:
> > >
> > > Fix the easy cases where procfs currently calls ptrace_may_access() without
> > > exec_update_lock protection, where the fix is to simply add the extra lock
> > > or use mm_access():
> >
> > I thought about this too, but I do not know if it is fine performance wise...
> >
> > And what about proc_coredump_filter_write() which doesn't use ptrace_may_access() ?
>
> Yeah, this series doesn't fix everything,

Aah... Of course, I understand. I wasn't clear. Sorry if it looked as
"you missed proc_coredump_filter_write" from my side.

What I actually tried to ask:

	- Do you think it makes sense to fix proc_coredump_filter_write()
	  as well?

	- If yes. Do you think we should add another down_read(exec_update_lock) +
	  ptrace_may_access() into proc_coredump_filter_write() ? Or perhaps we
	  should discuss other approaches (exec_id/seqcount/etc) from the very
	  beginning?

Oleg.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock
  2026-05-25 19:56 ` Eric W. Biederman
  2026-05-26 11:10   ` Oleg Nesterov
@ 2026-05-26 18:22   ` Jann Horn
  1 sibling, 0 replies; 13+ messages in thread
From: Jann Horn @ 2026-05-26 18:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Jake Edge, linux-kernel, linux-fsdevel, stable, Kees Cook,
	Oleg Nesterov

On Mon, May 25, 2026 at 9:56 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> I have added a couple more people who might be interested.
>
> Kees Cook because as you have structured this it is an exec problem.
>
> Oleg Nesterov as he is knowledgable about ptrace.
>
> Jann Horn <jannh@google.com> writes:
>
> > My understanding is that procfs is effectively maintained by the VFS
> > maintainers (though scripts/get_maintainer.pl claims that there are
> > no maintainers for procfs because the VFS entry only claims files
> > directly in fs/, and the procfs entry has no maintainers listed on
> > it).
> >
> > In procfs, most uses of ptrace_may_access() should use
> > exec_update_lock to avoid TOCTOU issues with concurrent privileged
> > execve() (like setuid binary execution).
> >
> > This series doesn't fix all the remaining issues in procfs, but it fixes
> > the easy cases for now; I will probably follow up with fixes for the
> > gnarlier cases later unless someone else wants to do that.
> >
> > I have checked that procfs files still work with these changes and that
> > CONFIG_PROVE_LOCKING=y doesn't generate any warnings.
> >
> > (checkpatch complains about missing argument names in
> > proc_op::proc_get_link, but that was already the case before my
> > patch.)
>
>
> I think I finally have my context paged back in so I can intelligently
> say something about this series.
>
> The scenario you are worried about is when exec gains privileges,
> and we read through proc and authenticate with the old credentials
> instead of the new credentials.
>
> Question 1.
>
> Assuming the executable is world readable (which they generally are)
> is there anything that becomes accessible in that race that was
> not already accessible?

I believe so - the gnarliest example I am thinking of is:
Memfds are always mode 0777 or 0666 (see __shmem_file_setup, which
sets S_IRWXUGO), so their access control is purely based on being able
to pathwalk to the memfd's inode. If you can race
open(/proc/$pid/fd/$n) with the process $pid going through setuid
execution and calling memfd_create(), you should be able to get
read+write access to the memfd created by the setuid binary that was
supposed to be private.

(But I have not tested that and don't know if there are actually any
setuid binaries that happen to use memfds.)

> Question 2.
>
> How does this race compare to racing with setresuid?
> Do we need to fix the setresuid case as well?

Which setresuid case? setresuid clears the dumpable flag and has a
memory barrier that is supposed to make that properly ordered against
ptrace_may_access(); so setresuid() should normally not cause a task
to become traceable, though that could maybe happen in weird
scenarios.

I think another case we should probably care about is what happens if
a process which is only protected against ptrace by being non-dumpable
goes through execve() - it shouldn't be possible to access resources
associated with the pre-execve state while checking against the
post-execve dumpability. It might be important for this that the
do_close_on_exec() logic currently happens after committing the
dumpable state in exec_mmap()...

> Question 3.
> Do we care about the case when a privileged process calls a setuid
> process and drops privileges?

I don't understand the question. Hmm - do you mean a case where a
process with ruid=1000, euid=0, suid=1000 does execve() on a setuid
1000 binary? I think we probably don't specifically care about that...

I think another scenario that we ideally might want to care about is
what happens if a process which runs with a normal user's UIDs, but is
non-dumpable, goes through execve() of a normal binary while another
process tries to inspect its FDs or address space layout - it probably
shouldn't be possible to get information about the pre-execve MM and
O_CLOEXEC file descriptors.

> Question 4.
> Is it possible to use a seq_lock instead of reader writer semaphore?
> Or is that only for non-sleeping readers?

Linux seqcounts are 32-bit, which means they are always kind of dodgy,
but they are particularly dodgy if a reader can be forced to sleep for
an extended amount of time. I don't see a reason why we couldn't, in
general, use a 64-bit sequence count for readers that may need to
sleep while reading.

> There have been a number of nasty cases lurking in the background
> involving seccomp filters, PTRACE_EVENT_EXIT, de_thread and the like.
>
> Blocking locks, especially ones that get widely used, just scare me in
> this area.  Being able to see that something happened between start and
> finish and say -EAGAIN or retrying internally seems like it would be
> much less prone to weirdness.

I guess for do_task_stat() we could just switch to down_read_trylock()
instead of down_read_killable(), and proceed with "permitted = 0" if
the trylock fails - almost all the values shown are related to the MM,
and are therefore not stable across execve() anyway.

I think using seqlocks with a retry loop wouldn't work with the code
as-is, because in the middle of execve, there are points where the
file descriptor table still contains entries that we don't want to be
accessible with the task's current dumpability, or where we have
already switched to a new MM without having updated the credentials
yet.
I think we could make it work - we could add another set of creds to
the task, and let ptrace_may_access() check against both the
pre-execve and post-execve credentials and dumpability, but that feels
overengineered.

> The ugly with PTRACE_EVENT_EXIT as I recall is that if ptrace stops one
> of the threads (not the one calling exec) at PTRACE_EVENT_EXIT it can
> block de_thread, which blocks the rest of exec.  But there is something
> in there where the ptracer hangs waiting for the exec to complete.  So
> everything just stalls.  The ptracer waiting for exec the exec waiting
> for the ptracer.  SIGKILL can get you out of that mess last I looked.
> Still it is an ugly mess.
>
> Getting everything away from that mess is why we have exec_update_lock
> instead of just cred_guard_mutex.

And the exec_update_lock avoids that because it is not held in
de_thread(), only across the following part of execve, where not much
stuff happens that could block for a long time, right?

load_elf_binary
  begin_new_exec
    exec_mmap
      down_write_killable(&tsk->signal->exec_update_lock)
      mmput [brauner@ has a patch to move this]
    flush_thread
    do_close_on_exec [notably this can lead to filp->f_op->flush()
calls, which AFAIK can block forever on FUSE/NFS]
    commit_creds
  setup_new_exec
    up_write(&me->signal->exec_update_lock)

I think we might want to do something about the do_close_on_exec()
stuff, like deferring the filp_flush() to a later time, but I don't
really see deadlock potential here.

> I would really appreciate hearing the scenarios you are aiming to fix
> and how this fixes them.  There are enough races and special cases
> I don't feel comfortable reading that we just need exec_update_lock
> around ptrace_may_access.  It is not clear to me that is sufficient
> to close the small races we are worried about here.

The main thing I'm trying to address here are scenarios of the shape
"process A accesses process B through procfs while process B goes
through a privileged execution (in particular by executing a setuid
binary)". /proc/$pid/fd/$fd (part of patch 2) seems particularly
egregious because it can likely be used to gain access to memfds of
setuid binaries; other files are less egregious, but might lead to
things like userspace ASLR/pointer leaks (in particular do_task_stat()
and proc_map_files_readdir()).

A second scenario I have in mind is "process A accesses process B
through procfs while process B goes through a normal execution that
makes it dumpable".

The overarching logic I have at the back of my mind here is: If an
"incarnation" is the combination of a process and an mm_struct, then
holding exec_update_lock ensures that the credentials/dumpability we
have observed are associated with the same incarnation as the MM and
the file descriptor table whose properties we read afterwards.

> If I could trace through someone else's logic I could be convinced
> and the next people to deal with the code could look at it and see
> ah.  That is the detail that was missed when it has to be fixed again.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1)
  2026-05-26 18:22       ` Oleg Nesterov
@ 2026-05-26 18:30         ` Jann Horn
  0 siblings, 0 replies; 13+ messages in thread
From: Jann Horn @ 2026-05-26 18:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Arjan van de Ven,
	Eric W. Biederman, Jake Edge, linux-kernel, linux-fsdevel, stable,
	Kees Cook

On Tue, May 26, 2026 at 8:22 PM Oleg Nesterov <oleg@redhat.com> wrote:
> On 05/26, Jann Horn wrote:
> >
> > On Tue, May 26, 2026 at 10:48 AM Oleg Nesterov <oleg@redhat.com> wrote:
> > > On 05/18, Jann Horn wrote:
> > > >
> > > > Fix the easy cases where procfs currently calls ptrace_may_access() without
> > > > exec_update_lock protection, where the fix is to simply add the extra lock
> > > > or use mm_access():
> > >
> > > I thought about this too, but I do not know if it is fine performance wise...
> > >
> > > And what about proc_coredump_filter_write() which doesn't use ptrace_may_access() ?
> >
> > Yeah, this series doesn't fix everything,
>
> Aah... Of course, I understand. I wasn't clear. Sorry if it looked as
> "you missed proc_coredump_filter_write" from my side.
>
> What I actually tried to ask:
>
>         - Do you think it makes sense to fix proc_coredump_filter_write()
>           as well?

Yes. Another example I've seen that should probably also be fixed is
seq_show() in fs/proc/fd.c, the handler for /proc/$pid/fdinfo/$fd,
that also currently has zero checks at read() time.

>         - If yes. Do you think we should add another down_read(exec_update_lock) +
>           ptrace_may_access() into proc_coredump_filter_write() ? Or perhaps we
>           should discuss other approaches (exec_id/seqcount/etc) from the very
>           beginning?

I had thought that this series would be the easy, uncontroversial
improvements, and that we could then think about the harder aspect
with read handlers afterwards. I guess I was wrong about this being
the uncontroversial part.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-26 18:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-18 16:35 [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Jann Horn
2026-05-18 16:35 ` [PATCH 1/2] proc: protect ptrace_may_access() with exec_update_lock (part 1) Jann Horn
2026-05-26  8:48   ` Oleg Nesterov
2026-05-26  9:44     ` Oleg Nesterov
2026-05-26 14:19       ` Jann Horn
2026-05-26 14:16     ` Jann Horn
2026-05-26 18:22       ` Oleg Nesterov
2026-05-26 18:30         ` Jann Horn
2026-05-18 16:35 ` [PATCH 2/2] proc: protect ptrace_may_access() with exec_update_lock (FD links) Jann Horn
2026-05-22 11:47 ` [PATCH 0/2] proc: protect ptrace_may_access() with exec_update_lock Christian Brauner
2026-05-25 19:56 ` Eric W. Biederman
2026-05-26 11:10   ` Oleg Nesterov
2026-05-26 18:22   ` Jann Horn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox