* [PATCH] vfs: fix dumpable race in create_userns_hierarchy()
@ 2026-03-23 13:32 Christian Brauner
2026-05-15 16:24 ` Zorro Lang
0 siblings, 1 reply; 2+ messages in thread
From: Christian Brauner @ 2026-03-23 13:32 UTC (permalink / raw)
To: fstests; +Cc: Christian Brauner

All processes in the userns hierarchy share the same mm_struct due to
CLONE_VM. When a child calls setresuid() in switch_ids(), the kernel
clears the dumpable flag on the shared mm. The child immediately
re-sets it via prctl(PR_SET_DUMPABLE, 1), but there is a window
between the two where the parent may be opening /proc/PID/ns/user for
the previous level's child. That open hits ptrace_may_access() which
checks get_dumpable(mm) and, finding it zero, falls back to
ptrace_has_cap(mm->user_ns, mode). Since mm->user_ns is init_user_ns
(the mm was created by real root) and the parent is only ns-root, the
capability check fails and the open returns -EACCES.
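
To make the window concrete, here is a minimal userspace sketch of the
flag involved. It is not the fstests code: the helper names
get_dumpable_flag() and set_dumpable_flag() are hypothetical, and
prctl(PR_SET_DUMPABLE, 0) only stands in for the kernel clearing the flag,
since triggering the real clear needs a privileged credential change.

```c
#include <sys/prctl.h>

/*
 * Hypothetical helper: return the dumpable value of the caller's mm
 * (0 or 1 in the common case), or -1 on error.
 */
static int get_dumpable_flag(void)
{
	return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/*
 * Hypothetical helper: set the dumpable value of the caller's mm.
 * Returns 0 on success, -1 on error.
 *
 * Assumption for illustration: calling this with 0 mimics the state
 * between setresuid() and the child's prctl(PR_SET_DUMPABLE, 1).
 * Any task sharing the mm observes the same value during that time.
 */
static int set_dumpable_flag(int value)
{
	return prctl(PR_SET_DUMPABLE, value, 0, 0, 0);
}
```

Calling set_dumpable_flag(0), then get_dumpable_flag(), then
set_dumpable_flag(1) walks through the cleared and restored states that
the parent's open() can race against.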

Fix this by extending the existing two-way socketpair handshake into a
three-way handshake: the child signals the parent after becoming
dumpable, then blocks until the parent has opened /proc/PID/ns/user
and signals back. Only then does the child proceed to
create_userns_hierarchy() for the next level. This guarantees no
concurrent setresuid() can clear dumpable while the parent's open() is
in flight.
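
The two legs described above can be sketched in isolation as follows. This
is a simplified illustration under stated assumptions, not the patch
itself: it uses fork() instead of a CLONE_VM clone, creates no user
namespace, omits the earlier leg of the existing handshake, and the names
demo_handshake(), read_retry() and write_retry() are hypothetical. Unlike
the patch, the parent signals the child even if open() fails, and reports
the open() result through nsfd_out.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Retry read() on EINTR (hypothetical stand-in for read_nointr()). */
static ssize_t read_retry(int fd, void *buf, size_t n)
{
	ssize_t ret;

	do {
		ret = read(fd, buf, n);
	} while (ret < 0 && errno == EINTR);
	return ret;
}

/* Retry write() on EINTR (hypothetical stand-in for write_nointr()). */
static ssize_t write_retry(int fd, const void *buf, size_t n)
{
	ssize_t ret;

	do {
		ret = write(fd, buf, n);
	} while (ret < 0 && errno == EINTR);
	return ret;
}

/*
 * Returns 0 if both handshake legs complete and the child exits cleanly.
 * *nsfd_out receives the fd for /proc/PID/ns/user, or -1 if open() failed.
 */
static int demo_handshake(int *nsfd_out)
{
	char path[64], c;
	int sv[2], status;
	pid_t pid;

	*nsfd_out = -1;
	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (pid == 0) {
		close(sv[0]);
		/* Child: become dumpable, then tell the parent. */
		if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) < 0 ||
		    write_retry(sv[1], "1", 1) != 1)
			_exit(1);
		/* Child: block until the parent has finished its open(). */
		if (read_retry(sv[1], &c, 1) != 1)
			_exit(1);
		/* Only at this point would the next level be created. */
		_exit(0);
	}

	close(sv[1]);
	/* Parent: wait until the child reports that it is dumpable. */
	if (read_retry(sv[0], &c, 1) != 1)
		goto fail;

	snprintf(path, sizeof(path), "/proc/%d/ns/user", (int)pid);
	*nsfd_out = open(path, O_RDONLY | O_CLOEXEC);

	/* Parent: the open() is done, let the child proceed. */
	if (write_retry(sv[0], "1", 1) != 1)
		goto fail;

	close(sv[0]);
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;

fail:
	kill(pid, SIGKILL);
	close(sv[0]);
	waitpid(pid, &status, 0);
	return -1;
}
```

Because the child cannot pass its read until the parent has written, no
later step in the child can overlap with the parent's open().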

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
src/vfs/utils.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/src/vfs/utils.c b/src/vfs/utils.c
index 52bb7e42..0b435afe 100644
--- a/src/vfs/utils.c
+++ b/src/vfs/utils.c
@@ -312,6 +312,18 @@ static int userns_fd_cb(void *data)
 	if (ret < 0)
 		return syserror("failure: write to socketpair");
 
+	/*
+	 * Wait for the parent to open our /proc/PID/ns/user before
+	 * proceeding to create the next userns level. Without this,
+	 * setresuid() in the next-level child clears the dumpable flag
+	 * on the shared mm (CLONE_VM) and the parent's open() fails the
+	 * ptrace_may_access() check with -EACCES because mm->user_ns is
+	 * init_user_ns where userns processes lack CAP_SYS_PTRACE.
+	 */
+	ret = read_nointr(h->fd_event, &c, 1);
+	if (ret < 0)
+		return syserror("failure: read from socketpair");
+
 	ret = create_userns_hierarchy(++h);
 	if (ret < 0)
 		return syserror("failure: userns level %d", h->level);
@@ -377,6 +389,14 @@ int create_userns_hierarchy(struct userns_hierarchy *h)
 		goto out_wait;
 	}
 
+	/* Tell the child it can now proceed to create the next level. */
+	bytes = write_nointr(fd_socket[0], "1", 1);
+	if (bytes < 0) {
+		kill(pid, SIGKILL);
+		syserror("failure: write to socketpair");
+		goto out_wait;
+	}
+
 	fret = 0;
 
 out_wait:
---
base-commit: 3ded3e13c008326d197d11ac975049ed1f8ec922
change-id: 20260323-fix-vfs-bugs-b4f8a492d6d7
* Re: [PATCH] vfs: fix dumpable race in create_userns_hierarchy()
2026-03-23 13:32 [PATCH] vfs: fix dumpable race in create_userns_hierarchy() Christian Brauner
@ 2026-05-15 16:24 ` Zorro Lang
0 siblings, 0 replies; 2+ messages in thread
From: Zorro Lang @ 2026-05-15 16:24 UTC (permalink / raw)
To: Christian Brauner; +Cc: fstests
On Mon, Mar 23, 2026 at 02:32:15PM +0100, Christian Brauner wrote:
> All processes in the userns hierarchy share the same mm_struct due to
> CLONE_VM. When a child calls setresuid() in switch_ids(), the kernel
> clears the dumpable flag on the shared mm. The child immediately
> re-sets it via prctl(PR_SET_DUMPABLE, 1), but there is a window
> between the two where the parent may be opening /proc/PID/ns/user for
> the previous level's child. That open hits ptrace_may_access() which
> checks get_dumpable(mm) and, finding it zero, falls back to
> ptrace_has_cap(mm->user_ns, mode). Since mm->user_ns is init_user_ns
> (the mm was created by real root) and the parent is only ns-root, the
> capability check fails and the open returns -EACCES.
>
> Fix this by extending the existing two-way socketpair handshake into a
> three-way handshake: the child signals the parent after becoming
> dumpable, then blocks until the parent has opened /proc/PID/ns/user
> and signals back. Only then does the child proceed to
> create_userns_hierarchy() for the next level. This guarantees no
> concurrent setresuid() can clear dumpable while the parent's open() is
> in flight.
>
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---

Sorry, a few unforeseen things came up on my end and I lost track of this patch.

Reviewed-by: Zorro Lang <zlang@kernel.org>

> src/vfs/utils.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/src/vfs/utils.c b/src/vfs/utils.c
> index 52bb7e42..0b435afe 100644
> --- a/src/vfs/utils.c
> +++ b/src/vfs/utils.c
> @@ -312,6 +312,18 @@ static int userns_fd_cb(void *data)
>  	if (ret < 0)
>  		return syserror("failure: write to socketpair");
>
> +	/*
> +	 * Wait for the parent to open our /proc/PID/ns/user before
> +	 * proceeding to create the next userns level. Without this,
> +	 * setresuid() in the next-level child clears the dumpable flag
> +	 * on the shared mm (CLONE_VM) and the parent's open() fails the
> +	 * ptrace_may_access() check with -EACCES because mm->user_ns is
> +	 * init_user_ns where userns processes lack CAP_SYS_PTRACE.
> +	 */
> +	ret = read_nointr(h->fd_event, &c, 1);
> +	if (ret < 0)
> +		return syserror("failure: read from socketpair");
> +
>  	ret = create_userns_hierarchy(++h);
>  	if (ret < 0)
>  		return syserror("failure: userns level %d", h->level);
> @@ -377,6 +389,14 @@ int create_userns_hierarchy(struct userns_hierarchy *h)
>  		goto out_wait;
>  	}
>
> +	/* Tell the child it can now proceed to create the next level. */
> +	bytes = write_nointr(fd_socket[0], "1", 1);
> +	if (bytes < 0) {
> +		kill(pid, SIGKILL);
> +		syserror("failure: write to socketpair");
> +		goto out_wait;
> +	}
> +
>  	fret = 0;
>
>  out_wait:
>
> ---
> base-commit: 3ded3e13c008326d197d11ac975049ed1f8ec922
> change-id: 20260323-fix-vfs-bugs-b4f8a492d6d7
>
>