From: Christian Brauner <brauner@kernel.org>
To: fstests@vger.kernel.org
Cc: Christian Brauner <brauner@kernel.org>
Subject: [PATCH] vfs: fix dumpable race in create_userns_hierarchy()
Date: Mon, 23 Mar 2026 14:32:15 +0100
Message-ID: <20260323-fix-vfs-bugs-v1-1-76a6aed624d9@kernel.org>

All processes in the userns hierarchy share the same mm_struct due to
CLONE_VM. When a child calls setresuid() in switch_ids(), the kernel
clears the dumpable flag on the shared mm. The child immediately
re-sets it via prctl(PR_SET_DUMPABLE, 1), but there is a window
between the two where the parent may be opening /proc/PID/ns/user for
the previous level's child. That open hits ptrace_may_access() which
checks get_dumpable(mm) and, finding it zero, falls back to
ptrace_has_cap(mm->user_ns, mode). Since mm->user_ns is init_user_ns
(the mm was created by real root) and the parent is only ns-root, the
capability check fails and the open returns -EACCES.

Fix this by extending the existing two-way socketpair handshake into a
three-way handshake: the child signals the parent after becoming
dumpable, then blocks until the parent has opened /proc/PID/ns/user
and signals back. Only then does the child proceed to
create_userns_hierarchy() for the next level. This guarantees no
concurrent setresuid() can clear dumpable while the parent's open() is
in flight.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
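Note for readers: the ordering described in the commit message can be
sketched as a small standalone program. This is only an illustration
under stated assumptions, not the fstests code. It uses fork() instead
of clone(CLONE_VM), so there is no shared mm and no real dumpable
race, and read_retry(), write_retry() and handshake_demo() are
hypothetical names standing in for the read_nointr()/write_nointr()
helpers and the userns setup. Only the order of the three steps is
meant to match the patch.

```c
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read() that retries when interrupted by a signal. */
static ssize_t read_retry(int fd, void *buf, size_t n)
{
	ssize_t ret;

	do {
		ret = read(fd, buf, n);
	} while (ret < 0 && errno == EINTR);
	return ret;
}

/* Hypothetical helper: write() that retries when interrupted by a signal. */
static ssize_t write_retry(int fd, const void *buf, size_t n)
{
	ssize_t ret;

	do {
		ret = write(fd, buf, n);
	} while (ret < 0 && errno == EINTR);
	return ret;
}

/* Returns 0 if both sides completed the handshake, -1 otherwise. */
int handshake_demo(void)
{
	int sk[2], status;
	char c;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sk[0]);
		close(sk[1]);
		return -1;
	}

	if (pid == 0) {
		close(sk[0]);
		/* Step 1: child reports it is ready (in the patch: dumpable again). */
		if (write_retry(sk[1], "1", 1) != 1)
			_exit(1);
		/* Step 3: child blocks until the parent says its open() is done. */
		if (read_retry(sk[1], &c, 1) != 1)
			_exit(1);
		/* Only at this point would the real child create the next level. */
		_exit(0);
	}

	close(sk[1]);
	/* Parent waits for the child's "ready" byte. */
	if (read_retry(sk[0], &c, 1) != 1)
		goto fail;
	/* Step 2: the real parent would open /proc/<pid>/ns/user here. */
	if (write_retry(sk[0], "1", 1) != 1)
		goto fail;
	close(sk[0]);

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;

fail:
	close(sk[0]);
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return -1;
}
```

Calling handshake_demo() from a main() should return 0 when both
sides complete all three steps. The sketch treats a short read (EOF)
as failure, which is slightly stricter than the "ret < 0" checks in
the patch.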
src/vfs/utils.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/src/vfs/utils.c b/src/vfs/utils.c
index 52bb7e42..0b435afe 100644
--- a/src/vfs/utils.c
+++ b/src/vfs/utils.c
@@ -312,6 +312,18 @@ static int userns_fd_cb(void *data)
 	if (ret < 0)
 		return syserror("failure: write to socketpair");

+	/*
+	 * Wait for the parent to open our /proc/PID/ns/user before
+	 * proceeding to create the next userns level. Without this,
+	 * setresuid() in the next-level child clears the dumpable flag
+	 * on the shared mm (CLONE_VM) and the parent's open() fails the
+	 * ptrace_may_access() check with -EACCES because mm->user_ns is
+	 * init_user_ns where userns processes lack CAP_SYS_PTRACE.
+	 */
+	ret = read_nointr(h->fd_event, &c, 1);
+	if (ret < 0)
+		return syserror("failure: read from socketpair");
+
 	ret = create_userns_hierarchy(++h);
 	if (ret < 0)
 		return syserror("failure: userns level %d", h->level);
@@ -377,6 +389,14 @@ int create_userns_hierarchy(struct userns_hierarchy *h)
 		goto out_wait;
 	}

+	/* Tell the child it can now proceed to create the next level. */
+	bytes = write_nointr(fd_socket[0], "1", 1);
+	if (bytes < 0) {
+		kill(pid, SIGKILL);
+		syserror("failure: write to socketpair");
+		goto out_wait;
+	}
+
 	fret = 0;
 out_wait:
---
base-commit: 3ded3e13c008326d197d11ac975049ed1f8ec922
change-id: 20260323-fix-vfs-bugs-b4f8a492d6d7