From: Al Viro <viro@zeniv.linux.org.uk>
To: Waiman Long <llong@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>, Eric Paris <eparis@redhat.com>,
Christian Brauner <brauner@kernel.org>,
linux-kernel@vger.kernel.org, audit@vger.kernel.org,
Richard Guy Briggs <rgb@redhat.com>,
Ricardo Robaina <rrobaina@redhat.com>
Subject: Re: [PATCH v2] audit: Avoid excessive dput/dget in audit_context setup and reset paths
Date: Fri, 6 Feb 2026 20:29:33 +0000 [thread overview]
Message-ID: <20260206202933.GA3183987@ZenIV> (raw)
In-Reply-To: <9bc83901-3819-4cf1-a1ba-cc2f52f53504@redhat.com>
On Fri, Feb 06, 2026 at 02:16:13PM -0500, Waiman Long wrote:
> > In all cases when we get to that point, new_fs is always a freshly
> > created private copy of current->fs, not reachable from anywhere
> > other than stack frames of the callers, but the proof is not pretty.
> > copy_mnt_ns() is called only by create_new_namespaces() and it gets to
> > copying anything if and only if CLONE_NEWNS is in the flags. So far,
> > so good. The call in create_new_namespaces() is
> > new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns, user_ns, new_fs);
>
> Thanks for the detailed explanation. After further investigation as to while
> the pwd_refs is set, I found out the code path leading to this situation is
> the unshare syscall.
>
> __x64_sys_unshare()
> => ksys_unshare()
> => unshare_fs(unshare_flags, &new_fs)
> => unshare_nsproxy_namespaces(unshare_flags, &new_nsproxy,
> new_cred, new_fs);
> => create_new_namespaces(unshare_flags, current, user_ns,
> new_fs ? new_fs : current->fs);
>
> Here, CLONE_FS isn't set in unshare_flags. So new_fs is NULL and
> current->fs is passed down to create_new_namespaces(). That is why
> pwd_refs can be set in this case. So it looks like the comment in
> copy_mnt_ns() saying that the fs_struct is private is no longer true,
> at least in this case. So changing fs_struct without taking the lock
> can lead to unexpected result.
CLONE_FS is the red herring here (it will be set earlier in ksys_unshare()
if CLONE_NEWNS is there). I really hate that how convoluted that is,
though.
Look: the case where we might get passed current->fs down there is real.
It can happen in one and only one situation - CLONE_NEWNS in unshare(2)
arguments *and* current->fs->users being 1.
It wouldn't suffice, since there's chroot_fs_refs() that doesn't give
a rat's arse for task->fs being ours - it goes and replaces every
->fs->pwd or ->fs->root that happens to point to old_root.
It's still not a real race, though - both chroot_fs_refs() and that area
in copy_mnt_ns() are serialized on namespace_sem.
And yes, it's obscenely byzantine. It gets even worse when you consider
the fact that pivot_root(2) does not break only because the refcount
drops in chroot_fs_refs() are guaranteed not to reach 0 - the caller is
holding its own references to old_root.{mnt,dentry} and *thar* does not
get dropped until we drop namespace_sem.
IOW, that shit is actually safe, but man, has its correctness grown fucking
convoluted...
Grabbing fs->seq in copy_mnt_ns() wouldn't make the things better, though -
it seriously relies upon the same exclusion with chroot_fs_refs() for
correctness; unless you are willing to hold it over the entire walk through
the mount tree, the proof of correctness doesn't get any simpler.
next prev parent reply other threads:[~2026-02-06 20:27 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-03 19:44 [PATCH v2] audit: Avoid excessive dput/dget in audit_context setup and reset paths Waiman Long
2026-02-03 19:59 ` Al Viro
2026-02-03 20:18 ` Waiman Long
2026-02-03 20:05 ` Al Viro
2026-02-03 20:32 ` Waiman Long
2026-02-03 21:50 ` Al Viro
2026-02-03 23:26 ` Al Viro
2026-02-04 4:21 ` Waiman Long
2026-02-04 6:26 ` Al Viro
2026-02-04 18:16 ` Waiman Long
2026-02-04 20:18 ` Al Viro
2026-02-05 3:03 ` Waiman Long
2026-02-05 4:45 ` Waiman Long
2026-02-05 23:53 ` Al Viro
2026-02-06 1:20 ` Waiman Long
2026-02-06 4:11 ` Waiman Long
2026-02-06 4:19 ` Waiman Long
2026-02-06 5:22 ` Al Viro
2026-02-06 6:31 ` Al Viro
2026-02-06 6:38 ` Al Viro
2026-02-06 7:13 ` Al Viro
2026-02-06 19:16 ` Waiman Long
2026-02-06 20:04 ` Waiman Long
2026-02-06 20:38 ` Al Viro
2026-02-07 8:25 ` [PATCH][RFC] bug in unshare(2) failure recovery Al Viro
2026-02-07 23:06 ` Waiman Long
2026-02-17 12:49 ` Christian Brauner
2026-02-17 12:49 ` Christian Brauner
2026-02-06 20:29 ` Al Viro [this message]
2026-02-06 20:58 ` setns(2) vs. pivot_root(2) (was Re: [PATCH v2] audit: Avoid excessive dput/dget in audit_context setup and reset paths) Al Viro
2026-02-06 21:09 ` Al Viro
2026-02-17 13:12 ` Christian Brauner
2026-02-06 8:15 ` [PATCH v2] audit: Avoid excessive dput/dget in audit_context setup and reset paths Al Viro
2026-02-05 5:22 ` Al Viro
2026-02-05 13:59 ` Waiman Long
2026-02-05 17:53 ` Mateusz Guzik
2026-02-17 13:33 ` Christian Brauner
2026-02-17 13:44 ` Mateusz Guzik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260206202933.GA3183987@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=audit@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=eparis@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=llong@redhat.com \
--cc=paul@paul-moore.com \
--cc=rgb@redhat.com \
--cc=rrobaina@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.