[PATCHED][RFC][CFT] mount-related stuff

linux-fsdevel.vger.kernel.org archive mirror

* [PATCHED][RFC][CFT] mount-related stuff
@ 2025-08-25  4:40 Al Viro
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                   ` (3 more replies)
  0 siblings, 4 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:40 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, Jan Kara

	Most of this pile is basically an attempt to see how well
cleanup.h-style mechanisms apply in mount handling.  That stuff lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
Rebased to -rc3 (used to be a bit past -rc2, branched at mount fixes merge)
Individual patches in followups.

	Please, help with review and testing.  It seems to survive the
local beating and code generation seems to be OK, but more testing
would be a good thing and I would really like to see comments on that
stuff.

	This is not all I've got around mount handling, but I'd rather
get that thing out for review before starting to sort out other local
mount-related branches.

	Series overview:

	Part 1: guards.

	This part starts with infrastructure, followed by one-by-one
conversions to the guard/scoped_guard in some of the places that fit
that well enough.  Note that one of those places turned out to be taking
mount_lock for no reason whatsoever; I already see places where we do
write_seqlock when read_seqlock_excl would suffice, etc.

	Folks, _please_ don't do any bulk conversions in that area.
IMO one area where RAII becomes dangerous is locking; usually it's not
a big deal to delay freeing some object a bit, but delay dropping a
lock and you risk introducing deadlocks that will be bloody hard to spot.
It _has_ to be done carefully; we had trouble in that area several times
over the last year or so in fs/namespace.c alone.  Another fun problem
is that quite a few comments regarding the locking in there are stale.
We still have the comments that talk about mount lock as if it had been
an rwlock-like thing.  It hadn't been that for more than a decade now.
It needs to be documented sanely; so do the access rules to the data
structures involved.  I hope to get some of that into the tree this cycle,
but it's still in progress.
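
	To make the hazard concrete, here is a minimal userspace sketch
(not kernel code).  The toy lock, TOY_GUARD() and blocking_work() are
all made up for illustration; the only real mechanism used is the
compiler's cleanup attribute, which is what cleanup.h-style guards are
built on.  It shows how a function-scope guard keeps the lock held
across a later call, while an explicit inner scope drops it first.

```c
#include <assert.h>

/* Toy lock: only records whether it is held (an assumption, not a kernel API). */
struct toy_lock { int held; };

static struct toy_lock ns_lock;		/* stand-in for namespace_sem */
static int held_during_blocking_work;	/* what the "blocking" step observed */

/* Cleanup handler: gets a pointer to the guard variable itself. */
static void toy_unlock_cleanup(struct toy_lock **l) { (*l)->held = 0; }

/* Userspace analogue of guard(): lock now, unlock when the variable leaves scope. */
#define TOY_GUARD(lock) \
	struct toy_lock *toy_guard_ __attribute__((cleanup(toy_unlock_cleanup))) = \
		((lock)->held = 1, (lock))

/* Stand-in for something that may block or take other locks. */
static void blocking_work(void) { held_during_blocking_work = ns_lock.held; }

/* Function-scope guard: the lock is still held while blocking_work() runs. */
static void careless(void)
{
	TOY_GUARD(&ns_lock);
	/* ... work that needs the lock ... */
	blocking_work();
}

/* Explicit inner scope: the lock is dropped before blocking_work(). */
static void careful(void)
{
	{
		TOY_GUARD(&ns_lock);
		/* ... work that needs the lock ... */
	}
	blocking_work();
}
```

	Nothing in careless() looks wrong at a glance, which is the
point: the unlock is invisible, so a call appended at the end of the
function silently ends up inside the locked region.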

1/52)  fs/namespace.c: fix the namespace_sem guard mess
	New guards: namespace_excl and namespace_shared.  The former implies
the latter, as for anything rwsem-like.  No inode locks, no dropping the final
references, no opening files, etc. in scope of those.
2/52)  introduced guards for mount_lock
	New guards: mount_writer, mount_locked_reader.  That's write_seqlock
and read_seqlock_excl on mount_lock; obviously, nothing blocking should be
done in scope of those.
3/52)  fs/namespace.c: allow to drop vfsmount references via __free(mntput)
	Missing DEFINE_FREE (for mntput()); local in fs/namespace.c, to be
used only for keeping shit out of namespace_... and mount_... scopes.
4/52)  __detach_mounts(): use guards
5/52)  __is_local_mountpoint(): use guards
6/52)  do_change_type(): use guards
7/52)  do_set_group(): use guards
8/52)  mark_mounts_for_expiry(): use guards
9/52)  put_mnt_ns(): use guards
10/52)  mnt_already_visible(): use guards
	a bunch of clear-cut conversions, with explanations of the reasons
why this or that guard is needed.
11/52)  check_for_nsfs_mounts(): no need to take locks
	... and here we have one where it turns out that locking had been
excessive.  Iterating through a subtree in mount_locked_reader scope is
safe, all right, but (1) mount_writer is not needed here at all and (2)
namespace_shared + a reference held to the root of subtree is also enough.
All callers had (2) already.  Documented the locking requirements for
function, removed {,un}lock_mount_hash() in it...
12/52)  propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint()
	This one is interesting - existing code had been equivalent to
scoped_guard(mount_locked_reader), and it's right for that call.  However,
mnt_set_mountpoint() generally requires mount_writer - the only reason we
get away with that here is that the mount in question never had been
reachable from the mounts visible to other threads.
13/52)  has_locked_children(): use guards
14/52)  mnt_set_expiry(): use guards
15/52)  path_is_under(): use guards
	more clear-cut conversions with explanations.
16/52)  current_chrooted(): don't bother with follow_down_one()
17/52)  current_chrooted(): use guards
	this pair might be better off with #16 taken to the beginning
of the series (or to a separate branch merged into this one); no better
reason to do as I had than wanting to keep the guard infrastructure
in the very beginning.

	Part 2: turning unlock_mount() into __cleanup.

	Environment for mounting something on given location consists of:
1) namespace_excl scope
2) parent mount - the one we'll be attaching things to.
3) mountpoint to be, protected from disappearing under us.
4) inode of that mountpoint's dentry held exclusive.
	Unfortunately, we can't take inode locks in namespace_excl scopes.
And we want to cope with the possibility that somebody has managed to
mount something on that place while we'd been taking locks.  "Cope" part
is simple for finish_automount() ("drop our mount and go away quietly;
somebody triggered it before we did"), but for everything else it's
trickier - "use whatever's overmounting that place now (with the right
locks, please)".
	lock_mount() does all of that (do_lock_mount(), actually), with
unlock_mount() closing the scope.  And it's definitely a good candidate
for __cleanup()-based approach, except that
* the damn thing can return an error and conditional variants of that
infrastructure are too revolting.
* parent mount is returned in a fucking awful way - we modify the struct
path passed to us as location to mount on and then its ->mnt is the parent
to be... except for the "beneath" variant where we play convoluted games
with "no, here we want the parent of that".  Implementation is also
vulnerable to umount propagation races.
* the structure we set up (everything except the parent) is inserted
into a linked list by lock_mount().  That excludes DEFINE_CLASS() -
it wants the value formed and then copied to the variable we are
defining.
* it contains an implicit namespace_excl scope, so path_put() and its
ilk *must* be done after the unlock_mount().  And most of the users have
gotos past that.
	The first two problems are solved by adding an explicit pointer
to parent mount into struct pinned_mountpoint.  Having lock_mount()
failure reported by setting it to ERR_PTR(-E...) allows to avoid the
problem with expressing the constructor failure.  The third one is dealt
with by defining local macros to be used instead of CLASS - I went with
LOCK_MOUNT(mp, path) which defines struct pinned_mountpoint mp with
__cleanup(unlock_mount) and sets it up.  If anybody has better suggestions,
I'll be glad to hear those.
	The last one is dealt with by massaging the users into a form that
would have all post-unlock_mount() stuff done by __free().
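
	A rough userspace sketch of that shape, for readers who want
to see the pattern in isolation.  Everything prefixed with toy_/TOY_
is hypothetical and only mimics the idea: a structure declared in
place with a cleanup handler, an explicit parent pointer in it, and
constructor failure reported through an error-encoded pointer rather
than through a conditional guard.  The real LOCK_MOUNT() and struct
pinned_mountpoint are in the patches themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal userspace stand-ins for the kernel's ERR_PTR helpers. */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err) { return (void *)err; }
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}
static inline long toy_ptr_err(const void *p) { return (long)p; }

struct toy_mount { int id; };

/* Hypothetical analogue of struct pinned_mountpoint with an explicit parent. */
struct toy_pinned {
	struct toy_mount *parent;	/* valid mount or toy_err_ptr(-E...) */
};

static int toy_lock_depth;	/* stand-in for the implicit namespace_excl scope */

static void toy_lock_mount(struct toy_pinned *mp, struct toy_mount *where)
{
	if (!where) {		/* constructor failure: nothing gets locked */
		mp->parent = toy_err_ptr(-2);	/* -ENOENT */
		return;
	}
	toy_lock_depth++;
	mp->parent = where;
}

static void toy_unlock_mount(struct toy_pinned *mp)
{
	if (!toy_is_err(mp->parent))	/* only undo what was actually set up */
		toy_lock_depth--;
}

/* Declares mp in place, so nothing has to be copied after setup. */
#define TOY_LOCK_MOUNT(mp, where) \
	struct toy_pinned mp __attribute__((cleanup(toy_unlock_mount))) = { 0 }; \
	toy_lock_mount(&mp, (where))

static long toy_do_mount(struct toy_mount *where)
{
	TOY_LOCK_MOUNT(mp, where);
	if (toy_is_err(mp.parent))
		return toy_ptr_err(mp.parent);
	assert(toy_lock_depth == 1);	/* inside the locked scope here */
	return mp.parent->id;
}
```

	The cleanup handler runs on every return path, including the
early failure exit, so it has to tolerate a structure whose setup
failed.  That is why the error lives in the structure itself.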

	First, several trivial cleanups:
18/52)  do_move_mount(): trim local variables
19/52)  do_move_mount(): deal with the checks on old_path early
20/52)  move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
21/52)  finish_automount(): simplify the ELOOP check

	Getting rid of post-unlock_mount() stuff:
22/52)  do_loopback(): use __free(path_put) to deal with old_path
23/52)  pivot_root(2): use __free() to deal with struct path in it
24/52)  finish_automount(): take the lock_mount() analogue into a helper
	this one turns the open-coded logics into lock_mount_exact() with
the same kind of calling conventions as lock_mount() and do_lock_mount()
25/52)  do_new_mount_rc(): use __free() to deal with dropping mnt on failure
26/52)  finish_automount(): use __free() to deal with dropping mnt on failure

	This is the main part:
27/52)  change calling conventions for lock_mount() et.al.

	Followups, cleaning up the games with parent mount in the user:
28/52)  do_move_mount(): use the parent mount returned by do_lock_mount()
29/52)  do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path
30/52)  graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint

	Part 3: getting rid of mutating struct path there.

	do_lock_mount() is still playing silly buggers with struct path it
had been given - the logics in that thing hadn't changed.  It's not a pretty
function and it's racy as well; the thing is, by this point its users have
almost no use for the changed contents of struct path - dentry can be derived
from struct mountpoint, parent mount to use is provided directly and we
want that a lot more than modified path->mnt.  There's only one place
(in can_move_mount_beneath()) where we still want that and it's not hard
to reconstruct the value by *original* path->mnt value + parent mount to
be used.

	Getting rid of ->dentry uses.
31/52)  pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry
32/52)  don't bother passing new_path->dentry to can_move_mount_beneath()

	A helper, already open-coded in a couple of places; carved out of
the next patch to keep it reasonably small
33/52)  new helper: topmost_overmount()

	Rewrite of do_lock_mount() to keep path constant + trivial change
in do_move_mount() to adjust the argument it passes to can_move_mount_beneath():
34/52)  do_lock_mount(): don't modify path.
	

	Part 5: a bunch of trivial cleanups (mostly constifications)

35/52)  constify check_mnt()
36/52)  do_mount_setattr(): constify path argument
37/52)  do_set_group(): constify path arguments
38/52)  drop_collected_paths(): constify arguments
39/52)  collect_paths(): constify the return value
40/52)  do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s)
41/52)  mnt_warn_timestamp_expiry(): constify struct path argument
42/52)  do_new_mount{,_fc}(): constify struct path argument
43/52)  do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument
44/52)  path_mount(): constify struct path argument
45/52)  may_copy_tree(), __do_loopback(): constify struct path argument
46/52)  path_umount(): constify struct path argument
47/52)  constify can_move_mount_beneath() arguments
48/52)  do_move_mount_old(): use __free(path_put)
49/52)  do_mount(): use __free(path_put)

	Part 6: assorted stuff, will grow.

50/52)  umount_tree(): take all victims out of propagation graph at once
[had been earlier]
	For each removed mount we need to calculate where the slaves
will end up.  To avoid duplicating that work, do it for all mounts to be
removed at once, taking the mounts themselves out of propagation graph as
we go, then do all transfers; the duplicate work on finding destinations
is avoided since if we run into a mount that already had destination
found, we don't need to trace the rest of the way.  That's guaranteed
O(removed mounts) for finding destinations and removing from propagation
graph and O(surviving mounts that have master removed) for transfers.
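
	The "stop once we hit a mount whose destination is already
known" part can be sketched in userspace roughly as below.  This is a
deliberately simplified model, not what fs/pnode.c does: it assumes a
plain master chain with no peer groups, and struct toy_mnt,
toy_find_dest() and toy_remove_all() are invented for the example.

```c
#include <assert.h>

#define TOY_NONE	(-1)
#define TOY_UNKNOWN	(-2)

struct toy_mnt {
	int master;	/* index of master, or TOY_NONE */
	int removed;	/* being taken out of the graph */
	int dest;	/* cached nearest surviving master, TOY_UNKNOWN until found */
};

static int toy_steps;	/* number of uncached lookups, to show the work done */

/* Find where slaves of removed mount i should end up, caching the answer. */
static int toy_find_dest(struct toy_mnt *m, int i)
{
	int d;

	if (m[i].dest != TOY_UNKNOWN)
		return m[i].dest;		/* already traced: stop here */
	toy_steps++;
	d = m[i].master;
	if (d != TOY_NONE && m[d].removed)
		d = toy_find_dest(m, d);	/* master is going away too */
	m[i].dest = d;
	return d;
}

/* Pass 1: destinations for all victims; pass 2: transfer surviving slaves. */
static void toy_remove_all(struct toy_mnt *m, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (m[i].removed)
			toy_find_dest(m, i);
	for (i = 0; i < n; i++) {
		int master = m[i].master;

		if (!m[i].removed && master != TOY_NONE && m[master].removed)
			m[i].master = m[master].dest;
	}
}
```

	In this model each removed mount is resolved exactly once no
matter what order the victims are visited in, which is where the
linear bound comes from.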

51/52)  ecryptfs: get rid of pointless mount references in ecryptfs dentries
	->lower_path.mnt has the same value for all dentries on given
ecryptfs instance and if somebody goes for mountpoint-crossing variant
where that would not be true, we can deal with that when it happens
(and _not_ with duplicating these reference into each dentry).

	As it is, we are better off just sticking a reference into
ecryptfs-private part of superblock and keeping it pinned until
->kill_sb().
	That way we can stick a reference to underlying dentry right into
->d_fsdata of ecryptfs one, getting rid of indirection through struct
ecryptfs_dentry_info, along with the entire struct ecryptfs_dentry_info
machinery.

52/52)  fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
	Comments regarding "shadow mounts" were stale - no such thing
anymore.  Document the locking requirements for __lookup_mnt()...


FWIW, the current diffstat:

 fs/ecryptfs/dentry.c          |  14 +-
 fs/ecryptfs/ecryptfs_kernel.h |  27 +-
 fs/ecryptfs/file.c            |  15 +-
 fs/ecryptfs/inode.c           |  19 +-
 fs/ecryptfs/main.c            |  24 +-
 fs/internal.h                 |   4 +-
 fs/mount.h                    |  12 +
 fs/namespace.c                | 775 +++++++++++++++++++-----------------------
 fs/pnode.c                    |  75 ++--
 fs/pnode.h                    |   1 +
 include/linux/mount.h         |   4 +-
 kernel/audit_tree.c           |  12 +-
 12 files changed, 464 insertions(+), 518 deletions(-)

* [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess
  2025-08-25  4:40 [PATCHED][RFC][CFT] mount-related stuff Al Viro
@ 2025-08-25  4:43 ` Al Viro
  2025-08-25  4:43   ` [PATCH 02/52] introduced guards for mount_lock Al Viro
                     ` (51 more replies)
  2025-08-25 12:26 ` [PATCHED][RFC][CFT] mount-related stuff Christian Brauner
                   ` (2 subsequent siblings)
  3 siblings, 52 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

If anything, namespace_lock should be DEFINE_LOCK_GUARD_0, not DEFINE_GUARD.
That way we
	* do not need to feed it a bogus argument
	* do not get gcc trying to compare the address of a file-scope
static variable with -4097 - and, if we are unlucky, trying to keep
it in a register, with spills and all such.

The same problems apply to grabbing namespace_sem shared.

Rename it to namespace_excl, add namespace_shared, convert the existing users:

    guard(namespace_lock, &namespace_sem) => guard(namespace_excl)()
    guard(rwsem_read, &namespace_sem) => guard(namespace_shared)()
    scoped_guard(namespace_lock, &namespace_sem) => scoped_guard(namespace_excl)
    scoped_guard(rwsem_read, &namespace_sem) => scoped_guard(namespace_shared)
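
For illustration only, a userspace sketch of what a zero-argument guard
looks like.  TOY_LOCK_GUARD_0() and toy_guard() are made-up names that
imitate the shape, not the cleanup.h macros, and the "semaphore" is a
pair of counters.  The guard object carries no pointer, so there is no
argument to pass and no pointer value for the compiler to test on exit.

```c
#include <assert.h>

static int toy_sem_writers;	/* stand-in for namespace_sem held exclusive */
static int toy_sem_readers;	/* stand-in for namespace_sem held shared */

/*
 * Toy analogue of a zero-argument lock guard: a dummy object whose
 * constructor runs the lock expression and whose cleanup handler runs
 * the unlock expression.
 */
#define TOY_LOCK_GUARD_0(name, lock, unlock)				\
	struct toy_guard_##name { char unused; };			\
	static inline struct toy_guard_##name toy_ctor_##name(void)	\
	{								\
		struct toy_guard_##name g = { 0 };			\
		lock;							\
		return g;						\
	}								\
	static inline void toy_dtor_##name(struct toy_guard_##name *g)	\
	{								\
		(void)g;						\
		unlock;							\
	}

#define toy_guard(name)							\
	struct toy_guard_##name toy_scope_##name			\
		__attribute__((cleanup(toy_dtor_##name))) = toy_ctor_##name()

TOY_LOCK_GUARD_0(excl, toy_sem_writers++, toy_sem_writers--)
TOY_LOCK_GUARD_0(shared, toy_sem_readers++, toy_sem_readers--)

static int toy_reader(void)
{
	toy_guard(shared);
	return toy_sem_readers;	/* evaluated before the guard is released */
}

static int toy_writer(void)
{
	toy_guard(excl);
	return toy_sem_writers;
}
```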

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ae6d1312b184..fcea65587ff9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -82,6 +82,12 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
 static struct mnt_namespace *emptied_ns; /* protected by namespace_sem */
 static DEFINE_SEQLOCK(mnt_ns_tree_lock);
 
+static inline void namespace_lock(void);
+static void namespace_unlock(void);
+DEFINE_LOCK_GUARD_0(namespace_excl, namespace_lock(), namespace_unlock())
+DEFINE_LOCK_GUARD_0(namespace_shared, down_read(&namespace_sem),
+				      up_read(&namespace_sem))
+
 #ifdef CONFIG_FSNOTIFY
 LIST_HEAD(notify_list); /* protected by namespace_sem */
 #endif
@@ -1776,8 +1782,6 @@ static inline void namespace_lock(void)
 	down_write(&namespace_sem);
 }
 
-DEFINE_GUARD(namespace_lock, struct rw_semaphore *, namespace_lock(), namespace_unlock())
-
 enum umount_tree_flags {
 	UMOUNT_SYNC = 1,
 	UMOUNT_PROPAGATE = 2,
@@ -2306,7 +2310,7 @@ struct path *collect_paths(const struct path *path,
 	struct path *res = prealloc, *to_free = NULL;
 	unsigned n = 0;
 
-	guard(rwsem_read)(&namespace_sem);
+	guard(namespace_shared)();
 
 	if (!check_mnt(root))
 		return ERR_PTR(-EINVAL);
@@ -2361,7 +2365,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 			return;
 	}
 
-	scoped_guard(namespace_lock, &namespace_sem) {
+	scoped_guard(namespace_excl) {
 		if (!anon_ns_root(m))
 			return;
 
@@ -2435,7 +2439,7 @@ struct vfsmount *clone_private_mount(const struct path *path)
 	struct mount *old_mnt = real_mount(path->mnt);
 	struct mount *new_mnt;
 
-	guard(rwsem_read)(&namespace_sem);
+	guard(namespace_shared)();
 
 	if (IS_MNT_UNBINDABLE(old_mnt))
 		return ERR_PTR(-EINVAL);
@@ -5957,7 +5961,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
 	if (ret)
 		return ret;
 
-	scoped_guard(rwsem_read, &namespace_sem)
+	scoped_guard(namespace_shared)
 		ret = do_statmount(ks, kreq.mnt_id, kreq.mnt_ns_id, ns);
 
 	if (!ret)
@@ -6079,7 +6083,7 @@ SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
 	 * We only need to guard against mount topology changes as
 	 * listmount() doesn't care about any mount properties.
 	 */
-	scoped_guard(rwsem_read, &namespace_sem)
+	scoped_guard(namespace_shared)
 		ret = do_listmount(ns, kreq.mnt_id, last_mnt_id, kmnt_ids,
 				   nr_mnt_ids, (flags & LISTMOUNT_REVERSE));
 	if (ret <= 0)
-- 
2.47.2


* [PATCH 02/52] introduced guards for mount_lock
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:32     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 03/52] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
                     ` (50 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

mount_writer: write_seqlock; that's an equivalent of {un,}lock_mount_hash()
mount_locked_reader: read_seqlock_excl; these tend to be open-coded.

No bulk conversions, please - if nothing else, quite a few places use
the mount_writer form when mount_locked_reader is sufficient.  It needs
to be dealt with carefully.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/mount.h b/fs/mount.h
index 97737051a8b9..ed8c83ba836a 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -154,6 +154,11 @@ static inline void get_mnt_ns(struct mnt_namespace *ns)
 
 extern seqlock_t mount_lock;
 
+DEFINE_LOCK_GUARD_0(mount_writer, write_seqlock(&mount_lock),
+		    write_sequnlock(&mount_lock))
+DEFINE_LOCK_GUARD_0(mount_locked_reader, read_seqlock_excl(&mount_lock),
+		    read_sequnlock_excl(&mount_lock))
+
 struct proc_mounts {
 	struct mnt_namespace *ns;
 	struct path root;
-- 
2.47.2


* [PATCH 03/52] fs/namespace.c: allow to drop vfsmount references via __free(mntput)
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-08-25  4:43   ` [PATCH 02/52] introduced guards for mount_lock Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:33     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 04/52] __detach_mounts(): use guards Al Viro
                     ` (49 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Note that, just as with path_put(), it should never be done in scope of
namespace_sem, be it shared or exclusive.
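
As a usage illustration, a userspace sketch of the same pattern: a
pointer variable annotated so that its reference is dropped on scope
exit unless it holds an error value.  struct toy_ref, toy_get(),
toy_put() and toy_free_ref are hypothetical stand-ins; only the cleanup
attribute is real.  Unlike mntput(), the toy_put() here does not accept
NULL, and the sketch never produces one.

```c
#include <assert.h>
#include <stdint.h>

struct toy_ref { int count; };

/* Userspace stand-in for IS_ERR(). */
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static void toy_put(struct toy_ref *r) { r->count--; }

/* Cleanup handler: receives a pointer to the annotated variable. */
static void toy_free_put(struct toy_ref **p)
{
	if (!toy_is_err(*p))	/* same kind of check as in this patch's DEFINE_FREE */
		toy_put(*p);
}
#define toy_free_ref __attribute__((cleanup(toy_free_put)))

/* Grab a reference, or return an error-encoded pointer on failure. */
static struct toy_ref *toy_get(struct toy_ref *r, int fail)
{
	if (fail)
		return (void *)(intptr_t)-12;	/* -ENOMEM */
	r->count++;
	return r;
}

static int toy_use(struct toy_ref *obj, int fail)
{
	struct toy_ref *r toy_free_ref = toy_get(obj, fail);

	if (toy_is_err(r))
		return (int)(intptr_t)r;
	return r->count;	/* reference is dropped automatically after this */
}
```

The drop happens wherever the variable's scope ends, so the variable
has to be declared outside any lock scope that the release must not
run under.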

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index fcea65587ff9..767ab751ee2a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -88,6 +88,8 @@ DEFINE_LOCK_GUARD_0(namespace_excl, namespace_lock(), namespace_unlock())
 DEFINE_LOCK_GUARD_0(namespace_shared, down_read(&namespace_sem),
 				      up_read(&namespace_sem))
 
+DEFINE_FREE(mntput, struct vfsmount *, if (!IS_ERR(_T)) mntput(_T))
+
 #ifdef CONFIG_FSNOTIFY
 LIST_HEAD(notify_list); /* protected by namespace_sem */
 #endif
-- 
2.47.2


* [PATCH 04/52] __detach_mounts(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-08-25  4:43   ` [PATCH 02/52] introduced guards for mount_lock Al Viro
  2025-08-25  4:43   ` [PATCH 03/52] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:33     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 05/52] __is_local_mountpoint(): " Al Viro
                     ` (48 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Clean fit for guards use; guards can't be weaker due to umount_tree() calls.
---
 fs/namespace.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 767ab751ee2a..1ae1ab8815c9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2032,10 +2032,11 @@ void __detach_mounts(struct dentry *dentry)
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 
-	namespace_lock();
-	lock_mount_hash();
+	guard(namespace_excl)();
+	guard(mount_writer)();
+
 	if (!lookup_mountpoint(dentry, &mp))
-		goto out_unlock;
+		return;
 
 	event++;
 	while (mp.node.next) {
@@ -2047,9 +2048,6 @@ void __detach_mounts(struct dentry *dentry)
 		else umount_tree(mnt, UMOUNT_CONNECTED);
 	}
 	unpin_mountpoint(&mp);
-out_unlock:
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 /*
-- 
2.47.2


* [PATCH 05/52] __is_local_mountpoint(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (2 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 04/52] __detach_mounts(): use guards Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:33     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 06/52] do_change_type(): " Al Viro
                     ` (47 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_shared due to iterating through ns->mounts.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1ae1ab8815c9..f1460ddd1486 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -906,17 +906,14 @@ bool __is_local_mountpoint(const struct dentry *dentry)
 {
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	struct mount *mnt, *n;
-	bool is_covered = false;
 
-	down_read(&namespace_sem);
-	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
-		is_covered = (mnt->mnt_mountpoint == dentry);
-		if (is_covered)
-			break;
-	}
-	up_read(&namespace_sem);
+	guard(namespace_shared)();
+
+	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node)
+		if (mnt->mnt_mountpoint == dentry)
+			return true;
 
-	return is_covered;
+	return false;
 }
 
 struct pinned_mountpoint {
-- 
2.47.2


* [PATCH 06/52] do_change_type(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (3 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 05/52] __is_local_mountpoint(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:34     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 07/52] do_set_group(): " Al Viro
                     ` (46 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_excl to modify propagation graph

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f1460ddd1486..a6a7b068770a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2899,7 +2899,7 @@ static int do_change_type(struct path *path, int ms_flags)
 	struct mount *mnt = real_mount(path->mnt);
 	int recurse = ms_flags & MS_REC;
 	int type;
-	int err = 0;
+	int err;
 
 	if (!path_mounted(path))
 		return -EINVAL;
@@ -2908,23 +2908,22 @@ static int do_change_type(struct path *path, int ms_flags)
 	if (!type)
 		return -EINVAL;
 
-	namespace_lock();
+	guard(namespace_excl)();
+
 	err = may_change_propagation(mnt);
 	if (err)
-		goto out_unlock;
+		return err;
 
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
 		if (err)
-			goto out_unlock;
+			return err;
 	}
 
 	for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
 		change_mnt_propagation(m, type);
 
- out_unlock:
-	namespace_unlock();
-	return err;
+	return 0;
 }
 
 /* may_copy_tree() - check if a mount tree can be copied
-- 
2.47.2


* [PATCH 07/52] do_set_group(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (4 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 06/52] do_change_type(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:35     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 08/52] mark_mounts_for_expiry(): " Al Viro
                     ` (45 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_excl to modify propagation graph

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a6a7b068770a..13e2f3837a26 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3349,47 +3349,44 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 
 static int do_set_group(struct path *from_path, struct path *to_path)
 {
-	struct mount *from, *to;
+	struct mount *from = real_mount(from_path->mnt);
+	struct mount *to = real_mount(to_path->mnt);
 	int err;
 
-	from = real_mount(from_path->mnt);
-	to = real_mount(to_path->mnt);
-
-	namespace_lock();
+	guard(namespace_excl)();
 
 	err = may_change_propagation(from);
 	if (err)
-		goto out;
+		return err;
 	err = may_change_propagation(to);
 	if (err)
-		goto out;
+		return err;
 
-	err = -EINVAL;
 	/* To and From paths should be mount roots */
 	if (!path_mounted(from_path))
-		goto out;
+		return -EINVAL;
 	if (!path_mounted(to_path))
-		goto out;
+		return -EINVAL;
 
 	/* Setting sharing groups is only allowed across same superblock */
 	if (from->mnt.mnt_sb != to->mnt.mnt_sb)
-		goto out;
+		return -EINVAL;
 
 	/* From mount root should be wider than To mount root */
 	if (!is_subdir(to->mnt.mnt_root, from->mnt.mnt_root))
-		goto out;
+		return -EINVAL;
 
 	/* From mount should not have locked children in place of To's root */
 	if (__has_locked_children(from, to->mnt.mnt_root))
-		goto out;
+		return -EINVAL;
 
 	/* Setting sharing groups is only allowed on private mounts */
 	if (IS_MNT_SHARED(to) || IS_MNT_SLAVE(to))
-		goto out;
+		return -EINVAL;
 
 	/* From should not be private */
 	if (!IS_MNT_SHARED(from) && !IS_MNT_SLAVE(from))
-		goto out;
+		return -EINVAL;
 
 	if (IS_MNT_SLAVE(from)) {
 		hlist_add_behind(&to->mnt_slave, &from->mnt_slave);
@@ -3401,11 +3398,7 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 		list_add(&to->mnt_share, &from->mnt_share);
 		set_mnt_shared(to);
 	}
-
-	err = 0;
-out:
-	namespace_unlock();
-	return err;
+	return 0;
 }
 
 /**
-- 
2.47.2


* [PATCH 08/52] mark_mounts_for_expiry(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (5 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 07/52] do_set_group(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:37     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 09/52] put_mnt_ns(): " Al Viro
                     ` (44 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Clean fit; guards can't be weaker due to umount_tree() calls.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 13e2f3837a26..898a6b7307e4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3886,8 +3886,8 @@ void mark_mounts_for_expiry(struct list_head *mounts)
 	if (list_empty(mounts))
 		return;
 
-	namespace_lock();
-	lock_mount_hash();
+	guard(namespace_excl)();
+	guard(mount_writer)();
 
 	/* extract from the expiration list every vfsmount that matches the
 	 * following criteria:
@@ -3909,8 +3909,6 @@ void mark_mounts_for_expiry(struct list_head *mounts)
 		touch_mnt_namespace(mnt->mnt_ns);
 		umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
 	}
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
-- 
2.47.2


* [PATCH 09/52] put_mnt_ns(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (6 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 08/52] mark_mounts_for_expiry(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:37     ` Christian Brauner
  2025-08-25 12:40     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 10/52] mnt_already_visible(): " Al Viro
                     ` (43 subsequent siblings)
  51 siblings, 2 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; guards can't be weaker due to umount_tree() call.
Setting emptied_ns requires namespace_excl, but not anything
mount_lock-related.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 898a6b7307e4..86a86be2b0ef 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6153,12 +6153,10 @@ void put_mnt_ns(struct mnt_namespace *ns)
 {
 	if (!refcount_dec_and_test(&ns->ns.count))
 		return;
-	namespace_lock();
+	guard(namespace_excl)();
 	emptied_ns = ns;
-	lock_mount_hash();
+	guard(mount_writer)();
 	umount_tree(ns->root, 0);
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 struct vfsmount *kern_mount(struct file_system_type *type)
-- 
2.47.2


* [PATCH 10/52] mnt_already_visible(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (7 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 09/52] put_mnt_ns(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:39     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 11/52] check_for_nsfs_mounts(): no need to take locks Al Viro
                     ` (42 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_shared due to iterating through ns->mounts.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 86a86be2b0ef..a5d37b97088f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6232,9 +6232,8 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 {
 	int new_flags = *new_mnt_flags;
 	struct mount *mnt, *n;
-	bool visible = false;
 
-	down_read(&namespace_sem);
+	guard(namespace_shared)();
 	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
 		struct mount *child;
 		int mnt_flags;
@@ -6281,13 +6280,10 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 		/* Preserve the locked attributes */
 		*new_mnt_flags |= mnt_flags & (MNT_LOCK_READONLY | \
 					       MNT_LOCK_ATIME);
-		visible = true;
-		goto found;
+		return true;
 	next:	;
 	}
-found:
-	up_read(&namespace_sem);
-	return visible;
+	return false;
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
-- 
2.47.2



* [PATCH 11/52] check_for_nsfs_mounts(): no need to take locks
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (8 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 10/52] mnt_already_visible(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:48     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 12/52] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
                     ` (41 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Currently we are taking mount_writer; what that function needs is
either mount_locked_reader (we are not changing anything, we just
want to iterate through the subtree) or namespace_shared and
a reference held by caller on the root of subtree - that's also
enough to stabilize the topology.

The thing is, all callers are already holding at least namespace_shared
as well as a reference to the root of subtree.

Let's make the callers provide locking warranties - don't mess with
mount_lock in check_for_nsfs_mounts() itself and document the locking
requirements.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a5d37b97088f..59948cbf9c47 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2402,21 +2402,15 @@ bool has_locked_children(struct mount *mnt, struct dentry *dentry)
  * specified subtree.  Such references can act as pins for mount namespaces
  * that aren't checked by the mount-cycle checking code, thereby allowing
  * cycles to be made.
+ *
+ * locks: mount_locked_reader || namespace_shared && pinned(subtree)
  */
 static bool check_for_nsfs_mounts(struct mount *subtree)
 {
-	struct mount *p;
-	bool ret = false;
-
-	lock_mount_hash();
-	for (p = subtree; p; p = next_mnt(p, subtree))
+	for (struct mount *p = subtree; p; p = next_mnt(p, subtree))
 		if (mnt_ns_loop(p->mnt.mnt_root))
-			goto out;
-
-	ret = true;
-out:
-	unlock_mount_hash();
-	return ret;
+			return false;
+	return true;
 }
 
 /**
-- 
2.47.2



* [PATCH 12/52] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (9 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 11/52] check_for_nsfs_mounts(): no need to take locks Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:49     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 13/52] has_locked_children(): use guards Al Viro
                     ` (40 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/pnode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/pnode.c b/fs/pnode.c
index 6f7d02f3fa98..0702d45d856d 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -304,9 +304,8 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
 				err = PTR_ERR(this);
 				break;
 			}
-			read_seqlock_excl(&mount_lock);
-			mnt_set_mountpoint(n, dest_mp, this);
-			read_sequnlock_excl(&mount_lock);
+			scoped_guard(mount_locked_reader)
+				mnt_set_mountpoint(n, dest_mp, this);
 			if (n->mnt_master)
 				SET_MNT_MARK(n->mnt_master);
 			copy = this;
-- 
2.47.2



* [PATCH 13/52] has_locked_children(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (10 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 12/52] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 11:54     ` Linus Torvalds
  2025-08-25 12:49     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 14/52] mnt_set_expiry(): " Al Viro
                     ` (39 subsequent siblings)
  51 siblings, 2 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and document the locking requirements of __has_locked_children()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 59948cbf9c47..eabb0d996c6a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2373,6 +2373,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 	}
 }
 
+/* locks: namespace_shared && pinned(mnt) || mount_locked_reader */
 static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
 {
 	struct mount *child;
@@ -2389,12 +2390,8 @@ static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
 
 bool has_locked_children(struct mount *mnt, struct dentry *dentry)
 {
-	bool res;
-
-	read_seqlock_excl(&mount_lock);
-	res = __has_locked_children(mnt, dentry);
-	read_sequnlock_excl(&mount_lock);
-	return res;
+	scoped_guard(mount_locked_reader)
+		return __has_locked_children(mnt, dentry);
 }
 
 /*
-- 
2.47.2



* [PATCH 14/52] mnt_set_expiry(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (11 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 13/52] has_locked_children(): use guards Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:51     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 15/52] path_is_under(): " Al Viro
                     ` (38 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

The reason why it needs only mount_locked_reader is that there are no
lockless accesses of expiry lists.
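
A userspace sketch of the scoped form, assuming GCC or Clang: a one-pass for
loop whose control variable carries a cleanup handler, so the lock covers
exactly one statement or block.  FAKE_SCOPED_GUARD(), the fake lock and
struct toy_list are hypothetical names; this is loosely modelled on
scoped_guard(), not a copy of it.

```c
#include <assert.h>
#include <stddef.h>

static int fake_lock_held;	/* 1 while the pretend lock is held */

static void fake_lock(void) { fake_lock_held = 1; }
static void fake_unlock(int *unused) { (void)unused; fake_lock_held = 0; }

/* Run the following statement once with the pretend lock held. */
#define FAKE_SCOPED_GUARD() \
	for (int fake_once __attribute__((cleanup(fake_unlock))) = \
		(fake_lock(), 1); fake_once; fake_once = 0)

struct toy_list {
	int items[4];
	size_t n;
	int locked_adds;	/* how many adds happened under the lock */
};

static void toy_set_expiry(struct toy_list *l, int item)
{
	FAKE_SCOPED_GUARD() {
		l->items[l->n++] = item;
		l->locked_adds += fake_lock_held;
	}
	/* the pretend lock has already been dropped here */
}
```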

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index eabb0d996c6a..acacfe767a7c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3858,9 +3858,8 @@ int finish_automount(struct vfsmount *m, const struct path *path)
  */
 void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list)
 {
-	read_seqlock_excl(&mount_lock);
-	list_add_tail(&real_mount(mnt)->mnt_expire, expiry_list);
-	read_sequnlock_excl(&mount_lock);
+	scoped_guard(mount_locked_reader)
+		list_add_tail(&real_mount(mnt)->mnt_expire, expiry_list);
 }
 EXPORT_SYMBOL(mnt_set_expiry);
 
-- 
2.47.2



* [PATCH 15/52] path_is_under(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (12 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 14/52] mnt_set_expiry(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:56     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 16/52] current_chrooted(): don't bother with follow_down_one() Al Viro
                     ` (37 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and document the locking requirements for is_path_reachable().
There is one questionable caller in do_listmount() where we are not
holding mount_lock *and* might not have the first argument mounted.
However, in that case it will immediately return true without having
to look at the ancestors.  It might be cleaner to move the check into
the non-LSMT_ROOT case, where it really belongs - there the check is
not always true and is_mounted() is guaranteed.

Document the locking environments for is_path_reachable() callers:
	get_peer_under_root()
	get_dominating_id()
	do_statmount()
	do_listmount()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 12 ++++++------
 fs/pnode.c     |  3 ++-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index acacfe767a7c..bf9a3a644faa 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4592,7 +4592,7 @@ SYSCALL_DEFINE5(move_mount,
 /*
  * Return true if path is reachable from root
  *
- * namespace_sem or mount_lock is held
+ * locks: mount_locked_reader || namespace_shared && is_mounted(mnt)
  */
 bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
 			 const struct path *root)
@@ -4606,11 +4606,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
 
 bool path_is_under(const struct path *path1, const struct path *path2)
 {
-	bool res;
-	read_seqlock_excl(&mount_lock);
-	res = is_path_reachable(real_mount(path1->mnt), path1->dentry, path2);
-	read_sequnlock_excl(&mount_lock);
-	return res;
+	scoped_guard(mount_locked_reader)
+		return is_path_reachable(real_mount(path1->mnt), path1->dentry,
+					 path2);
 }
 EXPORT_SYMBOL(path_is_under);
 
@@ -5689,6 +5687,7 @@ static int grab_requested_root(struct mnt_namespace *ns, struct path *root)
 			     STATMOUNT_MNT_UIDMAP | \
 			     STATMOUNT_MNT_GIDMAP)
 
+/* locks: namespace_shared */
 static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
 			struct mnt_namespace *ns)
 {
@@ -5949,6 +5948,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
 	return ret;
 }
 
+/* locks: namespace_shared */
 static ssize_t do_listmount(struct mnt_namespace *ns, u64 mnt_parent_id,
 			    u64 last_mnt_id, u64 *mnt_ids, size_t nr_mnt_ids,
 			    bool reverse)
diff --git a/fs/pnode.c b/fs/pnode.c
index 0702d45d856d..edaf9d9d0eaf 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -29,6 +29,7 @@ static inline struct mount *next_slave(struct mount *p)
 	return hlist_entry(p->mnt_slave.next, struct mount, mnt_slave);
 }
 
+/* locks: namespace_shared && is_mounted(mnt) */
 static struct mount *get_peer_under_root(struct mount *mnt,
 					 struct mnt_namespace *ns,
 					 const struct path *root)
@@ -50,7 +51,7 @@ static struct mount *get_peer_under_root(struct mount *mnt,
  * Get ID of closest dominating peer group having a representative
  * under the given root.
  *
- * Caller must hold namespace_sem
+ * locks: namespace_shared
  */
 int get_dominating_id(struct mount *mnt, const struct path *root)
 {
-- 
2.47.2



* [PATCH 16/52] current_chrooted(): don't bother with follow_down_one()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (13 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 15/52] path_is_under(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:57     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 17/52] current_chrooted(): use guards Al Viro
                     ` (36 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

All we need here is to follow ->overmount on root mount of namespace...
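
A toy model of that walk, with locking left out entirely (the patch does the
walk under mount_lock).  struct toy_mount, topmost() and toy_chrooted() are
hypothetical names; the boolean argument stands in for
path_mounted(&fs_root).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the field the walk needs. */
struct toy_mount {
	struct toy_mount *overmount;	/* mount stacked on our root, or NULL */
};

/* Follow ->overmount until nothing is stacked on top. */
static struct toy_mount *topmost(struct toy_mount *m)
{
	while (m->overmount)
		m = m->overmount;
	return m;
}

/*
 * Chrooted unless the fs root sits at the root of the topmost mount
 * over the namespace root.
 */
static bool toy_chrooted(struct toy_mount *ns_root,
			 const struct toy_mount *fs_mnt,
			 bool fs_root_is_mnt_root)
{
	return fs_mnt != topmost(ns_root) || !fs_root_is_mnt_root;
}
```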

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index bf9a3a644faa..107da30b408c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6195,24 +6195,22 @@ bool our_mnt(struct vfsmount *mnt)
 bool current_chrooted(void)
 {
 	/* Does the current process have a non-standard root */
-	struct path ns_root;
+	struct mount *root = current->nsproxy->mnt_ns->root;
 	struct path fs_root;
 	bool chrooted;
 
+	get_fs_root(current->fs, &fs_root);
+
 	/* Find the namespace root */
-	ns_root.mnt = &current->nsproxy->mnt_ns->root->mnt;
-	ns_root.dentry = ns_root.mnt->mnt_root;
-	path_get(&ns_root);
-	while (d_mountpoint(ns_root.dentry) && follow_down_one(&ns_root))
-		;
+	read_seqlock_excl(&mount_lock);
 
-	get_fs_root(current->fs, &fs_root);
+	while (unlikely(root->overmount))
+		root = root->overmount;
 
-	chrooted = !path_equal(&fs_root, &ns_root);
+	chrooted = fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 
+	read_sequnlock_excl(&mount_lock);
 	path_put(&fs_root);
-	path_put(&ns_root);
-
 	return chrooted;
 }
 
-- 
2.47.2



* [PATCH 17/52] current_chrooted(): use guards
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (14 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 16/52] current_chrooted(): don't bother with follow_down_one() Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:57     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 18/52] do_move_mount(): trim local variables Al Viro
                     ` (35 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

here a use of __free(path_put) for dropping fs_root is enough to
make guard(mount_locked_reader) fit...
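
The ordering this relies on can be checked in userspace, assuming GCC or
Clang: cleanup handlers run in reverse order of declaration, so a reference
declared before the guard is dropped only after the lock has been released.
The trace buffer and all function names are made up for the sketch.

```c
#include <assert.h>
#include <string.h>

static char trace[16];	/* records the order of events */
static size_t trace_len;

static void note(char c)
{
	trace[trace_len++] = c;
	trace[trace_len] = '\0';
}

static void put_ref(int *p) { (void)p; note('P'); }	/* like path_put() */
static void drop_lock(int *p) { (void)p; note('U'); }	/* like unlocking */

static int chrooted_like(void)
{
	/* Declared first, so cleaned up last. */
	int ref __attribute__((cleanup(put_ref), unused)) = 0;

	note('G');	/* take the reference, like get_fs_root() */

	/* Declared second, so cleaned up first. */
	int lock __attribute__((cleanup(drop_lock), unused)) = (note('L'), 0);

	/* The return value is computed before either handler runs. */
	return (int)trace_len;
}
```

After one call the trace reads G, L, U, P: get, lock, unlock, put.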

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 107da30b408c..a8b586e635d8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6195,23 +6195,20 @@ bool our_mnt(struct vfsmount *mnt)
 bool current_chrooted(void)
 {
 	/* Does the current process have a non-standard root */
-	struct mount *root = current->nsproxy->mnt_ns->root;
-	struct path fs_root;
-	bool chrooted;
+	struct path fs_root __free(path_put) = {};
+	struct mount *root;
 
 	get_fs_root(current->fs, &fs_root);
 
 	/* Find the namespace root */
-	read_seqlock_excl(&mount_lock);
 
+	guard(mount_locked_reader)();
+
+	root = current->nsproxy->mnt_ns->root;
 	while (unlikely(root->overmount))
 		root = root->overmount;
 
-	chrooted = fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
-
-	read_sequnlock_excl(&mount_lock);
-	path_put(&fs_root);
-	return chrooted;
+	return fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 }
 
 static bool mnt_already_visible(struct mnt_namespace *ns,
-- 
2.47.2



* [PATCH 18/52] do_move_mount(): trim local variables
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (15 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 17/52] current_chrooted(): use guards Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:57     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 19/52] do_move_mount(): deal with the checks on old_path early Al Viro
                     ` (34 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Both 'parent' and 'ns' are used at most once, no point precalculating those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a8b586e635d8..1a076aac5d73 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3564,10 +3564,8 @@ static inline bool may_use_mount(struct mount *mnt)
 static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
-	struct mnt_namespace *ns;
 	struct mount *p;
 	struct mount *old;
-	struct mount *parent;
 	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
@@ -3578,8 +3576,6 @@ static int do_move_mount(struct path *old_path,
 
 	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
-	parent = old->mnt_parent;
-	ns = old->mnt_ns;
 
 	err = -EINVAL;
 
@@ -3588,12 +3584,12 @@ static int do_move_mount(struct path *old_path,
 		/* ... it should be detachable from parent */
 		if (!mnt_has_parent(old) || IS_MNT_LOCKED(old))
 			goto out;
+		/* ... which should not be shared */
+		if (IS_MNT_SHARED(old->mnt_parent))
+			goto out;
 		/* ... and the target should be in our namespace */
 		if (!check_mnt(p))
 			goto out;
-		/* parent of the source should not be shared */
-		if (IS_MNT_SHARED(parent))
-			goto out;
 	} else {
 		/*
 		 * otherwise the source must be the root of some anon namespace.
@@ -3605,7 +3601,7 @@ static int do_move_mount(struct path *old_path,
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
-		if (ns == p->mnt_ns)
+		if (old->mnt_ns == p->mnt_ns)
 			goto out;
 		/*
 		 * Target should be either in our namespace or in an acceptable
-- 
2.47.2



* [PATCH 19/52] do_move_mount(): deal with the checks on old_path early
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (16 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 18/52] do_move_mount(): trim local variables Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:00     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
                     ` (33 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

1) checking that the location we want to move does point to the root of
some mount can be done before anything else; that property is not going
to change and having it already verified simplifies the analysis.

2) checking the type agreement between what we are trying to move and what
we are trying to move it onto also belongs in the very beginning -
do_lock_mount() might end up switching new_path to something that overmounts
the original location, but... the same type agreement applies to overmounts,
so we could just as well check against the original location.

3) since we know that old_path->dentry is the root of old_path->mnt, there's
no point bothering with path_is_overmounted() in can_move_mount_beneath();
it's simply a check for the mount we are trying to move having non-NULL
->overmount.  And with that, we can switch can_move_mount_beneath() to
taking old instead of old_path, leaving no uses of old_path past the original
checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1a076aac5d73..42ef0d0c3d40 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3433,7 +3433,7 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
 
 /**
  * can_move_mount_beneath - check that we can mount beneath the top mount
- * @from: mount to mount beneath
+ * @mnt_from: mount we are trying to move
  * @to:   mount under which to mount
  * @mp:   mountpoint of @to
  *
@@ -3443,7 +3443,7 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  *   root or the rootfs of the namespace.
  * - Make sure that the caller can unmount the topmost mount ensuring
  *   that the caller could reveal the underlying mountpoint.
- * - Ensure that nothing has been mounted on top of @from before we
+ * - Ensure that nothing has been mounted on top of @mnt_from before we
  *   grabbed @namespace_sem to avoid creating pointless shadow mounts.
  * - Prevent mounting beneath a mount if the propagation relationship
  *   between the source mount, parent mount, and top mount would lead to
@@ -3452,12 +3452,11 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Context: This function expects namespace_lock() to be held.
  * Return: On success 0, and on error a negative error code is returned.
  */
-static int can_move_mount_beneath(const struct path *from,
+static int can_move_mount_beneath(struct mount *mnt_from,
 				  const struct path *to,
 				  const struct mountpoint *mp)
 {
-	struct mount *mnt_from = real_mount(from->mnt),
-		     *mnt_to = real_mount(to->mnt),
+	struct mount *mnt_to = real_mount(to->mnt),
 		     *parent_mnt_to = mnt_to->mnt_parent;
 
 	if (!mnt_has_parent(mnt_to))
@@ -3470,7 +3469,7 @@ static int can_move_mount_beneath(const struct path *from,
 		return -EINVAL;
 
 	/* Avoid creating shadow mounts during mount propagation. */
-	if (path_overmounted(from))
+	if (mnt_from->overmount)
 		return -EINVAL;
 
 	/*
@@ -3565,16 +3564,21 @@ static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
 	struct mount *p;
-	struct mount *old;
+	struct mount *old = real_mount(old_path->mnt);
 	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
 
+	if (!path_mounted(old_path))
+		return -EINVAL;
+
+	if (d_is_dir(new_path->dentry) != d_is_dir(old_path->dentry))
+		return -EINVAL;
+
 	err = do_lock_mount(new_path, &mp, beneath);
 	if (err)
 		return err;
 
-	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
 
 	err = -EINVAL;
@@ -3611,15 +3615,8 @@ static int do_move_mount(struct path *old_path,
 			goto out;
 	}
 
-	if (!path_mounted(old_path))
-		goto out;
-
-	if (d_is_dir(new_path->dentry) !=
-	    d_is_dir(old_path->dentry))
-		goto out;
-
 	if (beneath) {
-		err = can_move_mount_beneath(old_path, new_path, mp.mp);
+		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
 			goto out;
 
-- 
2.47.2



* [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (17 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 19/52] do_move_mount(): deal with the checks on old_path early Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 12:10     ` Linus Torvalds
  2025-08-25 13:02     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 21/52] finish_automount(): simplify the ELOOP check Al Viro
                     ` (32 subsequent siblings)
  51 siblings, 2 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We want to mount beneath the given location.  For that operation to
make sense, location must be the root of some mount that has something
under it.  Currently we let it proceed if those requirements are not met,
with rather meaningless results, and have that bogosity caught further
down the road; let's fail early instead - do_lock_mount() doesn't make
sense unless those conditions hold, and checking them there makes
things simpler.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 42ef0d0c3d40..9e04133d81dd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2768,12 +2768,19 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 	struct path under = {};
 	int err = -ENOENT;
 
+	if (unlikely(beneath) && !path_mounted(path))
+		return -EINVAL;
+
 	for (;;) {
 		struct mount *m = real_mount(mnt);
 
 		if (beneath) {
 			path_put(&under);
 			read_seqlock_excl(&mount_lock);
+			if (unlikely(!mnt_has_parent(m))) {
+				read_sequnlock_excl(&mount_lock);
+				return -EINVAL;
+			}
 			under.mnt = mntget(&m->mnt_parent->mnt);
 			under.dentry = dget(m->mnt_mountpoint);
 			read_sequnlock_excl(&mount_lock);
@@ -3437,8 +3444,6 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * @to:   mount under which to mount
  * @mp:   mountpoint of @to
  *
- * - Make sure that @to->dentry is actually the root of a mount under
- *   which we can mount another mount.
  * - Make sure that nothing can be mounted beneath the caller's current
  *   root or the rootfs of the namespace.
  * - Make sure that the caller can unmount the topmost mount ensuring
@@ -3459,12 +3464,6 @@ static int can_move_mount_beneath(struct mount *mnt_from,
 	struct mount *mnt_to = real_mount(to->mnt),
 		     *parent_mnt_to = mnt_to->mnt_parent;
 
-	if (!mnt_has_parent(mnt_to))
-		return -EINVAL;
-
-	if (!path_mounted(to))
-		return -EINVAL;
-
 	if (IS_MNT_LOCKED(mnt_to))
 		return -EINVAL;
 
-- 
2.47.2



* [PATCH 21/52] finish_automount(): simplify the ELOOP check
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (18 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:02     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 22/52] do_loopback(): use __free(path_put) to deal with old_path Al Viro
                     ` (31 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

It's enough to check that dentries match; if path->dentry is equal to
m->mnt_root, superblocks will match as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9e04133d81dd..5c4b4f25b5f8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3803,8 +3803,7 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 
 	mnt = real_mount(m);
 
-	if (m->mnt_sb == path->mnt->mnt_sb &&
-	    m->mnt_root == dentry) {
+	if (m->mnt_root == path->dentry) {
 		err = -ELOOP;
 		goto discard;
 	}
-- 
2.47.2



* [PATCH 22/52] do_loopback(): use __free(path_put) to deal with old_path
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (19 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 21/52] finish_automount(): simplify the ELOOP check Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:02     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 23/52] pivot_root(2): use __free() to deal with struct path in it Al Viro
                     ` (30 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

preparations for making unlock_mount() a __cleanup();
can't have path_put() inside mount_lock scope.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5c4b4f25b5f8..602612cbd095 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3014,7 +3014,7 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 static int do_loopback(struct path *path, const char *old_name,
 				int recurse)
 {
-	struct path old_path;
+	struct path old_path __free(path_put) = {};
 	struct mount *mnt = NULL, *parent;
 	struct pinned_mountpoint mp = {};
 	int err;
@@ -3024,13 +3024,12 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (err)
 		return err;
 
-	err = -EINVAL;
 	if (mnt_ns_loop(old_path.dentry))
-		goto out;
+		return -EINVAL;
 
 	err = lock_mount(path, &mp);
 	if (err)
-		goto out;
+		return err;
 
 	parent = real_mount(path->mnt);
 	if (!check_mnt(parent))
@@ -3050,8 +3049,6 @@ static int do_loopback(struct path *path, const char *old_name,
 	}
 out2:
 	unlock_mount(&mp);
-out:
-	path_put(&old_path);
 	return err;
 }
 
-- 
2.47.2



* [PATCH 23/52] pivot_root(2): use __free() to deal with struct path in it
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (20 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 22/52] do_loopback(): use __free(path_put) to deal with old_path Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:03     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 24/52] finish_automount(): take the lock_mount() analogue into a helper Al Viro
                     ` (29 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

preparations for making unlock_mount() a __cleanup();
can't have path_put() inside mount_lock scope.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 602612cbd095..892251663419 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4628,7 +4628,9 @@ EXPORT_SYMBOL(path_is_under);
 SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 		const char __user *, put_old)
 {
-	struct path new, old, root;
+	struct path new __free(path_put) = {};
+	struct path old __free(path_put) = {};
+	struct path root __free(path_put) = {};
 	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
 	struct pinned_mountpoint old_mp = {};
 	int error;
@@ -4639,21 +4641,21 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	error = user_path_at(AT_FDCWD, new_root,
 			     LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new);
 	if (error)
-		goto out0;
+		return error;
 
 	error = user_path_at(AT_FDCWD, put_old,
 			     LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &old);
 	if (error)
-		goto out1;
+		return error;
 
 	error = security_sb_pivotroot(&old, &new);
 	if (error)
-		goto out2;
+		return error;
 
 	get_fs_root(current->fs, &root);
 	error = lock_mount(&old, &old_mp);
 	if (error)
-		goto out3;
+		return error;
 
 	error = -EINVAL;
 	new_mnt = real_mount(new.mnt);
@@ -4711,13 +4713,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	error = 0;
 out4:
 	unlock_mount(&old_mp);
-out3:
-	path_put(&root);
-out2:
-	path_put(&old);
-out1:
-	path_put(&new);
-out0:
 	return error;
 }
 
-- 
2.47.2



* [PATCH 24/52] finish_automount(): take the lock_mount() analogue into a helper
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (21 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 23/52] pivot_root(2): use __free() to deal with struct path in it Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:08     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure Al Viro
                     ` (28 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

finish_automount() can't use lock_mount() - it treats finding something
already mounted as "quietly drop our mount and return 0", not as
"mount on top of whatever is mounted there".  That logic has been
open-coded; let's take it into a helper similar to lock_mount().
The helper reports "something's already mounted" as -EBUSY;
finish_automount() needs to distinguish that from the normal case,
and -EBUSY can't come from any of the other failure cases.
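
The error mapping is easier to see in isolation.  Below is a userspace
sketch of it, not the kernel code: struct toy_location and the toy_*()
functions are made up for illustration, with two booleans standing in
for cant_mount() and path_overmounted() and a counter standing in for
the locks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mount location; not a kernel type. */
struct toy_location {
	bool dead;		/* plays the role of cant_mount() */
	bool overmounted;	/* plays the role of path_overmounted() */
	int locked;		/* nonzero while the toy "locks" are held */
};

/* Returns 0 with the lock held, or a negative errno with it dropped. */
static int toy_lock_exact(struct toy_location *loc)
{
	int err = 0;

	loc->locked++;
	if (loc->dead)
		err = -ENOENT;
	else if (loc->overmounted)
		err = -EBUSY;	/* the only source of -EBUSY in this helper */
	if (err)
		loc->locked--;
	return err;
}

static void toy_unlock(struct toy_location *loc)
{
	assert(loc->locked > 0);
	loc->locked--;
}

/* Caller in the style of finish_automount(): overmounted is not a failure. */
static int toy_finish(struct toy_location *loc)
{
	int err = toy_lock_exact(loc);

	if (err)
		return err == -EBUSY ? 0 : err;
	/* ... the actual mounting would happen here ... */
	toy_unlock(loc);
	return 0;
}
```

Since -EBUSY has exactly one origin in the helper, the caller can turn
it into 0 without risking hiding some unrelated failure.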

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 892251663419..99757040a39a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3786,9 +3786,29 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 	return err;
 }
 
-int finish_automount(struct vfsmount *m, const struct path *path)
+static int lock_mount_exact(const struct path *path,
+			    struct pinned_mountpoint *mp)
 {
 	struct dentry *dentry = path->dentry;
+	int err;
+
+	inode_lock(dentry->d_inode);
+	namespace_lock();
+	if (unlikely(cant_mount(dentry)))
+		err = -ENOENT;
+	else if (path_overmounted(path))
+		err = -EBUSY;
+	else
+		err = get_mountpoint(dentry, mp);
+	if (unlikely(err)) {
+		namespace_unlock();
+		inode_unlock(dentry->d_inode);
+	}
+	return err;
+}
+
+int finish_automount(struct vfsmount *m, const struct path *path)
+{
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
@@ -3810,20 +3830,11 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 	 * that overmounts our mountpoint to be means "quitely drop what we've
 	 * got", not "try to mount it on top".
 	 */
-	inode_lock(dentry->d_inode);
-	namespace_lock();
-	if (unlikely(cant_mount(dentry))) {
-		err = -ENOENT;
-		goto discard_locked;
-	}
-	if (path_overmounted(path)) {
-		err = 0;
-		goto discard_locked;
+	err = lock_mount_exact(path, &mp);
+	if (unlikely(err)) {
+		mntput(m);
+		return err == -EBUSY ? 0 : err;
 	}
-	err = get_mountpoint(dentry, &mp);
-	if (err)
-		goto discard_locked;
-
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
 	unlock_mount(&mp);
@@ -3831,9 +3842,6 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 		goto discard;
 	return 0;
 
-discard_locked:
-	namespace_unlock();
-	inode_unlock(dentry->d_inode);
 discard:
 	mntput(m);
 	return err;
-- 
2.47.2



* [PATCH 25/52] do_new_mount_fc(): use __free() to deal with dropping mnt on failure
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (22 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 24/52] finish_automount(): take the lock_mount() analogue into a helper Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:29     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 26/52] finish_automount(): use __free() to deal with dropping mnt on failure Al Viro
                     ` (27 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

do_add_mount() consumes the vfsmount on success; follow it with a
conditional retain_and_null_ptr() on success and we can switch mnt
to __free() and be done with that - unlock_mount() is at the
very end.
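
For those who have not met the pattern yet, here is a userspace sketch
of it.  It assumes GCC or Clang (the cleanup attribute is what __free()
is built on); struct toy_mnt and all toy_*() names are invented for the
example, and plain "m = NULL" plays the role of retain_and_null_ptr().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_mnt { int id; };

static int toy_live;			/* toy objects not yet released */
static struct toy_mnt *toy_tree;	/* owner after a successful attach */

static struct toy_mnt *toy_create(void)
{
	struct toy_mnt *m = calloc(1, sizeof(*m));

	if (m)
		toy_live++;
	return m;
}

static void toy_put(struct toy_mnt *m)
{
	assert(toy_live > 0);
	toy_live--;
	free(m);
}

/* The handler gets the address of the variable; NULL means "nothing to drop". */
static void toy_put_cleanup(struct toy_mnt **p)
{
	if (*p)
		toy_put(*p);
}

/* Takes over the reference on success, leaves it with the caller on failure. */
static int toy_attach(struct toy_mnt *m, int fail)
{
	if (fail)
		return -EINVAL;
	toy_tree = m;
	return 0;
}

static int toy_new_mount(int fail)
{
	struct toy_mnt *m __attribute__((cleanup(toy_put_cleanup))) = toy_create();
	int err;

	if (!m)
		return -ENOMEM;
	err = toy_attach(m, fail);
	if (!err)
		m = NULL;	/* consumed on success: disarm the cleanup */
	return err;
}
```

Every failure exit drops the reference without any goto; the success
path is the only one that has to say anything explicit.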

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 99757040a39a..79c87937a7dd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3694,7 +3694,6 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
-	struct vfsmount *mnt;
 	struct pinned_mountpoint mp = {};
 	struct super_block *sb = fc->root->d_sb;
 	int error;
@@ -3710,7 +3709,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	up_write(&sb->s_umount);
 
-	mnt = vfs_create_mount(fc);
+	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
@@ -3720,10 +3719,10 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 	if (!error) {
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
+		if (!error)
+			retain_and_null_ptr(mnt); // consumed on success
 		unlock_mount(&mp);
 	}
-	if (error < 0)
-		mntput(mnt);
 	return error;
 }
 
-- 
2.47.2



* [PATCH 26/52] finish_automount(): use __free() to deal with dropping mnt on failure
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (23 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 25/52] do_new_mount_fc(): use __free() to deal with dropping mnt on failure Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:09     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 27/52] change calling conventions for lock_mount() et.al Al Viro
                     ` (26 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

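Same story as with do_new_mount_fc(): do_add_mount() consumes the mount
on success, so a retain_and_null_ptr() there lets __free(mntput) take
care of every failure exit.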

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 79c87937a7dd..5819a50d7d67 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3806,8 +3806,9 @@ static int lock_mount_exact(const struct path *path,
 	return err;
 }
 
-int finish_automount(struct vfsmount *m, const struct path *path)
+int finish_automount(struct vfsmount *__m, const struct path *path)
 {
+	struct vfsmount *m __free(mntput) = __m;
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
@@ -3819,10 +3820,8 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 
 	mnt = real_mount(m);
 
-	if (m->mnt_root == path->dentry) {
-		err = -ELOOP;
-		goto discard;
-	}
+	if (m->mnt_root == path->dentry)
+		return -ELOOP;
 
 	/*
 	 * we don't want to use lock_mount() - in this case finding something
@@ -3830,19 +3829,14 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 	 * got", not "try to mount it on top".
 	 */
 	err = lock_mount_exact(path, &mp);
-	if (unlikely(err)) {
-		mntput(m);
+	if (unlikely(err))
 		return err == -EBUSY ? 0 : err;
-	}
+
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
+	if (likely(!err))
+		retain_and_null_ptr(m);
 	unlock_mount(&mp);
-	if (unlikely(err))
-		goto discard;
-	return 0;
-
-discard:
-	mntput(m);
 	return err;
 }
 
-- 
2.47.2



* [PATCH 27/52] change calling conventions for lock_mount() et al.
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (24 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 26/52] finish_automount(): use __free() to deal with dropping mnt on failure Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25  4:43   ` [PATCH 28/52] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
                     ` (25 subsequent siblings)
  51 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

1) pinned_mountpoint gets a new member - struct mount *parent.
It is set to a mount only if we have locked the sucker; a failed
attempt leaves ERR_PTR(-E...) there.

2) do_lock_mount() et al. return void and set ->parent to
	* on success with !beneath - the mount corresponding to path->mnt
	* on success with beneath - the parent of the mount corresponding
to path->mnt
	* in case of error - ERR_PTR(-E...).
IOW, we get the mount we will actually be mounting upon or ERR_PTR().

3) we can't use CLASS, since the pinned_mountpoint is placed on an
hlist during initialization, so we define local macros:
	LOCK_MOUNT(mp, path)
	LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath)
	LOCK_MOUNT_EXACT(mp, path)
All of them declare and initialize struct pinned_mountpoint mp,
with unlock_mount() done via __cleanup().

Users converted.
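
A userspace sketch of the resulting convention, for illustration only.
It assumes GCC or Clang for the cleanup attribute; the toy_*() helpers
are simplified re-creations of ERR_PTR()/PTR_ERR()/IS_ERR(), and
struct toy_pinned, toy_lock_depth and TOY_LOCK_MOUNT() are invented
names, not anything in fs/namespace.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified userspace versions of the kernel's ERR_PTR helpers. */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_mount { int id; };

struct toy_pinned {
	struct toy_mount *parent;	/* valid pointer or toy_err_ptr(-E...) */
};

static int toy_lock_depth;		/* stands in for the real locks */

/* Returns void; the outcome is reported through res->parent. */
static void toy_do_lock(struct toy_mount *target, struct toy_pinned *res)
{
	if (!target) {
		res->parent = toy_err_ptr(-EINVAL);
		return;
	}
	toy_lock_depth++;
	res->parent = target;
}

/* Cleanup handler: only undo the locking if it actually succeeded. */
static void toy_unlock(struct toy_pinned *res)
{
	if (toy_is_err(res->parent))
		return;
	assert(toy_lock_depth > 0);
	toy_lock_depth--;
}

/* Declares the context in the caller's scope and locks in one go. */
#define TOY_LOCK_MOUNT(name, target) \
	struct toy_pinned name __attribute__((cleanup(toy_unlock))) = { 0 }; \
	toy_do_lock((target), &name)

static int toy_user(struct toy_mount *target, int fail_late)
{
	TOY_LOCK_MOUNT(mp, target);
	if (toy_is_err(mp.parent))
		return (int)toy_ptr_err(mp.parent);
	if (fail_late)
		return -EBUSY;	/* unlock happens on the way out */
	return mp.parent->id;
}
```

The context is declared in place, in the caller's frame, so its
address stays valid for as long as anything (an hlist in the real
thing) points at it.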

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 219 ++++++++++++++++++++++++-------------------------
 1 file changed, 108 insertions(+), 111 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5819a50d7d67..8d6e26e2c97a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -919,6 +919,7 @@ bool __is_local_mountpoint(const struct dentry *dentry)
 struct pinned_mountpoint {
 	struct hlist_node node;
 	struct mountpoint *mp;
+	struct mount *parent;
 };
 
 static bool lookup_mountpoint(struct dentry *dentry, struct pinned_mountpoint *m)
@@ -2728,48 +2729,47 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 }
 
 /**
- * do_lock_mount - lock mount and mountpoint
- * @path:    target path
- * @beneath: whether the intention is to mount beneath @path
+ * do_lock_mount - acquire environment for mounting
+ * @path:	target path
+ * @res:	context to set up
+ * @beneath:	whether the intention is to mount beneath @path
  *
- * Follow the mount stack on @path until the top mount @mnt is found. If
- * the initial @path->{mnt,dentry} is a mountpoint lookup the first
- * mount stacked on top of it. Then simply follow @{mnt,mnt->mnt_root}
- * until nothing is stacked on top of it anymore.
+ * To mount something at given location, we need
+ *	namespace_sem locked exclusive
+ *	inode of dentry we are mounting on locked exclusive
+ *	struct mountpoint for that dentry
+ *	struct mount we are mounting on
  *
- * Acquire the inode_lock() on the top mount's ->mnt_root to protect
- * against concurrent removal of the new mountpoint from another mount
- * namespace.
+ * Results are stored in caller-supplied context (pinned_mountpoint);
+ * on success we have res->parent and res->mp pointing to parent and
+ * mountpoint respectively and res->node inserted into the ->m_list
+ * of the mountpoint, making sure the mountpoint won't disappear.
+ * On failure we have res->parent set to ERR_PTR(-E...), res->mp
+ * left NULL, res->node - empty.
+ * In case of success do_lock_mount returns with locks acquired (in
+ * proper order - inode lock nests outside of namespace_sem).
  *
- * If @beneath is requested, acquire inode_lock() on @mnt's mountpoint
- * @mp on @mnt->mnt_parent must be acquired. This protects against a
- * concurrent unlink of @mp->mnt_dentry from another mount namespace
- * where @mnt doesn't have a child mount mounted @mp. A concurrent
- * removal of @mnt->mnt_root doesn't matter as nothing will be mounted
- * on top of it for @beneath.
+ * Request to mount on overmounted location is treated as "mount on
+ * top of whatever's overmounting it"; request to mount beneath
+ * a location - "mount immediately beneath the topmost mount at that
+ * place".
  *
- * In addition, @beneath needs to make sure that @mnt hasn't been
- * unmounted or moved from its current mountpoint in between dropping
- * @mount_lock and acquiring @namespace_sem. For the !@beneath case @mnt
- * being unmounted would be detected later by e.g., calling
- * check_mnt(mnt) in the function it's called from. For the @beneath
- * case however, it's useful to detect it directly in do_lock_mount().
- * If @mnt hasn't been unmounted then @mnt->mnt_mountpoint still points
- * to @mnt->mnt_mp->m_dentry. But if @mnt has been unmounted it will
- * point to @mnt->mnt_root and @mnt->mnt_mp will be NULL.
- *
- * Return: Either the target mountpoint on the top mount or the top
- *         mount's mountpoint.
+ * In all cases the location must not have been unmounted and the
+ * chosen mountpoint must be allowed to be mounted on.  For "beneath"
+ * case we also require the location to be at the root of a mount
+ * that has a parent (i.e. is not a root of some namespace).
  */
-static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bool beneath)
+static void do_lock_mount(struct path *path, struct pinned_mountpoint *res, bool beneath)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct dentry *dentry;
 	struct path under = {};
 	int err = -ENOENT;
 
-	if (unlikely(beneath) && !path_mounted(path))
-		return -EINVAL;
+	if (unlikely(beneath) && !path_mounted(path)) {
+		res->parent = ERR_PTR(-EINVAL);
+		return;
+	}
 
 	for (;;) {
 		struct mount *m = real_mount(mnt);
@@ -2779,7 +2779,8 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			read_seqlock_excl(&mount_lock);
 			if (unlikely(!mnt_has_parent(m))) {
 				read_sequnlock_excl(&mount_lock);
-				return -EINVAL;
+				res->parent = ERR_PTR(-EINVAL);
+				return;
 			}
 			under.mnt = mntget(&m->mnt_parent->mnt);
 			under.dentry = dget(m->mnt_mountpoint);
@@ -2811,7 +2812,7 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			path->dentry = dget(mnt->mnt_root);
 			continue;	// got overmounted
 		}
-		err = get_mountpoint(dentry, pinned);
+		err = get_mountpoint(dentry, res);
 		if (err)
 			break;
 		if (beneath) {
@@ -2822,22 +2823,25 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			 * we are not dropping the final references here).
 			 */
 			path_put(&under);
+			res->parent = real_mount(path->mnt)->mnt_parent;
+			return;
 		}
-		return 0;
+		res->parent = real_mount(path->mnt);
+		return;
 	}
 	namespace_unlock();
 	inode_unlock(dentry->d_inode);
 	if (beneath)
 		path_put(&under);
-	return err;
+	res->parent = ERR_PTR(err);
 }
 
-static inline int lock_mount(struct path *path, struct pinned_mountpoint *m)
+static inline void lock_mount(struct path *path, struct pinned_mountpoint *m)
 {
-	return do_lock_mount(path, m, false);
+	do_lock_mount(path, m, false);
 }
 
-static void unlock_mount(struct pinned_mountpoint *m)
+static void __unlock_mount(struct pinned_mountpoint *m)
 {
 	inode_unlock(m->mp->m_dentry->d_inode);
 	read_seqlock_excl(&mount_lock);
@@ -2846,6 +2850,20 @@ static void unlock_mount(struct pinned_mountpoint *m)
 	namespace_unlock();
 }
 
+static inline void unlock_mount(struct pinned_mountpoint *m)
+{
+	if (!IS_ERR(m->parent))
+		__unlock_mount(m);
+}
+
+#define LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath) \
+	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
+	do_lock_mount((path), &mp, (beneath))
+#define LOCK_MOUNT(mp, path) LOCK_MOUNT_MAYBE_BENEATH(mp, (path), false)
+#define LOCK_MOUNT_EXACT(mp, path) \
+	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
+	lock_mount_exact((path), &mp)
+
 static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
 {
 	if (mnt->mnt.mnt_sb->s_flags & SB_NOUSER)
@@ -3015,8 +3033,7 @@ static int do_loopback(struct path *path, const char *old_name,
 				int recurse)
 {
 	struct path old_path __free(path_put) = {};
-	struct mount *mnt = NULL, *parent;
-	struct pinned_mountpoint mp = {};
+	struct mount *mnt = NULL;
 	int err;
 	if (!old_name || !*old_name)
 		return -EINVAL;
@@ -3027,28 +3044,23 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (mnt_ns_loop(old_path.dentry))
 		return -EINVAL;
 
-	err = lock_mount(path, &mp);
-	if (err)
-		return err;
+	LOCK_MOUNT(mp, path);
+	if (IS_ERR(mp.parent))
+		return PTR_ERR(mp.parent);
 
-	parent = real_mount(path->mnt);
-	if (!check_mnt(parent))
-		goto out2;
+	if (!check_mnt(mp.parent))
+		return -EINVAL;
 
 	mnt = __do_loopback(&old_path, recurse);
-	if (IS_ERR(mnt)) {
-		err = PTR_ERR(mnt);
-		goto out2;
-	}
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
 
-	err = graft_tree(mnt, parent, mp.mp);
+	err = graft_tree(mnt, mp.parent, mp.mp);
 	if (err) {
 		lock_mount_hash();
 		umount_tree(mnt, UMOUNT_SYNC);
 		unlock_mount_hash();
 	}
-out2:
-	unlock_mount(&mp);
 	return err;
 }
 
@@ -3561,7 +3573,6 @@ static int do_move_mount(struct path *old_path,
 {
 	struct mount *p;
 	struct mount *old = real_mount(old_path->mnt);
-	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
 
@@ -3571,52 +3582,49 @@ static int do_move_mount(struct path *old_path,
 	if (d_is_dir(new_path->dentry) != d_is_dir(old_path->dentry))
 		return -EINVAL;
 
-	err = do_lock_mount(new_path, &mp, beneath);
-	if (err)
-		return err;
+	LOCK_MOUNT_MAYBE_BENEATH(mp, new_path, beneath);
+	if (IS_ERR(mp.parent))
+		return PTR_ERR(mp.parent);
 
 	p = real_mount(new_path->mnt);
 
-	err = -EINVAL;
-
 	if (check_mnt(old)) {
 		/* if the source is in our namespace... */
 		/* ... it should be detachable from parent */
 		if (!mnt_has_parent(old) || IS_MNT_LOCKED(old))
-			goto out;
+			return -EINVAL;
 		/* ... which should not be shared */
 		if (IS_MNT_SHARED(old->mnt_parent))
-			goto out;
+			return -EINVAL;
 		/* ... and the target should be in our namespace */
 		if (!check_mnt(p))
-			goto out;
+			return -EINVAL;
 	} else {
 		/*
 		 * otherwise the source must be the root of some anon namespace.
 		 */
 		if (!anon_ns_root(old))
-			goto out;
+			return -EINVAL;
 		/*
 		 * Bail out early if the target is within the same namespace -
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
 		if (old->mnt_ns == p->mnt_ns)
-			goto out;
+			return -EINVAL;
 		/*
 		 * Target should be either in our namespace or in an acceptable
 		 * anon namespace, sensu check_anonymous_mnt().
 		 */
 		if (!may_use_mount(p))
-			goto out;
+			return -EINVAL;
 	}
 
 	if (beneath) {
 		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
-			goto out;
+			return err;
 
-		err = -EINVAL;
 		p = p->mnt_parent;
 	}
 
@@ -3625,17 +3633,13 @@ static int do_move_mount(struct path *old_path,
 	 * mount which is shared.
 	 */
 	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
-		goto out;
-	err = -ELOOP;
+		return -EINVAL;
 	if (!check_for_nsfs_mounts(old))
-		goto out;
+		return -ELOOP;
 	if (mount_is_ancestor(old, p))
-		goto out;
+		return -ELOOP;
 
-	err = attach_recursive_mnt(old, p, mp.mp);
-out:
-	unlock_mount(&mp);
-	return err;
+	return attach_recursive_mnt(old, p, mp.mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
@@ -3694,7 +3698,6 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
-	struct pinned_mountpoint mp = {};
 	struct super_block *sb = fc->root->d_sb;
 	int error;
 
@@ -3715,13 +3718,14 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
-	error = lock_mount(mountpoint, &mp);
-	if (!error) {
+	LOCK_MOUNT(mp, mountpoint);
+	if (IS_ERR(mp.parent)) {
+		return PTR_ERR(mp.parent);
+	} else {
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
 		if (!error)
 			retain_and_null_ptr(mnt); // consumed on success
-		unlock_mount(&mp);
 	}
 	return error;
 }
@@ -3785,8 +3789,8 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 	return err;
 }
 
-static int lock_mount_exact(const struct path *path,
-			    struct pinned_mountpoint *mp)
+static void lock_mount_exact(const struct path *path,
+			     struct pinned_mountpoint *mp)
 {
 	struct dentry *dentry = path->dentry;
 	int err;
@@ -3802,14 +3806,15 @@ static int lock_mount_exact(const struct path *path,
 	if (unlikely(err)) {
 		namespace_unlock();
 		inode_unlock(dentry->d_inode);
+		mp->parent = ERR_PTR(err);
+	} else {
+		mp->parent = real_mount(path->mnt);
 	}
-	return err;
 }
 
 int finish_automount(struct vfsmount *__m, const struct path *path)
 {
 	struct vfsmount *m __free(mntput) = __m;
-	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
 
@@ -3828,15 +3833,14 @@ int finish_automount(struct vfsmount *__m, const struct path *path)
 	 * that overmounts our mountpoint to be means "quitely drop what we've
 	 * got", not "try to mount it on top".
 	 */
-	err = lock_mount_exact(path, &mp);
-	if (unlikely(err))
-		return err == -EBUSY ? 0 : err;
+	LOCK_MOUNT_EXACT(mp, path);
+	if (IS_ERR(mp.parent))
+		return mp.parent == ERR_PTR(-EBUSY) ? 0 : PTR_ERR(mp.parent);
 
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
 	if (likely(!err))
 		retain_and_null_ptr(m);
-	unlock_mount(&mp);
 	return err;
 }
 
@@ -4633,7 +4637,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	struct path old __free(path_put) = {};
 	struct path root __free(path_put) = {};
 	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
-	struct pinned_mountpoint old_mp = {};
 	int error;
 
 	if (!may_mount())
@@ -4654,45 +4657,42 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 		return error;
 
 	get_fs_root(current->fs, &root);
-	error = lock_mount(&old, &old_mp);
-	if (error)
-		return error;
 
-	error = -EINVAL;
+	LOCK_MOUNT(old_mp, &old);
+	old_mnt = old_mp.parent;
+	if (IS_ERR(old_mnt))
+		return PTR_ERR(old_mnt);
+
 	new_mnt = real_mount(new.mnt);
 	root_mnt = real_mount(root.mnt);
-	old_mnt = real_mount(old.mnt);
 	ex_parent = new_mnt->mnt_parent;
 	root_parent = root_mnt->mnt_parent;
 	if (IS_MNT_SHARED(old_mnt) ||
 		IS_MNT_SHARED(ex_parent) ||
 		IS_MNT_SHARED(root_parent))
-		goto out4;
+		return -EINVAL;
 	if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
-		goto out4;
+		return -EINVAL;
 	if (new_mnt->mnt.mnt_flags & MNT_LOCKED)
-		goto out4;
-	error = -ENOENT;
+		return -EINVAL;
 	if (d_unlinked(new.dentry))
-		goto out4;
-	error = -EBUSY;
+		return -ENOENT;
 	if (new_mnt == root_mnt || old_mnt == root_mnt)
-		goto out4; /* loop, on the same file system  */
-	error = -EINVAL;
+		return -EBUSY; /* loop, on the same file system  */
 	if (!path_mounted(&root))
-		goto out4; /* not a mountpoint */
+		return -EINVAL; /* not a mountpoint */
 	if (!mnt_has_parent(root_mnt))
-		goto out4; /* absolute root */
+		return -EINVAL; /* absolute root */
 	if (!path_mounted(&new))
-		goto out4; /* not a mountpoint */
+		return -EINVAL; /* not a mountpoint */
 	if (!mnt_has_parent(new_mnt))
-		goto out4; /* absolute root */
+		return -EINVAL; /* absolute root */
 	/* make sure we can reach put_old from new_root */
 	if (!is_path_reachable(old_mnt, old.dentry, &new))
-		goto out4;
+		return -EINVAL;
 	/* make certain new is below the root */
 	if (!is_path_reachable(new_mnt, new.dentry, &root))
-		goto out4;
+		return -EINVAL;
 	lock_mount_hash();
 	umount_mnt(new_mnt);
 	if (root_mnt->mnt.mnt_flags & MNT_LOCKED) {
@@ -4711,10 +4711,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	mnt_notify_add(root_mnt);
 	mnt_notify_add(new_mnt);
 	chroot_fs_refs(&root, &new);
-	error = 0;
-out4:
-	unlock_mount(&old_mp);
-	return error;
+	return 0;
 }
 
 static unsigned int recalc_flags(struct mount_kattr *kattr, struct mount *mnt)
-- 
2.47.2



* [PATCH 28/52] do_move_mount(): use the parent mount returned by do_lock_mount()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (25 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 27/52] change calling conventions for lock_mount() et.al Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25  4:43   ` [PATCH 29/52] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
                     ` (24 subsequent siblings)
  51 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

After successful do_lock_mount() call, mp.parent is set to either
real_mount(path->mnt) (for !beneath case) or to ->mnt_parent of that
(for beneath).  p is set to real_mount(path->mnt) and after
several uses it's made equal to mp.parent.  All uses prior to that
care only about p->mnt_ns and since p->mnt_ns == parent->mnt_ns,
we might as well use mp.parent all along.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 8d6e26e2c97a..05019dde25a0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3571,7 +3571,6 @@ static inline bool may_use_mount(struct mount *mnt)
 static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
-	struct mount *p;
 	struct mount *old = real_mount(old_path->mnt);
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
@@ -3586,8 +3585,6 @@ static int do_move_mount(struct path *old_path,
 	if (IS_ERR(mp.parent))
 		return PTR_ERR(mp.parent);
 
-	p = real_mount(new_path->mnt);
-
 	if (check_mnt(old)) {
 		/* if the source is in our namespace... */
 		/* ... it should be detachable from parent */
@@ -3597,7 +3594,7 @@ static int do_move_mount(struct path *old_path,
 		if (IS_MNT_SHARED(old->mnt_parent))
 			return -EINVAL;
 		/* ... and the target should be in our namespace */
-		if (!check_mnt(p))
+		if (!check_mnt(mp.parent))
 			return -EINVAL;
 	} else {
 		/*
@@ -3610,13 +3607,13 @@ static int do_move_mount(struct path *old_path,
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
-		if (old->mnt_ns == p->mnt_ns)
+		if (old->mnt_ns == mp.parent->mnt_ns)
 			return -EINVAL;
 		/*
 		 * Target should be either in our namespace or in an acceptable
 		 * anon namespace, sensu check_anonymous_mnt().
 		 */
-		if (!may_use_mount(p))
+		if (!may_use_mount(mp.parent))
 			return -EINVAL;
 	}
 
@@ -3624,22 +3621,20 @@ static int do_move_mount(struct path *old_path,
 		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
 			return err;
-
-		p = p->mnt_parent;
 	}
 
 	/*
 	 * Don't move a mount tree containing unbindable mounts to a destination
 	 * mount which is shared.
 	 */
-	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
+	if (IS_MNT_SHARED(mp.parent) && tree_contains_unbindable(old))
 		return -EINVAL;
 	if (!check_for_nsfs_mounts(old))
 		return -ELOOP;
-	if (mount_is_ancestor(old, p))
+	if (mount_is_ancestor(old, mp.parent))
 		return -ELOOP;
 
-	return attach_recursive_mnt(old, p, mp.mp);
+	return attach_recursive_mnt(old, mp.parent, mp.mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
-- 
2.47.2



* [PATCH 29/52] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (26 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 28/52] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25  4:43   ` [PATCH 30/52] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
                     ` (23 subsequent siblings)
  51 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Both callers pass it a mountpoint reference picked from pinned_mountpoint
and the path it corresponds to.

First of all, path->dentry is equal to mp.mp->m_dentry.  Furthermore, path->mnt
is &mp.parent->mnt, making struct path contents redundant.

Pass it the address of that pinned_mountpoint instead; what's more, if we
teach it to treat ERR_PTR(error) in ->parent as "bail out with that error"
we can simplify the callers even more - do_add_mount() will do the right
thing even when called after lock_mount() failure.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 05019dde25a0..06c672127aee 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3657,10 +3657,13 @@ static int do_move_mount_old(struct path *path, const char *old_name)
 /*
  * add a mount into a namespace's mount tree
  */
-static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
-			const struct path *path, int mnt_flags)
+static int do_add_mount(struct mount *newmnt, const struct pinned_mountpoint *mp,
+			int mnt_flags)
 {
-	struct mount *parent = real_mount(path->mnt);
+	struct mount *parent = mp->parent;
+
+	if (IS_ERR(parent))
+		return PTR_ERR(parent);
 
 	mnt_flags &= ~MNT_INTERNAL_FLAGS;
 
@@ -3674,14 +3677,15 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
 	}
 
 	/* Refuse the same filesystem on the same mount point */
-	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb && path_mounted(path))
+	if (parent->mnt.mnt_sb == newmnt->mnt.mnt_sb &&
+	    parent->mnt.mnt_root == mp->mp->m_dentry)
 		return -EBUSY;
 
 	if (d_is_symlink(newmnt->mnt.mnt_root))
 		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	return graft_tree(newmnt, parent, mp);
+	return graft_tree(newmnt, parent, mp->mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
@@ -3714,14 +3718,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
 	LOCK_MOUNT(mp, mountpoint);
-	if (IS_ERR(mp.parent)) {
-		return PTR_ERR(mp.parent);
-	} else {
-		error = do_add_mount(real_mount(mnt), mp.mp,
-				     mountpoint, mnt_flags);
-		if (!error)
-			retain_and_null_ptr(mnt); // consumed on success
-	}
+	error = do_add_mount(real_mount(mnt), &mp, mnt_flags);
+	if (!error)
+		retain_and_null_ptr(mnt); // consumed on success
 	return error;
 }
 
@@ -3829,11 +3828,10 @@ int finish_automount(struct vfsmount *__m, const struct path *path)
 	 * got", not "try to mount it on top".
 	 */
 	LOCK_MOUNT_EXACT(mp, path);
-	if (IS_ERR(mp.parent))
-		return mp.parent == ERR_PTR(-EBUSY) ? 0 : PTR_ERR(mp.parent);
+	if (mp.parent == ERR_PTR(-EBUSY))
+		return 0;
 
-	err = do_add_mount(mnt, mp.mp, path,
-			   path->mnt->mnt_flags | MNT_SHRINKABLE);
+	err = do_add_mount(mnt, &mp, path->mnt->mnt_flags | MNT_SHRINKABLE);
 	if (likely(!err))
 		retain_and_null_ptr(m);
 	return err;
-- 
2.47.2



* [PATCH 30/52] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (27 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 29/52] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25  4:43   ` [PATCH 31/52] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
                     ` (22 subsequent siblings)
  51 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

parent and mountpoint always come from the same struct pinned_mountpoint
now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 06c672127aee..9ffdbb093f57 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2613,10 +2613,11 @@ enum mnt_tree_flags_t {
  *         Otherwise a negative error code is returned.
  */
 static int attach_recursive_mnt(struct mount *source_mnt,
-				struct mount *dest_mnt,
-				struct mountpoint *dest_mp)
+				const struct pinned_mountpoint *dest)
 {
 	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct mount *dest_mnt = dest->parent;
+	struct mountpoint *dest_mp = dest->mp;
 	HLIST_HEAD(tree_list);
 	struct mnt_namespace *ns = dest_mnt->mnt_ns;
 	struct pinned_mountpoint root = {};
@@ -2864,16 +2865,16 @@ static inline void unlock_mount(struct pinned_mountpoint *m)
 	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
 	lock_mount_exact((path), &mp)
 
-static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
+static int graft_tree(struct mount *mnt, const struct pinned_mountpoint *mp)
 {
 	if (mnt->mnt.mnt_sb->s_flags & SB_NOUSER)
 		return -EINVAL;
 
-	if (d_is_dir(mp->m_dentry) !=
+	if (d_is_dir(mp->mp->m_dentry) !=
 	      d_is_dir(mnt->mnt.mnt_root))
 		return -ENOTDIR;
 
-	return attach_recursive_mnt(mnt, p, mp);
+	return attach_recursive_mnt(mnt, mp);
 }
 
 static int may_change_propagation(const struct mount *m)
@@ -3055,7 +3056,7 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
-	err = graft_tree(mnt, mp.parent, mp.mp);
+	err = graft_tree(mnt, &mp);
 	if (err) {
 		lock_mount_hash();
 		umount_tree(mnt, UMOUNT_SYNC);
@@ -3634,7 +3635,7 @@ static int do_move_mount(struct path *old_path,
 	if (mount_is_ancestor(old, mp.parent))
 		return -ELOOP;
 
-	return attach_recursive_mnt(old, mp.parent, mp.mp);
+	return attach_recursive_mnt(old, &mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
@@ -3685,7 +3686,7 @@ static int do_add_mount(struct mount *newmnt, const struct pinned_mountpoint *mp
 		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	return graft_tree(newmnt, parent, mp->mp);
+	return graft_tree(newmnt, mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
-- 
2.47.2



* [PATCH 31/52] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (28 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 30/52] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:43     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 32/52] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
                     ` (21 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

That kills the last place where callers of lock_mount(path, &mp)
used path->dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9ffdbb093f57..494433d2e04b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4682,7 +4682,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	if (!mnt_has_parent(new_mnt))
 		return -EINVAL; /* absolute root */
 	/* make sure we can reach put_old from new_root */
-	if (!is_path_reachable(old_mnt, old.dentry, &new))
+	if (!is_path_reachable(old_mnt, old_mp.mp->m_dentry, &new))
 		return -EINVAL;
 	/* make certain new is below the root */
 	if (!is_path_reachable(new_mnt, new.dentry, &root))
-- 
2.47.2



* [PATCH 32/52] don't bother passing new_path->dentry to can_move_mount_beneath()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (29 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 31/52] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25  4:43   ` [PATCH 33/52] new helper: topmost_overmount() Al Viro
                     ` (20 subsequent siblings)
  51 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 494433d2e04b..7d51763fc76c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3451,8 +3451,8 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
 /**
  * can_move_mount_beneath - check that we can mount beneath the top mount
  * @mnt_from: mount we are trying to move
- * @to:   mount under which to mount
- * @mp:   mountpoint of @to
+ * @mnt_to:   mount under which to mount
+ * @mp:   mountpoint of @mnt_to
  *
  * - Make sure that nothing can be mounted beneath the caller's current
  *   root or the rootfs of the namespace.
@@ -3468,11 +3468,10 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Return: On success 0, and on error a negative error code is returned.
  */
 static int can_move_mount_beneath(struct mount *mnt_from,
-				  const struct path *to,
+				  struct mount *mnt_to,
 				  const struct mountpoint *mp)
 {
-	struct mount *mnt_to = real_mount(to->mnt),
-		     *parent_mnt_to = mnt_to->mnt_parent;
+	struct mount *parent_mnt_to = mnt_to->mnt_parent;
 
 	if (IS_MNT_LOCKED(mnt_to))
 		return -EINVAL;
@@ -3619,7 +3618,7 @@ static int do_move_mount(struct path *old_path,
 	}
 
 	if (beneath) {
-		err = can_move_mount_beneath(old, new_path, mp.mp);
+		err = can_move_mount_beneath(old, real_mount(new_path->mnt), mp.mp);
 		if (err)
 			return err;
 	}
-- 
2.47.2



* [PATCH 33/52] new helper: topmost_overmount()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (30 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 32/52] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:43     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 34/52] do_lock_mount(): don't modify path Al Viro
                     ` (19 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Returns the final (topmost) mount in the chain of overmounts
starting at the given mount.  Same locking rules as for any mount
tree traversal - either hold the spinlock side of mount_lock, or
run under rcu, sampling the seqcount side of mount_lock before the
call and rechecking it afterwards.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
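Not part of the commit: a minimal userspace sketch of the walk the helper
performs, for anyone who wants to poke at it outside the kernel.  struct
toy_mount and the toy_* functions are made up for illustration and only
model the ->overmount link; locking is deliberately left out, so this says
nothing about the mount_lock rules described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct mount: only the link the helper follows. */
struct toy_mount {
	struct toy_mount *overmount;	/* mount covering our root, or NULL */
};

/* Same loop as topmost_overmount(), applied to the toy type. */
static struct toy_mount *toy_topmost_overmount(struct toy_mount *m)
{
	while (m->overmount)
		m = m->overmount;
	return m;
}

/* Stack c on b on a; every starting point must resolve to c. */
static int toy_chain_demo(void)
{
	struct toy_mount c = { NULL };
	struct toy_mount b = { &c };
	struct toy_mount a = { &b };

	assert(toy_topmost_overmount(&a) == &c);
	assert(toy_topmost_overmount(&b) == &c);
	assert(toy_topmost_overmount(&c) == &c);	/* nothing on top: returns itself */
	return 1;
}
```

A mount with nothing on top of it is its own topmost overmount, which is
why both converted callers can use the result unconditionally.
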
 fs/mount.h     | 7 +++++++
 fs/namespace.c | 9 +++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index ed8c83ba836a..04d0eadc4c10 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -235,4 +235,11 @@ static inline void mnt_notify_add(struct mount *m)
 }
 #endif
 
+static inline struct mount *topmost_overmount(struct mount *m)
+{
+	while (m->overmount)
+		m = m->overmount;
+	return m;
+}
+
 struct mnt_namespace *mnt_ns_from_dentry(struct dentry *dentry);
diff --git a/fs/namespace.c b/fs/namespace.c
index 7d51763fc76c..93eba16e42b6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2697,10 +2697,9 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 				 child->mnt_mountpoint);
 		commit_tree(child);
 		if (q) {
+			struct mount *r = topmost_overmount(child);
 			struct mountpoint *mp = root.mp;
-			struct mount *r = child;
-			while (unlikely(r->overmount))
-				r = r->overmount;
+
 			if (unlikely(shorter) && child != source_mnt)
 				mp = shorter;
 			mnt_change_mountpoint(r, mp, q);
@@ -6178,9 +6177,7 @@ bool current_chrooted(void)
 
 	guard(mount_locked_reader)();
 
-	root = current->nsproxy->mnt_ns->root;
-	while (unlikely(root->overmount))
-		root = root->overmount;
+	root = topmost_overmount(current->nsproxy->mnt_ns->root);
 
 	return fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 }
-- 
2.47.2



* [PATCH 34/52] do_lock_mount(): don't modify path.
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (31 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 33/52] new helper: topmost_overmount() Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-26 14:14     ` Askar Safin
  2025-08-25  4:43   ` [PATCH 35/52] constify check_mnt() Al Viro
                     ` (18 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Currently do_lock_mount() has the target path switched to whatever
might be overmounting it.  We _do_ want to have the parent
mount/mountpoint chosen on top of the overmounting pile; however,
the way it's done has unpleasant races - if umount propagation
removes the overmount while we'd been trying to set the environment
up, we might end up failing if our target path strays into that overmount
just before the overmount gets kicked out.

Users of do_lock_mount() do not need the target path changed - they
have all the information in res->{parent,mp}; only one place (in
do_move_mount()) currently uses the resulting path->mnt, and that value
is trivial to reconstruct from the original value of path->mnt and the
chosen parent mount.

Let's keep the target path unchanged; it avoids a bunch of subtle races
and it's not hard to do:
	do
		as mount_locked_reader
			find the prospective parent mount/mountpoint dentry
			grab references if it's not the original target
		lock the prospective mountpoint dentry
		take namespace_sem exclusive
		if prospective parent/mountpoint would be different now
			err = -EAGAIN
		else if location has been unmounted
			err = -ENOENT
		else if mountpoint dentry is not allowed to be mounted on
			err = -ENOENT
		else if beneath and the top of the pile was the absolute root
			err = -EINVAL
		else
			try to get struct mountpoint (by dentry), set
			err to 0 on success and -ENO{MEM,ENT} on failure
		if err != 0
			res->parent = ERR_PTR(err)
			drop locks
		else
			res->parent = prospective parent
		drop temporary references
	while err == -EAGAIN

A somewhat subtle part is that dropping temporary references is allowed.
Neither mounts nor dentries should be evicted by a thread that holds
namespace_sem.  On success we are dropping those references under
namespace_sem, so we need to be sure that these are not the last
references remaining.  However, on success we'd already verified (under
namespace_sem) that the original target is still mounted and that the
mount and dentry we are about to drop are still reachable from it via
the mount tree.  That guarantees that we are not about to drop the last
remaining references.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
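Not part of the commit: a minimal single-threaded sketch of the retry shape
described above (pick a candidate without the big lock, take the lock,
recompute, retry on mismatch).  Everything here is hypothetical - struct
toy_ns, the toy_* helpers and the "racer" that changes the answer while we
are acquiring the lock are stand-ins, not kernel APIs, and the references
and inode lock of the real function are not modelled.

```c
#include <errno.h>

/* Toy namespace: one value that may change while we are not holding the lock. */
struct toy_ns {
	int top;		/* stands in for the chosen parent/mountpoint */
	int locked;		/* 1 while the toy exclusive lock is held */
	int racing_changes;	/* simulated concurrent changes still to come */
};

static int toy_sample(const struct toy_ns *ns)
{
	return ns->top;
}

static void toy_lock(struct toy_ns *ns)
{
	/* simulate a racer slipping in just before we get the lock */
	if (ns->racing_changes > 0) {
		ns->racing_changes--;
		ns->top++;
	}
	ns->locked = 1;
}

static void toy_unlock(struct toy_ns *ns)
{
	ns->locked = 0;
}

/* 0 with the lock held and *res set, or a negative errno with the lock dropped. */
static int toy_lock_target(struct toy_ns *ns, int *res, int *attempts)
{
	int err;

	do {
		int before = toy_sample(ns);	/* candidate, chosen unlocked */
		int after;

		toy_lock(ns);
		after = toy_sample(ns);		/* recheck under the lock */
		(*attempts)++;

		if (after != before)
			err = -EAGAIN;		/* something moved, retry */
		else if (after < 0)
			err = -ENOENT;		/* nothing to mount on */
		else
			err = 0;

		if (err)
			toy_unlock(ns);
		else
			*res = after;
	} while (err == -EAGAIN);

	return err;
}

/* Two simulated races, so the third attempt is the one that sticks. */
static int toy_retry_demo(void)
{
	struct toy_ns ns = { .top = 1, .locked = 0, .racing_changes = 2 };
	int res = -1, attempts = 0;
	int err = toy_lock_target(&ns, &res, &attempts);

	if (err || res != 3 || !ns.locked)
		return -1;
	return attempts;
}
```

The point of the shape is that the caller's input is never rewritten; each
pass starts from the same original target and only the result changes.
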
 fs/namespace.c | 126 ++++++++++++++++++++++++++-----------------------
 1 file changed, 68 insertions(+), 58 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 93eba16e42b6..f95e12ab6c9a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2728,6 +2728,27 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 	return err;
 }
 
+static inline struct mount *where_to_mount(const struct path *path,
+					   struct dentry **dentry,
+					   bool beneath)
+{
+	struct mount *m;
+
+	if (unlikely(beneath)) {
+		m = topmost_overmount(real_mount(path->mnt));
+		*dentry = m->mnt_mountpoint;
+		return m->mnt_parent;
+	} else {
+		m = __lookup_mnt(path->mnt, *dentry = path->dentry);
+		if (unlikely(m)) {
+			m = topmost_overmount(m);
+			*dentry = m->mnt.mnt_root;
+			return m;
+		}
+		return real_mount(path->mnt);
+	}
+}
+
 /**
  * do_lock_mount - acquire environment for mounting
  * @path:	target path
@@ -2759,84 +2780,69 @@ static int attach_recursive_mnt(struct mount *source_mnt,
  * case we also require the location to be at the root of a mount
  * that has a parent (i.e. is not a root of some namespace).
  */
-static void do_lock_mount(struct path *path, struct pinned_mountpoint *res, bool beneath)
+static void do_lock_mount(const struct path *path,
+			  struct pinned_mountpoint *res,
+			  bool beneath)
 {
-	struct vfsmount *mnt = path->mnt;
-	struct dentry *dentry;
-	struct path under = {};
-	int err = -ENOENT;
+	int err;
 
 	if (unlikely(beneath) && !path_mounted(path)) {
 		res->parent = ERR_PTR(-EINVAL);
 		return;
 	}
 
-	for (;;) {
-		struct mount *m = real_mount(mnt);
-
-		if (beneath) {
-			path_put(&under);
-			read_seqlock_excl(&mount_lock);
-			if (unlikely(!mnt_has_parent(m))) {
-				read_sequnlock_excl(&mount_lock);
-				res->parent = ERR_PTR(-EINVAL);
-				return;
+	do {
+		struct dentry *dentry, *d;
+		struct mount *m, *n;
+
+		scoped_guard(mount_locked_reader) {
+			m = where_to_mount(path, &dentry, beneath);
+			if (&m->mnt != path->mnt) {
+				mntget(&m->mnt);
+				dget(dentry);
 			}
-			under.mnt = mntget(&m->mnt_parent->mnt);
-			under.dentry = dget(m->mnt_mountpoint);
-			read_sequnlock_excl(&mount_lock);
-			dentry = under.dentry;
-		} else {
-			dentry = path->dentry;
 		}
 
 		inode_lock(dentry->d_inode);
 		namespace_lock();
 
-		if (unlikely(cant_mount(dentry) || !is_mounted(mnt)))
-			break;		// not to be mounted on
+		// check if the chain of mounts (if any) has changed.
+		scoped_guard(mount_locked_reader)
+			n = where_to_mount(path, &d, beneath);
 
-		if (beneath && unlikely(m->mnt_mountpoint != dentry ||
-				        &m->mnt_parent->mnt != under.mnt)) {
-			namespace_unlock();
-			inode_unlock(dentry->d_inode);
-			continue;	// got moved
-		}
+		if (unlikely(n != m || dentry != d))
+			err = -EAGAIN;		// something moved, retry
+		else if (unlikely(cant_mount(dentry) || !is_mounted(path->mnt)))
+			err = -ENOENT;		// not to be mounted on
+		else if (beneath && &m->mnt == path->mnt && !m->overmount)
+			err = -EINVAL;
+		else
+			err = get_mountpoint(dentry, res);
 
-		mnt = lookup_mnt(path);
-		if (unlikely(mnt)) {
+		if (unlikely(err)) {
+			res->parent = ERR_PTR(err);
 			namespace_unlock();
 			inode_unlock(dentry->d_inode);
-			path_put(path);
-			path->mnt = mnt;
-			path->dentry = dget(mnt->mnt_root);
-			continue;	// got overmounted
+		} else {
+			res->parent = m;
 		}
-		err = get_mountpoint(dentry, res);
-		if (err)
-			break;
-		if (beneath) {
-			/*
-			 * @under duplicates the references that will stay
-			 * at least until namespace_unlock(), so the path_put()
-			 * below is safe (and OK to do under namespace_lock -
-			 * we are not dropping the final references here).
-			 */
-			path_put(&under);
-			res->parent = real_mount(path->mnt)->mnt_parent;
-			return;
+		/*
+		 * Drop the temporary references.  This is subtle - on success
+		 * we are doing that under namespace_sem, which would normally
+		 * be forbidden.  However, in that case we are guaranteed that
+		 * refcounts won't reach zero, since we know that path->mnt
+		 * is mounted and thus all mounts reachable from it are pinned
+		 * and stable, along with their mountpoints and roots.
+		 */
+		if (&m->mnt != path->mnt) {
+			dput(dentry);
+			mntput(&m->mnt);
 		}
-		res->parent = real_mount(path->mnt);
-		return;
-	}
-	namespace_unlock();
-	inode_unlock(dentry->d_inode);
-	if (beneath)
-		path_put(&under);
-	res->parent = ERR_PTR(err);
+	} while (err == -EAGAIN);
 }
 
-static inline void lock_mount(struct path *path, struct pinned_mountpoint *m)
+static inline void lock_mount(const struct path *path,
+			      struct pinned_mountpoint *m)
 {
 	do_lock_mount(path, m, false);
 }
@@ -3617,7 +3623,11 @@ static int do_move_mount(struct path *old_path,
 	}
 
 	if (beneath) {
-		err = can_move_mount_beneath(old, real_mount(new_path->mnt), mp.mp);
+		struct mount *over = real_mount(new_path->mnt);
+
+		if (mp.parent != over->mnt_parent)
+			over = mp.parent->overmount;
+		err = can_move_mount_beneath(old, over, mp.mp);
 		if (err)
 			return err;
 	}
-- 
2.47.2



* [PATCH 35/52] constify check_mnt()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (32 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 34/52] do_lock_mount(): don't modify path Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:43     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 36/52] do_mount_setattr(): constify path argument Al Viro
                     ` (17 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f95e12ab6c9a..458bef569816 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1010,7 +1010,7 @@ static void unpin_mountpoint(struct pinned_mountpoint *m)
 	}
 }
 
-static inline int check_mnt(struct mount *mnt)
+static inline int check_mnt(const struct mount *mnt)
 {
 	return mnt->mnt_ns == current->nsproxy->mnt_ns;
 }
-- 
2.47.2



* [PATCH 36/52] do_mount_setattr(): constify path argument
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (33 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 35/52] constify check_mnt() Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:30     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 37/52] do_set_group(): constify path arguments Al Viro
                     ` (16 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 458bef569816..2db9b006e37e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4872,7 +4872,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 	touch_mnt_namespace(mnt->mnt_ns);
 }
 
-static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
+static int do_mount_setattr(const struct path *path, struct mount_kattr *kattr)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	int err = 0;
-- 
2.47.2



* [PATCH 37/52] do_set_group(): constify path arguments
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (34 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 36/52] do_mount_setattr(): constify path argument Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:29     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 38/52] drop_collected_paths(): constify arguments Al Viro
                     ` (15 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2db9b006e37e..d61601fc97ca 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3360,7 +3360,7 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 	return 0;
 }
 
-static int do_set_group(struct path *from_path, struct path *to_path)
+static int do_set_group(const struct path *from_path, const struct path *to_path)
 {
 	struct mount *from = real_mount(from_path->mnt);
 	struct mount *to = real_mount(to_path->mnt);
-- 
2.47.2



* [PATCH 38/52] drop_collected_paths(): constify arguments
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (35 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 37/52] do_set_group(): constify path arguments Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:31     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 39/52] collect_paths(): constify the return value Al Viro
                     ` (14 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and use that to constify the pointers in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c        |  4 ++--
 include/linux/mount.h |  2 +-
 kernel/audit_tree.c   | 12 ++++++------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d61601fc97ca..d29d7c948ec1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2334,9 +2334,9 @@ struct path *collect_paths(const struct path *path,
 	return res;
 }
 
-void drop_collected_paths(struct path *paths, struct path *prealloc)
+void drop_collected_paths(const struct path *paths, struct path *prealloc)
 {
-	for (struct path *p = paths; p->mnt; p++)
+	for (const struct path *p = paths; p->mnt; p++)
 		path_put(p);
 	if (paths != prealloc)
 		kfree(paths);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 5f9c053b0897..c09032463b36 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -105,7 +105,7 @@ extern int may_umount(struct vfsmount *);
 int do_mount(const char *, const char __user *,
 		     const char *, unsigned long, void *);
 extern struct path *collect_paths(const struct path *, struct path *, unsigned);
-extern void drop_collected_paths(struct path *, struct path *);
+extern void drop_collected_paths(const struct path *, struct path *);
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 extern int cifs_root_data(char **dev, char **opts);
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index b0eae2a3c895..32007edf0e55 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -678,7 +678,7 @@ void audit_trim_trees(void)
 		struct audit_tree *tree;
 		struct path path;
 		struct audit_node *node;
-		struct path *paths;
+		const struct path *paths;
 		struct path array[16];
 		int err;
 
@@ -701,7 +701,7 @@ void audit_trim_trees(void)
 			struct audit_chunk *chunk = find_chunk(node);
 			/* this could be NULL if the watch is dying else where... */
 			node->index |= 1U<<31;
-			for (struct path *p = paths; p->dentry; p++) {
+			for (const struct path *p = paths; p->dentry; p++) {
 				struct inode *inode = p->dentry->d_inode;
 				if (inode_to_key(inode) == chunk->key) {
 					node->index &= ~(1U<<31);
@@ -740,9 +740,9 @@ void audit_put_tree(struct audit_tree *tree)
 	put_tree(tree);
 }
 
-static int tag_mounts(struct path *paths, struct audit_tree *tree)
+static int tag_mounts(const struct path *paths, struct audit_tree *tree)
 {
-	for (struct path *p = paths; p->dentry; p++) {
+	for (const struct path *p = paths; p->dentry; p++) {
 		int err = tag_chunk(p->dentry->d_inode, tree);
 		if (err)
 			return err;
@@ -805,7 +805,7 @@ int audit_add_tree_rule(struct audit_krule *rule)
 	struct audit_tree *seed = rule->tree, *tree;
 	struct path path;
 	struct path array[16];
-	struct path *paths;
+	const struct path *paths;
 	int err;
 
 	rule->tree = NULL;
@@ -877,7 +877,7 @@ int audit_tag_tree(char *old, char *new)
 	int failed = 0;
 	struct path path1, path2;
 	struct path array[16];
-	struct path *paths;
+	const struct path *paths;
 	int err;
 
 	err = kern_path(new, 0, &path2);
-- 
2.47.2



* [PATCH 39/52] collect_paths(): constify the return value
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (36 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 38/52] drop_collected_paths(): constify arguments Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:30     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 40/52] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
                     ` (13 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

callers have no business modifying the paths they get

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c        | 4 ++--
 include/linux/mount.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d29d7c948ec1..cc4e18040506 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2300,7 +2300,7 @@ static inline bool extend_array(struct path **res, struct path **to_free,
 	return p;
 }
 
-struct path *collect_paths(const struct path *path,
+const struct path *collect_paths(const struct path *path,
 			      struct path *prealloc, unsigned count)
 {
 	struct mount *root = real_mount(path->mnt);
@@ -2334,7 +2334,7 @@ struct path *collect_paths(const struct path *path,
 	return res;
 }
 
-void drop_collected_paths(const struct path *paths, struct path *prealloc)
+void drop_collected_paths(const struct path *paths, const struct path *prealloc)
 {
 	for (const struct path *p = paths; p->mnt; p++)
 		path_put(p);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c09032463b36..18e4b97f8a98 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -104,8 +104,8 @@ extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 int do_mount(const char *, const char __user *,
 		     const char *, unsigned long, void *);
-extern struct path *collect_paths(const struct path *, struct path *, unsigned);
-extern void drop_collected_paths(const struct path *, struct path *);
+extern const struct path *collect_paths(const struct path *, struct path *, unsigned);
+extern void drop_collected_paths(const struct path *, const struct path *);
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 extern int cifs_root_data(char **dev, char **opts);
-- 
2.47.2



* [PATCH 40/52] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s)
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (37 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 39/52] collect_paths(): constify the return value Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:30     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 41/52] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
                     ` (12 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index cc4e18040506..4704630847af 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3573,8 +3573,9 @@ static inline bool may_use_mount(struct mount *mnt)
 	return check_anonymous_mnt(mnt);
 }
 
-static int do_move_mount(struct path *old_path,
-			 struct path *new_path, enum mnt_tree_flags_t flags)
+static int do_move_mount(const struct path *old_path,
+			 const struct path *new_path,
+			 enum mnt_tree_flags_t flags)
 {
 	struct mount *old = real_mount(old_path->mnt);
 	int err;
@@ -3646,7 +3647,7 @@ static int do_move_mount(struct path *old_path,
 	return attach_recursive_mnt(old, &mp);
 }
 
-static int do_move_mount_old(struct path *path, const char *old_name)
+static int do_move_mount_old(const struct path *path, const char *old_name)
 {
 	struct path old_path;
 	int err;
@@ -4481,7 +4482,8 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	return ret;
 }
 
-static inline int vfs_move_mount(struct path *from_path, struct path *to_path,
+static inline int vfs_move_mount(const struct path *from_path,
+				 const struct path *to_path,
 				 enum mnt_tree_flags_t mflags)
 {
 	int ret;
-- 
2.47.2



* [PATCH 41/52] mnt_warn_timestamp_expiry(): constify struct path argument
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (38 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 40/52] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:32     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 42/52] do_new_mount{,_fc}(): " Al Viro
                     ` (11 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 4704630847af..70636922310c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3231,7 +3231,8 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
 	touch_mnt_namespace(mnt->mnt_ns);
 }
 
-static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt)
+static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
+				      struct vfsmount *mnt)
 {
 	struct super_block *sb = mnt->mnt_sb;
 
-- 
2.47.2



* [PATCH 42/52] do_new_mount{,_fc}(): constify struct path argument
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (39 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 41/52] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:30     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 43/52] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
                     ` (10 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 70636922310c..bf1a6efd335e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3705,7 +3705,7 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
  * Create a new mount using a superblock configuration and request it
  * be added to the namespace tree.
  */
-static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
+static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
 	struct super_block *sb = fc->root->d_sb;
@@ -3739,8 +3739,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  */
-static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
-			int mnt_flags, const char *name, void *data)
+static int do_new_mount(const struct path *path, const char *fstype,
+			int sb_flags, int mnt_flags,
+			const char *name, void *data)
 {
 	struct file_system_type *type;
 	struct fs_context *fc;
-- 
2.47.2



* [PATCH 43/52] do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (40 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 42/52] do_new_mount{,_fc}(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:31     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 44/52] path_mount(): " Al Viro
                     ` (9 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index bf1a6efd335e..68c12866205c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2915,7 +2915,7 @@ static int flags_to_propagation_type(int ms_flags)
 /*
  * recursively change the type of the mountpoint.
  */
-static int do_change_type(struct path *path, int ms_flags)
+static int do_change_type(const struct path *path, int ms_flags)
 {
 	struct mount *m;
 	struct mount *mnt = real_mount(path->mnt);
@@ -3035,8 +3035,8 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 /*
  * do loopback mount.
  */
-static int do_loopback(struct path *path, const char *old_name,
-				int recurse)
+static int do_loopback(const struct path *path, const char *old_name,
+		       int recurse)
 {
 	struct path old_path __free(path_put) = {};
 	struct mount *mnt = NULL;
@@ -3266,7 +3266,7 @@ static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
  * superblock it refers to.  This is triggered by specifying MS_REMOUNT|MS_BIND
  * to mount(2).
  */
-static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
+static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
 {
 	struct super_block *sb = path->mnt->mnt_sb;
 	struct mount *mnt = real_mount(path->mnt);
@@ -3303,7 +3303,7 @@ static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
  * If you've mounted a non-root directory somewhere and want to do remount
  * on it - tough luck.
  */
-static int do_remount(struct path *path, int ms_flags, int sb_flags,
+static int do_remount(const struct path *path, int ms_flags, int sb_flags,
 		      int mnt_flags, void *data)
 {
 	int err;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 44/52] path_mount(): constify struct path argument
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (41 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 43/52] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:32     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 45/52] may_copy_tree(), __do_loopback(): " Al Viro
                     ` (8 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Now that the callees have been constified, it finally can be done.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/internal.h  | 2 +-
 fs/namespace.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 38e8aab27bbd..fe88563b4822 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -84,7 +84,7 @@ void mnt_put_write_access_file(struct file *file);
 extern void dissolve_on_fput(struct vfsmount *);
 extern bool may_mount(void);
 
-int path_mount(const char *dev_name, struct path *path,
+int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page);
 int path_umount(struct path *path, int flags);
 
diff --git a/fs/namespace.c b/fs/namespace.c
index 68c12866205c..94eec417cc61 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4024,7 +4024,7 @@ static char *copy_mount_string(const void __user *data)
  * Therefore, if this magic number is present, it carries no information
  * and must be discarded.
  */
-int path_mount(const char *dev_name, struct path *path,
+int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page)
 {
 	unsigned int mnt_flags = 0, sb_flags;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 45/52] may_copy_tree(), __do_loopback(): constify struct path argument
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (42 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 44/52] path_mount(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:40     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 46/52] path_umount(): " Al Viro
                     ` (7 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 94eec417cc61..a94aa249cedb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2991,7 +2991,7 @@ static int do_change_type(const struct path *path, int ms_flags)
  *
  * Returns true if the mount tree can be copied, false otherwise.
  */
-static inline bool may_copy_tree(struct path *path)
+static inline bool may_copy_tree(const struct path *path)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	const struct dentry_operations *d_op;
@@ -3013,7 +3013,7 @@ static inline bool may_copy_tree(struct path *path)
 }
 
 
-static struct mount *__do_loopback(struct path *old_path, int recurse)
+static struct mount *__do_loopback(const struct path *old_path, int recurse)
 {
 	struct mount *old = real_mount(old_path->mnt);
 
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 46/52] path_umount(): constify struct path argument
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (43 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 45/52] may_copy_tree(), __do_loopback(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:40     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 47/52] constify can_move_mount_beneath() arguments Al Viro
                     ` (6 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/internal.h  | 2 +-
 fs/namespace.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index fe88563b4822..549e6bd453b0 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -86,7 +86,7 @@ extern bool may_mount(void);
 
 int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page);
-int path_umount(struct path *path, int flags);
+int path_umount(const struct path *path, int flags);
 
 int show_path(struct seq_file *m, struct dentry *root);
 
diff --git a/fs/namespace.c b/fs/namespace.c
index a94aa249cedb..76f0dde2ff62 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2084,7 +2084,7 @@ static int can_umount(const struct path *path, int flags)
 }
 
 // caller is responsible for flags being sane
-int path_umount(struct path *path, int flags)
+int path_umount(const struct path *path, int flags)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	int ret;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 47/52] constify can_move_mount_beneath() arguments
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (44 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 46/52] path_umount(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:39     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 48/52] do_move_mount_old(): use __free(path_put) Al Viro
                     ` (5 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 76f0dde2ff62..c6fd5d4d7947 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3473,8 +3473,8 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Context: This function expects namespace_lock() to be held.
  * Return: On success 0, and on error a negative error code is returned.
  */
-static int can_move_mount_beneath(struct mount *mnt_from,
-				  struct mount *mnt_to,
+static int can_move_mount_beneath(const struct mount *mnt_from,
+				  const struct mount *mnt_to,
 				  const struct mountpoint *mp)
 {
 	struct mount *parent_mnt_to = mnt_to->mnt_parent;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 48/52] do_move_mount_old(): use __free(path_put)
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (45 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 47/52] constify can_move_mount_beneath() arguments Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:40     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 49/52] do_mount(): " Al Viro
                     ` (4 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c6fd5d4d7947..da30c7b757d3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3650,7 +3650,7 @@ static int do_move_mount(const struct path *old_path,
 
 static int do_move_mount_old(const struct path *path, const char *old_name)
 {
-	struct path old_path;
+	struct path old_path __free(path_put) = {};
 	int err;
 
 	if (!old_name || !*old_name)
@@ -3660,9 +3660,7 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
 	if (err)
 		return err;
 
-	err = do_move_mount(&old_path, path, 0);
-	path_put(&old_path);
-	return err;
+	return do_move_mount(&old_path, path, 0);
 }
 
 /*
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 49/52] do_mount(): use __free(path_put)
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (46 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 48/52] do_move_mount_old(): use __free(path_put) Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:32     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 50/52] umount_tree(): take all victims out of propagation graph at once Al Viro
                     ` (3 subsequent siblings)
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index da30c7b757d3..d8554742b1c0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4104,15 +4104,13 @@ int path_mount(const char *dev_name, const struct path *path,
 int do_mount(const char *dev_name, const char __user *dir_name,
 		const char *type_page, unsigned long flags, void *data_page)
 {
-	struct path path;
+	struct path path __free(path_put) = {};
 	int ret;
 
 	ret = user_path_at(AT_FDCWD, dir_name, LOOKUP_FOLLOW, &path);
 	if (ret)
 		return ret;
-	ret = path_mount(dev_name, &path, type_page, flags, data_page);
-	path_put(&path);
-	return ret;
+	return path_mount(dev_name, &path, type_page, flags, data_page);
 }
 
 static struct ucounts *inc_mnt_namespaces(struct user_namespace *ns)
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 50/52] umount_tree(): take all victims out of propagation graph at once
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (47 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 49/52] do_mount(): " Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25  4:43   ` [PATCH 51/52] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
                     ` (2 subsequent siblings)
  51 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

For each removed mount we need to calculate where the slaves will end up.
To avoid duplicating that work, do it for all mounts to be removed
at once, taking the mounts themselves out of the propagation graph as
we go, then do all transfers; the duplicate work on finding destinations
is avoided since if we run into a mount that already had its destination
found, we don't need to trace the rest of the way.  That's guaranteed
O(removed mounts) for finding destinations and removing from the propagation
graph and O(surviving mounts that have master removed) for transfers.
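
The "stop as soon as we hit a mount with a known destination" idea can be
sketched in userspace roughly as follows.  This is a simplified model, not
the kernel code: struct node, trace(), compress() and resolve_all() are
hypothetical names, peer groups are left out entirely, and only the chain
of masters is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mount: masters only. */
struct node {
	struct node *master;	/* where propagation is received from */
	bool doomed;		/* about to be removed */
	bool marked;		/* destination already computed */
};

/*
 * Walk up from m, marking doomed nodes, until we either leave the doomed
 * set or reach a node whose destination has already been found.
 */
static struct node *trace(struct node *m)
{
	for (;;) {
		struct node *next = m->master;

		m->marked = true;
		if (!next || !next->doomed)
			return next;
		if (next->marked)
			return next->master;	/* resolved earlier */
		m = next;
	}
}

/* Point every node on the path just walked directly at dest. */
static void compress(struct node *m, struct node *dest)
{
	struct node *next;

	while ((next = m->master) != dest) {
		m->master = dest;
		m = next;
	}
}

/* Resolve destinations for the whole set; each node is traced once. */
static void resolve_all(struct node **set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!set[i]->marked)
			compress(set[i], trace(set[i]));
}

/* Chain a -> b -> c -> s plus d -> b, with s the only survivor. */
int resolve_demo(void)
{
	struct node s = { NULL, false, false };
	struct node c = { &s, true, false };
	struct node b = { &c, true, false };
	struct node a = { &b, true, false };
	struct node d = { &b, true, false };
	struct node *set[] = { &a, &d, &b, &c };

	resolve_all(set, 4);
	return a.master == &s && b.master == &s &&
	       c.master == &s && d.master == &s;
}
```

In this model d never walks past b: b is already marked, so its stored
master is taken as the answer.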

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c |  3 ++-
 fs/pnode.c     | 67 +++++++++++++++++++++++++++++++++++++++-----------
 fs/pnode.h     |  1 +
 3 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d8554742b1c0..82cab5459ec7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1846,6 +1846,8 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 	if (how & UMOUNT_PROPAGATE)
 		propagate_umount(&tmp_list);
 
+	bulk_make_private(&tmp_list);
+
 	while (!list_empty(&tmp_list)) {
 		struct mnt_namespace *ns;
 		bool disconnect;
@@ -1870,7 +1872,6 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 				umount_mnt(p);
 			}
 		}
-		change_mnt_propagation(p, MS_PRIVATE);
 		if (disconnect)
 			hlist_add_head(&p->mnt_umount, &unmounted);
 
diff --git a/fs/pnode.c b/fs/pnode.c
index edaf9d9d0eaf..5d91c3e58d2a 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -71,19 +71,6 @@ static inline bool will_be_unmounted(struct mount *m)
 	return m->mnt.mnt_flags & MNT_UMOUNT;
 }
 
-static struct mount *propagation_source(struct mount *mnt)
-{
-	do {
-		struct mount *m;
-		for (m = next_peer(mnt); m != mnt; m = next_peer(m)) {
-			if (!will_be_unmounted(m))
-				return m;
-		}
-		mnt = mnt->mnt_master;
-	} while (mnt && will_be_unmounted(mnt));
-	return mnt;
-}
-
 static void transfer_propagation(struct mount *mnt, struct mount *to)
 {
 	struct hlist_node *p = NULL, *n;
@@ -112,11 +99,10 @@ void change_mnt_propagation(struct mount *mnt, int type)
 		return;
 	}
 	if (IS_MNT_SHARED(mnt)) {
-		if (type == MS_SLAVE || !hlist_empty(&mnt->mnt_slave_list))
-			m = propagation_source(mnt);
 		if (list_empty(&mnt->mnt_share)) {
 			mnt_release_group_id(mnt);
 		} else {
+			m = next_peer(mnt);
 			list_del_init(&mnt->mnt_share);
 			mnt->mnt_group_id = 0;
 		}
@@ -137,6 +123,57 @@ void change_mnt_propagation(struct mount *mnt, int type)
 	}
 }
 
+static struct mount *trace_transfers(struct mount *m)
+{
+	while (1) {
+		struct mount *next = next_peer(m);
+
+		if (next != m) {
+			list_del_init(&m->mnt_share);
+			m->mnt_group_id = 0;
+			m->mnt_master = next;
+		} else {
+			if (IS_MNT_SHARED(m))
+				mnt_release_group_id(m);
+			next = m->mnt_master;
+		}
+		hlist_del_init(&m->mnt_slave);
+		CLEAR_MNT_SHARED(m);
+		SET_MNT_MARK(m);
+
+		if (!next || !will_be_unmounted(next))
+			return next;
+		if (IS_MNT_MARKED(next))
+			return next->mnt_master;
+		m = next;
+	}
+}
+
+static void set_destinations(struct mount *m, struct mount *master)
+{
+	struct mount *next;
+
+	while ((next = m->mnt_master) != master) {
+		m->mnt_master = master;
+		m = next;
+	}
+}
+
+void bulk_make_private(struct list_head *set)
+{
+	struct mount *m;
+
+	list_for_each_entry(m, set, mnt_list)
+		if (!IS_MNT_MARKED(m))
+			set_destinations(m, trace_transfers(m));
+
+	list_for_each_entry(m, set, mnt_list) {
+		transfer_propagation(m, m->mnt_master);
+		m->mnt_master = NULL;
+		CLEAR_MNT_MARK(m);
+	}
+}
+
 static struct mount *__propagation_next(struct mount *m,
 					 struct mount *origin)
 {
diff --git a/fs/pnode.h b/fs/pnode.h
index 00ab153e3e9d..b029db225f33 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -42,6 +42,7 @@ static inline bool peers(const struct mount *m1, const struct mount *m2)
 }
 
 void change_mnt_propagation(struct mount *, int);
+void bulk_make_private(struct list_head *);
 int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
 		struct hlist_head *);
 void propagate_umount(struct list_head *);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 51/52] ecryptfs: get rid of pointless mount references in ecryptfs dentries
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (48 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 50/52] umount_tree(): take all victims out of propagation graph at once Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:41     ` Christian Brauner
  2025-08-25  4:43   ` [PATCH 52/52] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
  2025-08-25 12:30   ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Christian Brauner
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

->lower_path.mnt has the same value for all dentries on given ecryptfs
instance and if somebody goes for mountpoint-crossing variant where that
would not be true, we can deal with that when it happens (and _not_
with duplicating that reference into each dentry).

As it is, we are better off just sticking a reference into ecryptfs-private
part of superblock and keeping it pinned until ->kill_sb().

That way we can stick a reference to underlying dentry right into ->d_fsdata
of ecryptfs one, getting rid of indirection through struct ecryptfs_dentry_info,
along with the entire struct ecryptfs_dentry_info machinery.
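
A rough userspace sketch of the resulting layout, assuming simplified
stand-ins for the kernel structures (all types below are hypothetical
models, and lower_path() only mirrors the shape of the helper added by the
patch): the mount lives once per superblock, the lower dentry lives in
->d_fsdata, and a struct path is assembled by value on demand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel definitions. */
struct vfsmount { int id; };
struct super_block { void *s_fs_info; };
struct dentry { struct super_block *d_sb; void *d_fsdata; };
struct path { struct vfsmount *mnt; struct dentry *dentry; };

/* Per-superblock private data: the single pinned lower mount. */
struct sb_info { struct vfsmount *lower_mnt; };

/* Build the lower path by value; nothing is allocated per dentry. */
static struct path lower_path(const struct dentry *d)
{
	const struct sb_info *sbi = d->d_sb->s_fs_info;

	return (struct path){
		.mnt = sbi->lower_mnt,
		.dentry = d->d_fsdata,
	};
}

int lower_path_demo(void)
{
	struct vfsmount mnt = { 1 };
	struct sb_info sbi = { &mnt };
	struct super_block sb = { &sbi };
	struct dentry lower = { NULL, NULL };
	struct dentry upper = { &sb, &lower };
	struct path p = lower_path(&upper);

	return p.mnt == &mnt && p.dentry == &lower;
}
```

Callers that used to hold a const struct path * now keep a local struct
path and pass its address, as the file.c and inode.c hunks below do.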

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ecryptfs/dentry.c          | 14 +-------------
 fs/ecryptfs/ecryptfs_kernel.h | 27 +++++++++++----------------
 fs/ecryptfs/file.c            | 15 +++++++--------
 fs/ecryptfs/inode.c           | 19 +++++--------------
 fs/ecryptfs/main.c            | 24 ++++++------------------
 5 files changed, 30 insertions(+), 69 deletions(-)

diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index 1dfd5b81d831..6648a924e31a 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -59,14 +59,6 @@ static int ecryptfs_d_revalidate(struct inode *dir, const struct qstr *name,
 	return rc;
 }
 
-struct kmem_cache *ecryptfs_dentry_info_cache;
-
-static void ecryptfs_dentry_free_rcu(struct rcu_head *head)
-{
-	kmem_cache_free(ecryptfs_dentry_info_cache,
-		container_of(head, struct ecryptfs_dentry_info, rcu));
-}
-
 /**
  * ecryptfs_d_release
  * @dentry: The ecryptfs dentry
@@ -75,11 +67,7 @@ static void ecryptfs_dentry_free_rcu(struct rcu_head *head)
  */
 static void ecryptfs_d_release(struct dentry *dentry)
 {
-	struct ecryptfs_dentry_info *p = dentry->d_fsdata;
-	if (p) {
-		path_put(&p->lower_path);
-		call_rcu(&p->rcu, ecryptfs_dentry_free_rcu);
-	}
+	dput(dentry->d_fsdata);
 }
 
 const struct dentry_operations ecryptfs_dops = {
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index 1f562e75d0e4..9e6ab0b41337 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -258,13 +258,6 @@ struct ecryptfs_inode_info {
 	struct ecryptfs_crypt_stat crypt_stat;
 };
 
-/* dentry private data. Each dentry must keep track of a lower
- * vfsmount too. */
-struct ecryptfs_dentry_info {
-	struct path lower_path;
-	struct rcu_head rcu;
-};
-
 /**
  * ecryptfs_global_auth_tok - A key used to encrypt all new files under the mountpoint
  * @flags: Status flags
@@ -348,6 +341,7 @@ struct ecryptfs_mount_crypt_stat {
 /* superblock private data. */
 struct ecryptfs_sb_info {
 	struct super_block *wsi_sb;
+	struct vfsmount *lower_mnt;
 	struct ecryptfs_mount_crypt_stat mount_crypt_stat;
 };
 
@@ -494,22 +488,25 @@ ecryptfs_set_superblock_lower(struct super_block *sb,
 }
 
 static inline void
-ecryptfs_set_dentry_private(struct dentry *dentry,
-			    struct ecryptfs_dentry_info *dentry_info)
+ecryptfs_set_dentry_lower(struct dentry *dentry,
+			  struct dentry *lower_dentry)
 {
-	dentry->d_fsdata = dentry_info;
+	dentry->d_fsdata = lower_dentry;
 }
 
 static inline struct dentry *
 ecryptfs_dentry_to_lower(struct dentry *dentry)
 {
-	return ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path.dentry;
+	return dentry->d_fsdata;
 }
 
-static inline const struct path *
-ecryptfs_dentry_to_lower_path(struct dentry *dentry)
+static inline struct path
+ecryptfs_lower_path(struct dentry *dentry)
 {
-	return &((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path;
+	return (struct path){
+		.mnt = ecryptfs_superblock_to_private(dentry->d_sb)->lower_mnt,
+		.dentry = ecryptfs_dentry_to_lower(dentry)
+	};
 }
 
 #define ecryptfs_printk(type, fmt, arg...) \
@@ -532,7 +529,6 @@ extern unsigned int ecryptfs_number_of_users;
 
 extern struct kmem_cache *ecryptfs_auth_tok_list_item_cache;
 extern struct kmem_cache *ecryptfs_file_info_cache;
-extern struct kmem_cache *ecryptfs_dentry_info_cache;
 extern struct kmem_cache *ecryptfs_inode_info_cache;
 extern struct kmem_cache *ecryptfs_sb_info_cache;
 extern struct kmem_cache *ecryptfs_header_cache;
@@ -557,7 +553,6 @@ int ecryptfs_encrypt_and_encode_filename(
 	size_t *encoded_name_size,
 	struct ecryptfs_mount_crypt_stat *mount_crypt_stat,
 	const char *name, size_t name_size);
-struct dentry *ecryptfs_lower_dentry(struct dentry *this_dentry);
 void ecryptfs_dump_hex(char *data, int bytes);
 int virt_to_scatterlist(const void *addr, int size, struct scatterlist *sg,
 			int sg_size);
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index 5f8f96da09fe..7929411837cf 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -33,13 +33,12 @@ static ssize_t ecryptfs_read_update_atime(struct kiocb *iocb,
 				struct iov_iter *to)
 {
 	ssize_t rc;
-	const struct path *path;
 	struct file *file = iocb->ki_filp;
 
 	rc = generic_file_read_iter(iocb, to);
 	if (rc >= 0) {
-		path = ecryptfs_dentry_to_lower_path(file->f_path.dentry);
-		touch_atime(path);
+		struct path path = ecryptfs_lower_path(file->f_path.dentry);
+		touch_atime(&path);
 	}
 	return rc;
 }
@@ -59,12 +58,11 @@ static ssize_t ecryptfs_splice_read_update_atime(struct file *in, loff_t *ppos,
 						 size_t len, unsigned int flags)
 {
 	ssize_t rc;
-	const struct path *path;
 
 	rc = filemap_splice_read(in, ppos, pipe, len, flags);
 	if (rc >= 0) {
-		path = ecryptfs_dentry_to_lower_path(in->f_path.dentry);
-		touch_atime(path);
+		struct path path = ecryptfs_lower_path(in->f_path.dentry);
+		touch_atime(&path);
 	}
 	return rc;
 }
@@ -283,6 +281,7 @@ static int ecryptfs_dir_open(struct inode *inode, struct file *file)
 	 * ecryptfs_lookup() */
 	struct ecryptfs_file_info *file_info;
 	struct file *lower_file;
+	struct path path;
 
 	/* Released in ecryptfs_release or end of function if failure */
 	file_info = kmem_cache_zalloc(ecryptfs_file_info_cache, GFP_KERNEL);
@@ -292,8 +291,8 @@ static int ecryptfs_dir_open(struct inode *inode, struct file *file)
 				"Error attempting to allocate memory\n");
 		return -ENOMEM;
 	}
-	lower_file = dentry_open(ecryptfs_dentry_to_lower_path(ecryptfs_dentry),
-				 file->f_flags, current_cred());
+	path = ecryptfs_lower_path(ecryptfs_dentry);
+	lower_file = dentry_open(&path, file->f_flags, current_cred());
 	if (IS_ERR(lower_file)) {
 		printk(KERN_ERR "%s: Error attempting to initialize "
 			"the lower file for the dentry with name "
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 72fbe1316ab8..d2b262dc485d 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -327,24 +327,15 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode)
 static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
 				     struct dentry *lower_dentry)
 {
-	const struct path *path = ecryptfs_dentry_to_lower_path(dentry->d_parent);
+	struct dentry *lower_parent = ecryptfs_dentry_to_lower(dentry->d_parent);
 	struct inode *inode, *lower_inode;
-	struct ecryptfs_dentry_info *dentry_info;
 	int rc = 0;
 
-	dentry_info = kmem_cache_alloc(ecryptfs_dentry_info_cache, GFP_KERNEL);
-	if (!dentry_info) {
-		dput(lower_dentry);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	fsstack_copy_attr_atime(d_inode(dentry->d_parent),
-				d_inode(path->dentry));
+				d_inode(lower_parent));
 	BUG_ON(!d_count(lower_dentry));
 
-	ecryptfs_set_dentry_private(dentry, dentry_info);
-	dentry_info->lower_path.mnt = mntget(path->mnt);
-	dentry_info->lower_path.dentry = lower_dentry;
+	ecryptfs_set_dentry_lower(dentry, lower_dentry);
 
 	/*
 	 * negative dentry can go positive under us here - its parent is not
@@ -1022,10 +1013,10 @@ static int ecryptfs_getattr(struct mnt_idmap *idmap,
 {
 	struct dentry *dentry = path->dentry;
 	struct kstat lower_stat;
+	struct path lower_path = ecryptfs_lower_path(dentry);
 	int rc;
 
-	rc = vfs_getattr_nosec(ecryptfs_dentry_to_lower_path(dentry),
-			       &lower_stat, request_mask, flags);
+	rc = vfs_getattr_nosec(&lower_path, &lower_stat, request_mask, flags);
 	if (!rc) {
 		fsstack_copy_attr_all(d_inode(dentry),
 				      ecryptfs_inode_to_lower(d_inode(dentry)));
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index eab1beb846d3..2afbcbbd9546 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -106,15 +106,14 @@ static int ecryptfs_init_lower_file(struct dentry *dentry,
 				    struct file **lower_file)
 {
 	const struct cred *cred = current_cred();
-	const struct path *path = ecryptfs_dentry_to_lower_path(dentry);
+	struct path path = ecryptfs_lower_path(dentry);
 	int rc;
 
-	rc = ecryptfs_privileged_open(lower_file, path->dentry, path->mnt,
-				      cred);
+	rc = ecryptfs_privileged_open(lower_file, path.dentry, path.mnt, cred);
 	if (rc) {
 		printk(KERN_ERR "Error opening lower file "
 		       "for lower_dentry [0x%p] and lower_mnt [0x%p]; "
-		       "rc = [%d]\n", path->dentry, path->mnt, rc);
+		       "rc = [%d]\n", path.dentry, path.mnt, rc);
 		(*lower_file) = NULL;
 	}
 	return rc;
@@ -437,7 +436,6 @@ static int ecryptfs_get_tree(struct fs_context *fc)
 	struct ecryptfs_fs_context *ctx = fc->fs_private;
 	struct ecryptfs_sb_info *sbi = fc->s_fs_info;
 	struct ecryptfs_mount_crypt_stat *mount_crypt_stat;
-	struct ecryptfs_dentry_info *root_info;
 	const char *err = "Getting sb failed";
 	struct inode *inode;
 	struct path path;
@@ -543,14 +541,8 @@ static int ecryptfs_get_tree(struct fs_context *fc)
 		goto out_free;
 	}
 
-	rc = -ENOMEM;
-	root_info = kmem_cache_zalloc(ecryptfs_dentry_info_cache, GFP_KERNEL);
-	if (!root_info)
-		goto out_free;
-
-	/* ->kill_sb() will take care of root_info */
-	ecryptfs_set_dentry_private(s->s_root, root_info);
-	root_info->lower_path = path;
+	ecryptfs_set_dentry_lower(s->s_root, path.dentry);
+	sbi->lower_mnt = path.mnt;
 
 	s->s_flags |= SB_ACTIVE;
 	fc->root = dget(s->s_root);
@@ -580,6 +572,7 @@ static void ecryptfs_kill_block_super(struct super_block *sb)
 	kill_anon_super(sb);
 	if (!sb_info)
 		return;
+	mntput(sb_info->lower_mnt);
 	ecryptfs_destroy_mount_crypt_stat(&sb_info->mount_crypt_stat);
 	kmem_cache_free(ecryptfs_sb_info_cache, sb_info);
 }
@@ -667,11 +660,6 @@ static struct ecryptfs_cache_info {
 		.name = "ecryptfs_file_cache",
 		.size = sizeof(struct ecryptfs_file_info),
 	},
-	{
-		.cache = &ecryptfs_dentry_info_cache,
-		.name = "ecryptfs_dentry_info_cache",
-		.size = sizeof(struct ecryptfs_dentry_info),
-	},
 	{
 		.cache = &ecryptfs_inode_info_cache,
 		.name = "ecryptfs_inode_cache",
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH 52/52] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (49 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 51/52] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
@ 2025-08-25  4:43   ` Al Viro
  2025-08-25 13:42     ` Christian Brauner
  2025-08-25 12:30   ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Christian Brauner
  51 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25  4:43 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Comments regarding "shadow mounts" were stale - no such thing anymore.
Document the locking requirements for __lookup_mnt().
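
The "sample the seqcount, look up, recheck" rule can be illustrated with a
toy userspace model.  Everything here is an assumption made for the
example: struct toy_seq and the toy_*() helpers are hypothetical, C11
sequentially consistent atomics stand in for the kernel's seqlock
primitives, and the demo is single-threaded, so it shows only the control
flow of the retry loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy: a sequence counter protecting one value. */
struct toy_seq {
	atomic_uint seq;	/* odd while a writer is in progress */
	atomic_int value;
};

static unsigned toy_read_begin(struct toy_seq *s)
{
	return atomic_load(&s->seq);
}

/* Nonzero if a writer was active or has run since toy_read_begin(). */
static int toy_read_retry(struct toy_seq *s, unsigned start)
{
	return (start & 1) || atomic_load(&s->seq) != start;
}

static void toy_write(struct toy_seq *s, int v)
{
	atomic_fetch_add(&s->seq, 1);	/* becomes odd: write in progress */
	atomic_store(&s->value, v);
	atomic_fetch_add(&s->seq, 1);	/* becomes even: write finished */
}

/* Reader side: sample, read, recheck, and retry on interference. */
static int toy_lookup(struct toy_seq *s)
{
	unsigned start;
	int v;

	do {
		start = toy_read_begin(s);
		v = atomic_load(&s->value);
	} while (toy_read_retry(s, start));
	return v;
}

int seq_demo(void)
{
	struct toy_seq s;
	int before;

	atomic_init(&s.seq, 0);
	atomic_init(&s.value, 7);
	before = toy_lookup(&s);
	toy_write(&s, 9);
	return before == 7 && toy_lookup(&s) == 9 &&
	       atomic_load(&s.seq) == 2;
}
```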

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 82cab5459ec7..538313b3b7d9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -825,24 +825,16 @@ static bool legitimize_mnt(struct vfsmount *bastard, unsigned seq)
 }
 
 /**
- * __lookup_mnt - find first child mount
+ * __lookup_mnt - mount hash lookup
  * @mnt:	parent mount
- * @dentry:	mountpoint
+ * @dentry:	dentry of mountpoint
  *
- * If @mnt has a child mount @c mounted @dentry find and return it.
+ * If @mnt has a child mount @c mounted on @dentry find and return it.
+ * Caller must either hold the spinlock component of @mount_lock or
+ * hold rcu_read_lock(), sample the seqcount component before the call
+ * and recheck it afterwards.
  *
- * Note that the child mount @c need not be unique. There are cases
- * where shadow mounts are created. For example, during mount
- * propagation when a source mount @mnt whose root got overmounted by a
- * mount @o after path lookup but before @namespace_sem could be
- * acquired gets copied and propagated. So @mnt gets copied including
- * @o. When @mnt is propagated to a destination mount @d that already
- * has another mount @n mounted at the same mountpoint then the source
- * mount @mnt will be tucked beneath @n, i.e., @n will be mounted on
- * @mnt and @mnt mounted on @d. Now both @n and @o are mounted at @mnt
- * on @dentry.
- *
- * Return: The first child of @mnt mounted @dentry or NULL.
+ * Return: The child of @mnt mounted on @dentry or %NULL.
  */
 struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 {
@@ -855,21 +847,12 @@ struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 	return NULL;
 }
 
-/*
- * lookup_mnt - Return the first child mount mounted at path
- *
- * "First" means first mounted chronologically.  If you create the
- * following mounts:
- *
- * mount /dev/sda1 /mnt
- * mount /dev/sda2 /mnt
- * mount /dev/sda3 /mnt
- *
- * Then lookup_mnt() on the base /mnt dentry in the root mount will
- * return successively the root dentry and vfsmount of /dev/sda1, then
- * /dev/sda2, then /dev/sda3, then NULL.
+/**
+ * lookup_mnt - Return the child mount mounted at given location
+ * @path:	location in the namespace
  *
- * lookup_mnt takes a reference to the found vfsmount.
+ * Acquires and returns a new reference to mount at given location
+ * or %NULL if nothing is mounted there.
  */
 struct vfsmount *lookup_mnt(const struct path *path)
 {
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* Re: [PATCH 13/52] has_locked_children(): use guards
  2025-08-25  4:43   ` [PATCH 13/52] has_locked_children(): use guards Al Viro
@ 2025-08-25 11:54     ` Linus Torvalds
  2025-08-25 17:33       ` Al Viro
  2025-08-25 12:49     ` Christian Brauner
  1 sibling, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-25 11:54 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack

[ diff edited to be just the end result ]

On Mon, 25 Aug 2025 at 00:44, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>  bool has_locked_children(struct mount *mnt, struct dentry *dentry)
>  {
> +       scoped_guard(mount_locked_reader)
> +               return __has_locked_children(mnt, dentry);
>  }

So the use of scoped_guard() looks a bit odd to me. Why create a new
scope for when the existing scope is identical? It would seem to be
more straightforward to just do

        guard(mount_locked_reader);
        return __has_locked_children(mnt, dentry);

instead. Was there some code generation issue or other thing that made
you go the 'scoped' way?

There was at least one other patch that did the same pattern (but I
haven't gone through the whole series, maybe there are explanations
later).

               Linus
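
The difference between the two forms can be reproduced outside the kernel. What follows is only a minimal userspace sketch, assuming GCC or Clang, since it relies on __attribute__((cleanup)), the compiler feature cleanup.h is built on. Every demo_ name and DEMO_ macro is hypothetical, and a plain flag stands in for mount_lock; the kernel's real guard() and scoped_guard() are more elaborate.

```c
#include <assert.h>

/* Hypothetical stand-in for mount_lock: 1 while "held". */
int demo_held;

struct demo_guard { int dummy; };

static inline struct demo_guard demo_enter(void)
{
	assert(!demo_held);	/* no recursive locking in this toy */
	demo_held = 1;
	return (struct demo_guard){ 0 };
}

/* Runs automatically when the guard variable goes out of scope. */
static inline void demo_exit(struct demo_guard *g)
{
	(void)g;
	demo_held = 0;
}

/* Analogue of guard(): holds the lock until the enclosing scope ends. */
#define DEMO_GUARD()							\
	struct demo_guard demo_g					\
		__attribute__((cleanup(demo_exit), unused)) = demo_enter()

/* Analogue of scoped_guard(): holds the lock for one statement only. */
#define DEMO_SCOPED_GUARD()						\
	for (struct demo_guard demo_sg					\
		__attribute__((cleanup(demo_exit))) = demo_enter(),	\
		*demo_done = 0;						\
	     !demo_done; demo_done = &demo_sg)

/* Reports whether the lock is held at the time of the call. */
static int demo_locked_work(void)
{
	return demo_held;
}

/* The shape used in the patch: a new scope identical to the function. */
int demo_with_scoped_guard(void)
{
	DEMO_SCOPED_GUARD()
		return demo_locked_work();
	return -1;	/* not reached; keeps the compiler quiet */
}

/* The shape suggested in the reply: the function body is the scope. */
int demo_with_guard(void)
{
	DEMO_GUARD();
	return demo_locked_work();
}
```

In this toy both functions run the callee with the lock held and drop it on return, so the choice is about readability; whether the kernel macros generate identical code for the two forms is exactly the question asked above and is not settled by this sketch.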

* Re: [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
  2025-08-25  4:43   ` [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
@ 2025-08-25 12:10     ` Linus Torvalds
  2025-08-25 12:17       ` Linus Torvalds
  2025-08-25 13:02     ` Christian Brauner
  1 sibling, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-25 12:10 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack

On Mon, 25 Aug 2025 at 00:45, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>                 if (beneath) {
>                         path_put(&under);
>                         read_seqlock_excl(&mount_lock);
> +                       if (unlikely(!mnt_has_parent(m))) {
> +                               read_sequnlock_excl(&mount_lock);
> +                               return -EINVAL;
> +                       }
>                         under.mnt = mntget(&m->mnt_parent->mnt);
>                         under.dentry = dget(m->mnt_mountpoint);
>                         read_sequnlock_excl(&mount_lock);

Well, *this* would look a lot cleaner with a
"scoped_guard(mount_locked_reader)", but you didn't do that for some
reason. Am I missing something?

              Linus
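
As an illustration of why a scoped guard tidies up this kind of code, here is a userspace sketch (GCC or Clang assumed) of a lock-protected section with an early error return. All demo_ names are hypothetical, a flag stands in for mount_lock, and a NULL parent pointer stands in for the "has no parent" case, which is not how mnt_has_parent() tests it in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

int demo_held;	/* hypothetical stand-in for mount_lock */

struct demo_guard { int dummy; };

static inline struct demo_guard demo_enter(void)
{
	assert(!demo_held);
	demo_held = 1;
	return (struct demo_guard){ 0 };
}

static inline void demo_exit(struct demo_guard *g)
{
	(void)g;
	demo_held = 0;
}

/* Analogue of scoped_guard(): lock held for the following block only. */
#define DEMO_SCOPED_GUARD()						\
	for (struct demo_guard demo_sg					\
		__attribute__((cleanup(demo_exit))) = demo_enter(),	\
		*demo_done = 0;						\
	     !demo_done; demo_done = &demo_sg)

/* Toy mount: a parent pointer and a reference count. */
struct demo_mount {
	struct demo_mount *parent;
	int refs;
};

struct demo_mount demo_root = { NULL, 1 };
struct demo_mount demo_child = { &demo_root, 1 };
struct demo_mount *demo_under;

/*
 * Fail for a parentless mount, otherwise take a reference to the
 * parent.  Neither path needs an explicit unlock.
 */
int demo_pick_parent(struct demo_mount *m, struct demo_mount **under)
{
	DEMO_SCOPED_GUARD() {
		if (!m->parent)
			return -EINVAL;	/* cleanup drops the lock */
		m->parent->refs++;
		*under = m->parent;
	}
	assert(!demo_held);	/* already dropped at the closing brace */
	return 0;
}
```

The error path loses its hand-written unlock, which is the cleanup the reply is asking about.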

* Re: [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
  2025-08-25 12:10     ` Linus Torvalds
@ 2025-08-25 12:17       ` Linus Torvalds
  0 siblings, 0 replies; 320+ messages in thread
From: Linus Torvalds @ 2025-08-25 12:17 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack

On Mon, 25 Aug 2025 at 08:10, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Well, *this* would look a lot cleaner with a
> "scoped_guard(mount_locked_reader)", but you didn't do that for some
> reason. Am I missing something?

Ahh. You rewrite it to look very different in 34/52.

            Linus

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-25  4:40 [PATCHED][RFC][CFT] mount-related stuff Al Viro
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
@ 2025-08-25 12:26 ` Christian Brauner
  2025-08-25 12:43 ` Christian Brauner
  2025-08-28 23:07 ` [PATCHES v2][RFC][CFT] " Al Viro
  3 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:26 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara

So for fun I asked one of these A.I. tools wtf "CFT" actually means.
And I have to say it did not disappoint:

Looking at this Linux kernel mailing list context about mount-related
patches, "CFT" likely stands for "Call For Testing" in Al Viro's typical
terse style. But since you asked for alternative interpretations:

- Can't Find Testers
- Completely Funtested Trash  
- Christian's Frustration Trigger
- Cryptic Filesystem Torture
- Carefully Fabricated Terrorcode
- Code For Torvalds
- Chaotic Fs Tweaking
- Crash Friendly Technology
- Coffee Fueled Tinkering
- Confusing Fsdevel Tradition

I vote for "Carefully Fabricated Terrorcode".

On Mon, Aug 25, 2025 at 05:40:46AM +0100, Al Viro wrote:
> 	Most of this pile is basically an attempt to see how well do
> cleanup.h-style mechanisms apply in mount handling.  That stuff lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
> Rebased to -rc3 (used to be a bit past -rc2, branched at mount fixes merge)
> Individual patches in followups.
> 
> 	Please, help with review and testing.  It seems to survive the
> local beating and code generation seems to be OK, but more testing
> would be a good thing and I would really like to see comments on that
> stuff.
> 
> 	This is not all I've got around mount handling, but I'd rather
> get that thing out for review before starting to sort out other local
> mount-related branches.
> 
> 	Series overview:
> 
> 	Part 1: guards.
> 
> 	This part starts with infrastructure, followed by one-by-one
> conversions to the guard/scoped_guard in some of the places that fit
> that well enough.  Note that one of those places turned out to be taking
> mount_lock for no reason whatsoever; I already see places where we do
> write_seqlock when read_seqlock_excl would suffice, etc.
> 
> 	Folks, _please_ don't do any bulk conversions in that area.
> IMO one area where RAII becomes dangerous is locking; usually it's not
> a big deal to delay freeing some object a bit, but delay dropping a
> lock and you risk introducing deadlocks that will be bloody hard to spot.
> It _has_ to be done carefully; we had trouble in that area several times
> over the last year or so in fs/namespace.c alone.  Another fun problem
> is that quite a few comments regarding the locking in there are stale.
> We still have the comments that talk about mount lock as if it had been
> an rwlock-like thing.  It hadn't been that for more than a decade now.
> It needs to be documented sanely; so do the access rules to the data
> structures involved.  I hope to get some of that into the tree this cycle,
> but it's still in progress.
> 
> 1/52)  fs/namespace.c: fix the namespace_sem guard mess
> 	New guards: namespace_excl and namespace_shared.  The former implies
> the latter, as for anything rwsem-like.  No inode locks, no dropping the final
> references, no opening files, etc. in scope of those.
> 2/52)  introduced guards for mount_lock
> 	New guards: mount_writer, mount_locked_reader.  That's write_seqlock
> and read_seqlock_excl on mount_lock; obviously, nothing blocking should be
> done in scope of those.
> 3/52)  fs/namespace.c: allow to drop vfsmount references via __free(mntput)
> 	Missing DEFINE_FREE (for mntput()); local in fs/namespace.c, to be
> used only for keeping shit out of namespace_... and mount_... scopes.
> 4/52)  __detach_mounts(): use guards
> 5/52)  __is_local_mountpoint(): use guards
> 6/52)  do_change_type(): use guards
> 7/52)  do_set_group(): use guards
> 8/52)  mark_mounts_for_expiry(): use guards
> 9/52)  put_mnt_ns(): use guards
> 10/52)  mnt_already_visible(): use guards
> 	a bunch of clear-cut conversions, with explanations of the reasons
> why this or that guard is needed.
> 11/52)  check_for_nsfs_mounts(): no need to take locks
> 	... and here we have one where it turns out that locking had been
> excessive.  Iterating through a subtree in mount_locked_reader scope is
> safe, all right, but (1) mount_writer is not needed here at all and (2)
> namespace_shared + a reference held to the root of subtree is also enough.
> All callers had (2) already.  Documented the locking requirements for
> function, removed {,un}lock_mount_hash() in it...
> 12/52)  propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint()
> 	This one is interesting - existing code had been equivalent to
> scoped_guard(mount_locked_reader), and it's right for that call.  However,
> mnt_set_mountpoint() generally requires mount_writer - the only reason we
> get away with that here is that the mount in question never had been
> reachable from the mounts visible to other threads.
> 13/52)  has_locked_children(): use guards
> 14/52)  mnt_set_expiry(): use guards
> 15/52)  path_is_under(): use guards
> 	more clear-cut conversions with explanations.
> 16/52)  current_chrooted(): don't bother with follow_down_one()
> 17/52)  current_chrooted(): use guards
> 	this pair might be better off with #16 taken to the beginning
> of the series (or to a separate branch merged into this one); no better
> reason to do as I had than wanting to keep the guard infrastructure
> in the very beginning.
> 
> 	Part 2: turning unlock_mount() into __cleanup.
> 
> 	Environment for mounting something on given location consists of:
> 1) namespace_excl scope
> 2) parent mount - the one we'll be attaching things to.
> 3) mountpoint to be, protected from disappearing under us.
> 4) inode of that mountpoint's dentry held exclusive.
> 	Unfortunately, we can't take inode locks in namespace_excl scopes.
> And we want to cope with the possibility that somebody has managed to
> mount something on that place while we'd been taking locks.  "Cope" part
> is simple for finish_automount() ("drop our mount and go away quietly;
> somebody triggered it before we did"), but for everything else it's
> trickier - "use whatever's overmounting that place now (with the right
> locks, please)".
> 	lock_mount() does all of that (do_lock_mount(), actually), with
> unlock_mount() closing the scope.  And it's definitely a good candidate
> for __cleanup()-based approach, except that
> * the damn thing can return an error and conditional variants of that
> infrastructure are too revolting.
> * parent mount is returned in a fucking awful way - we modify the struct
> path passed to us as location to mount on and then its ->mnt is the parent
> to be... except for the "beneath" variant where we play convoluted games
> with "no, here we want the parent of that".  Implementation is also
> vulnerable to umount propagation races.
> * the structure we set up (everything except the parent) is inserted
> into a linked list by lock_mount().  That excludes DEFINE_CLASS() -
> it wants the value formed and then copied to the variable we are
> defining.
> * it contains an implicit namespace_excl scope, so path_put() and its
> ilk *must* be done after the unlock_mount().  And most of the users have
> gotos past that.
> 	The first two problems are solved by adding an explicit pointer
> to parent mount into struct pinned_mountpoint.	Having lock_mount()
> failure reported by setting it to ERR_PTR(-E...) allows to avoid the
> problem with expressing the constructor failure.  The third one is dealt
> with by defining local macros to be used instead of CLASS - I went with
> LOCK_MOUNT(mp, path) which defines struct pinned_mountpoint mp with
> __cleanup(unlock_mount) and sets it up.  If anybody has better suggestions,
> I'll be glad to hear those.
> 	The last one is dealt with by massaging the users into a form that
> would have all post-unlock_mount() stuff done by __free().
> 
> 	First, several trivial cleanups:
> 18/52)  do_move_mount(): trim local variables
> 19/52)  do_move_mount(): deal with the checks on old_path early
> 20/52)  move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
> 21/52)  finish_automount(): simplify the ELOOP check
> 
> 	Getting rid of post-unlock_mount() stuff:
> 22/52)  do_loopback(): use __free(path_put) to deal with old_path
> 23/52)  pivot_root(2): use __free() to deal with struct path in it
> 24/52)  finish_automount(): take the lock_mount() analogue into a helper
> 	this one turns the open-coded logics into lock_mount_exact() with
> the same kind of calling conventions as lock_mount() and do_lock_mount()
> 25/52)  do_new_mount_rc(): use __free() to deal with dropping mnt on failure
> 26/52)  finish_automount(): use __free() to deal with dropping mnt on failure
> 
> 	This is the main part:
> 27/52)  change calling conventions for lock_mount() et.al.
> 
> 	Followups, cleaning up the games with parent mount in the user:
> 28/52)  do_move_mount(): use the parent mount returned by do_lock_mount()
> 29/52)  do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path
> 30/52)  graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint
> 
> 	Part 3: getting rid of mutating struct path there.
> 
> 	do_lock_mount() is still playing silly buggers with struct path it
> had been given - the logics in that thing hadn't changed.  It's not a pretty
> function and it's racy as well; the thing is, by this point its users have
> almost no use for the changed contents of struct path - dentry can be derived
> from struct mountpoint, parent mount to use is provided directly and we
> want that a lot more than modified path->mnt.  There's only one place
> (in can_move_mount_beneath()) where we still want that and it's not hard
> to reconstruct the value by *original* path->mnt value + parent mount to
> be used.
> 
> 	Getting rid of ->dentry uses.
> 31/52)  pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry
> 32/52)  don't bother passing new_path->dentry to can_move_mount_beneath()
> 
> 	A helper, already open-coded in a couple of places; carved out of
> the next patch to keep it reasonably small
> 33/52)  new helper: topmost_overmount()
> 
> 	Rewrite of do_lock_mount() to keep path constant + trivial change
> in do_move_mount() to adjust the argument it passes to can_move_mount_beneath():
> 34/52)  do_lock_mount(): don't modify path.
> 	
> 
> 	Part 5: a bunch of trivial cleanups (mostly constifications)
> 
> 35/52)  constify check_mnt()
> 36/52)  do_mount_setattr(): constify path argument
> 37/52)  do_set_group(): constify path arguments
> 38/52)  drop_collected_paths(): constify arguments
> 39/52)  collect_paths(): constify the return value
> 40/52)  do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s)
> 41/52)  mnt_warn_timestamp_expiry(): constify struct path argument
> 42/52)  do_new_mount{,_fc}(): constify struct path argument
> 43/52)  do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument
> 44/52)  path_mount(): constify struct path argument
> 45/52)  may_copy_tree(), __do_loopback(): constify struct path argument
> 46/52)  path_umount(): constify struct path argument
> 47/52)  constify can_move_mount_beneath() arguments
> 48/52)  do_move_mount_old(): use __free(path_put)
> 49/52)  do_mount(): use __free(path_put)
> 
> 	Part 6: assorted stuff, will grow.
> 
> 50/52)  umount_tree(): take all victims out of propagation graph at once
> [had been earlier]
> 	For each removed mount we need to calculate where the slaves
> will end up.  To avoid duplicating that work, do it for all mounts to be
> removed at once, taking the mounts themselves out of propagation graph as
> we go, then do all transfers; the duplicate work on finding destinations
> is avoided since if we run into a mount that already had destination
> found, we don't need to trace the rest of the way.  That's guaranteed
> O(removed mounts) for finding destinations and removing from propagation
> graph and O(surviving mounts that have master removed) for transfers.
> 
> 51/52)  ecryptfs: get rid of pointless mount references in ecryptfs dentries
> 	->lower_path.mnt has the same value for all dentries on given
> ecryptfs instance and if somebody goes for mountpoint-crossing variant
> where that would not be true, we can deal with that when it happens
> (and _not_ with duplicating these reference into each dentry).
> 	As it is, we are better off just sticking a reference into
> ecryptfs-private part of superblock and keeping it pinned until
> ->kill_sb().
> 	That way we can stick a reference to underlying dentry right into
> ->d_fsdata of ecryptfs one, getting rid of indirection through struct
> ecryptfs_dentry_info, along with the entire struct ecryptfs_dentry_info
> machinery.
> 
> 52/52)  fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
> 	Comments regarding "shadow mounts" were stale - no such thing
> anymore.  Document the locking requirements for __lookup_mnt()...
> 
> 
> FWIW, the current diffstat:
> 
>  fs/ecryptfs/dentry.c          |  14 +-
>  fs/ecryptfs/ecryptfs_kernel.h |  27 +-
>  fs/ecryptfs/file.c            |  15 +-
>  fs/ecryptfs/inode.c           |  19 +-
>  fs/ecryptfs/main.c            |  24 +-
>  fs/internal.h                 |   4 +-
>  fs/mount.h                    |  12 +
>  fs/namespace.c                | 775 +++++++++++++++++++-----------------------
>  fs/pnode.c                    |  75 ++--
>  fs/pnode.h                    |   1 +
>  include/linux/mount.h         |   4 +-
>  kernel/audit_tree.c           |  12 +-
>  12 files changed, 464 insertions(+), 518 deletions(-)
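
The approach described in part 2 of the quoted overview (a cleanup-annotated structure set up in place, with failure reported through an error-encoded parent pointer) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the code from the series: GCC or Clang is assumed, every demo_ name is hypothetical, and the error-pointer helpers only imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

int demo_ns_held;	/* hypothetical stand-in for the namespace_excl scope */

/* Userspace imitations of ERR_PTR(), IS_ERR() and PTR_ERR(). */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct demo_mount { int id; };
struct demo_pinned { struct demo_mount *parent; };

struct demo_mount demo_root = { 7 };

/* Sets the structure up in place; failure is encoded in ->parent. */
static void demo_lock_mount(struct demo_mount *where, struct demo_pinned *mp)
{
	if (!where) {
		mp->parent = demo_err_ptr(-EINVAL);
		return;
	}
	assert(!demo_ns_held);
	demo_ns_held = 1;
	mp->parent = where;
}

/* Cleanup: nothing to undo if the constructor failed. */
static void demo_unlock_mount(struct demo_pinned *mp)
{
	if (demo_is_err(mp->parent))
		return;
	demo_ns_held = 0;
}

/* Declares the variable with its cleanup attached, then fills it in. */
#define DEMO_LOCK_MOUNT(mp, where)					\
	struct demo_pinned mp						\
		__attribute__((cleanup(demo_unlock_mount))) = { 0 };	\
	demo_lock_mount((where), &mp)

long demo_mount_on(struct demo_mount *where)
{
	DEMO_LOCK_MOUNT(mp, where);
	if (demo_is_err(mp.parent))
		return demo_ptr_err(mp.parent);
	return demo_ns_held ? mp.parent->id : -1;
}
```

Because the structure is filled in through a pointer rather than returned by value, nothing is copied after it has been set up, which is the property the overview says DEFINE_CLASS() cannot provide.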

* Re: [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                     ` (50 preceding siblings ...)
  2025-08-25  4:43   ` [PATCH 52/52] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
@ 2025-08-25 12:30   ` Christian Brauner
  51 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:30 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:04AM +0100, Al Viro wrote:
> If anything, namespace_lock should be DEFINE_LOCK_GUARD_0, not DEFINE_GUARD.
> That way we
> 	* do not need to feed it a bogus argument
> 	* do not get gcc trying to compare an address of static in
> file variable with -4097 - and, if we are unlucky, trying to keep
> it in a register, with spills and all such.
> 
> The same problems apply to grabbing namespace_sem shared.
> 
> Rename it to namespace_excl, add namespace_shared, convert the existing users:
> 
>     guard(namespace_lock, &namespace_sem) => guard(namespace_excl)()
>     guard(rwsem_read, &namespace_sem) => guard(namespace_shared)()
>     scoped_guard(namespace_lock, &namespace_sem) => scoped_guard(namespace_excl)
>     scoped_guard(rwsem_read, &namespace_sem) => scoped_guard(namespace_shared)
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 02/52] introduced guards for mount_lock
  2025-08-25  4:43   ` [PATCH 02/52] introduced guards for mount_lock Al Viro
@ 2025-08-25 12:32     ` Christian Brauner
  2025-08-25 13:46       ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:05AM +0100, Al Viro wrote:
> mount_writer: write_seqlock; that's an equivalent of {un,}lock_mount_hash()
> mount_locked_reader: read_seqlock_excl; these tend to be open-coded.

Do we really need the "locked" midfix in there? Doesn't seem to buy any
clarity. I'd drop it so the naming is nicely consistent.

> 
> No bulk conversions, please - if nothing else, quite a few places
> use mount_writer form when mount_locked_reader is sufficient.  It needs
> to be dealt with carefully.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/mount.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index 97737051a8b9..ed8c83ba836a 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -154,6 +154,11 @@ static inline void get_mnt_ns(struct mnt_namespace *ns)
>  
>  extern seqlock_t mount_lock;
>  
> +DEFINE_LOCK_GUARD_0(mount_writer, write_seqlock(&mount_lock),
> +		    write_sequnlock(&mount_lock))
> +DEFINE_LOCK_GUARD_0(mount_locked_reader, read_seqlock_excl(&mount_lock),
> +		    read_sequnlock_excl(&mount_lock))
> +
>  struct proc_mounts {
>  	struct mnt_namespace *ns;
>  	struct path root;
> -- 
> 2.47.2
> 
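
To show what an argument-less lock guard amounts to, here is a userspace sketch assuming GCC or Clang. The DEMO_ macros and demo_ names are hypothetical and only imitate the shape of DEFINE_LOCK_GUARD_0; the toy "seqlock" is a flag plus a counter, modelling just one property of the real thing: a writer bumps the sequence on entry and exit, while an exclusive reader takes the lock without touching the sequence.

```c
#include <assert.h>

int demo_spin;		/* 1 while the lock half is held */
unsigned int demo_seq;	/* sequence count; odd while a writer is inside */

static void demo_write_lock(void)
{
	assert(!demo_spin);
	demo_spin = 1;
	demo_seq++;
}

static void demo_write_unlock(void)
{
	demo_seq++;
	demo_spin = 0;
}

static void demo_read_lock_excl(void)
{
	assert(!demo_spin);
	demo_spin = 1;
}

static void demo_read_unlock_excl(void)
{
	demo_spin = 0;
}

/* Generates a guard type plus enter/exit helpers for a global lock. */
#define DEMO_DEFINE_GUARD_0(name, lock, unlock)				\
	struct demo_##name { int dummy; };				\
	static inline struct demo_##name demo_##name##_enter(void)	\
	{ lock; return (struct demo_##name){ 0 }; }			\
	static inline void demo_##name##_exit(struct demo_##name *g)	\
	{ (void)g; unlock; }

/* Takes the lock now and releases it at the end of the scope. */
#define DEMO_GUARD(name)						\
	struct demo_##name demo_guard_var				\
		__attribute__((cleanup(demo_##name##_exit), unused)) =	\
		demo_##name##_enter()

DEMO_DEFINE_GUARD_0(writer, demo_write_lock(), demo_write_unlock())
DEMO_DEFINE_GUARD_0(locked_reader, demo_read_lock_excl(),
		    demo_read_unlock_excl())

/* How far the sequence count moves across one writer section. */
unsigned int demo_seq_delta_writer(void)
{
	unsigned int before = demo_seq;
	{
		DEMO_GUARD(writer);
		assert(demo_seq & 1);	/* odd: lockless readers would retry */
	}
	return demo_seq - before;
}

/* The same measurement for an exclusive reader section. */
unsigned int demo_seq_delta_locked_reader(void)
{
	unsigned int before = demo_seq;
	{
		DEMO_GUARD(locked_reader);
		assert(demo_spin);
	}
	return demo_seq - before;
}
```

In this model the writer section moves the sequence count by two and the exclusive reader leaves it alone, which is one way to see why using the writer form where the reader form would do is not free for lockless readers.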

* Re: [PATCH 03/52] fs/namespace.c: allow to drop vfsmount references via __free(mntput)
  2025-08-25  4:43   ` [PATCH 03/52] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
@ 2025-08-25 12:33     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:33 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:06AM +0100, Al Viro wrote:
> Note that just as path_put, it should never be done in scope of
> namespace_sem, be it shared or exclusive.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>
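
One way to see how a __free()-style reference can be kept out of a lock scope: cleanup handlers run in reverse order of declaration, so a reference declared before the guard is dropped after the unlock. The sketch below assumes GCC or Clang; all demo_ names are hypothetical, a flag stands in for namespace_sem, and a counter records any put that happens while the flag is set.

```c
#include <assert.h>

int demo_sem_held;		/* hypothetical stand-in for namespace_sem */
int demo_puts_under_sem;	/* puts that ran while it was held */

struct demo_obj { int refs; };
struct demo_obj demo_object = { 1 };

static struct demo_obj *demo_get(struct demo_obj *o)
{
	o->refs++;
	return o;
}

static void demo_put(struct demo_obj *o)
{
	if (demo_sem_held)
		demo_puts_under_sem++;	/* the thing to avoid */
	o->refs--;
}

/* Cleanup helper: receives a pointer to the annotated variable. */
static void demo_free_put(struct demo_obj **p)
{
	if (*p)
		demo_put(*p);
}

struct demo_guard { int dummy; };

static struct demo_guard demo_enter(void)
{
	demo_sem_held = 1;
	return (struct demo_guard){ 0 };
}

static void demo_exit(struct demo_guard *g)
{
	(void)g;
	demo_sem_held = 0;
}

#define DEMO_FREE_PUT	__attribute__((cleanup(demo_free_put)))
#define DEMO_GUARD()							\
	struct demo_guard demo_g					\
		__attribute__((cleanup(demo_exit), unused)) = demo_enter()

/* Reference declared before the guard: the put runs after the unlock. */
int demo_ref_then_guard(void)
{
	struct demo_obj *ref DEMO_FREE_PUT = demo_get(&demo_object);
	DEMO_GUARD();
	return ref->refs;
}

/* Reference declared inside the guarded scope: the put runs locked. */
int demo_guard_then_ref(void)
{
	DEMO_GUARD();
	struct demo_obj *ref DEMO_FREE_PUT = demo_get(&demo_object);
	return ref->refs;
}
```

Only the second function trips the counter, so with this style of cleanup the declaration order is what decides whether the rule quoted above is respected.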

* Re: [PATCH 04/52] __detach_mounts(): use guards
  2025-08-25  4:43   ` [PATCH 04/52] __detach_mounts(): use guards Al Viro
@ 2025-08-25 12:33     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:33 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:07AM +0100, Al Viro wrote:
> Clean fit for guards use; guards can't be weaker due to umount_tree() calls.
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 05/52] __is_local_mountpoint(): use guards
  2025-08-25  4:43   ` [PATCH 05/52] __is_local_mountpoint(): " Al Viro
@ 2025-08-25 12:33     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:33 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:08AM +0100, Al Viro wrote:
> clean fit; namespace_shared due to iterating through ns->mounts.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 06/52] do_change_type(): use guards
  2025-08-25  4:43   ` [PATCH 06/52] do_change_type(): " Al Viro
@ 2025-08-25 12:34     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:34 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:09AM +0100, Al Viro wrote:
> clean fit; namespace_excl to modify propagation graph
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 07/52] do_set_group(): use guards
  2025-08-25  4:43   ` [PATCH 07/52] do_set_group(): " Al Viro
@ 2025-08-25 12:35     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:35 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:10AM +0100, Al Viro wrote:
> clean fit; namespace_excl to modify propagation graph
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 08/52] mark_mounts_for_expiry(): use guards
  2025-08-25  4:43   ` [PATCH 08/52] mark_mounts_for_expiry(): " Al Viro
@ 2025-08-25 12:37     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:11AM +0100, Al Viro wrote:
> Clean fit; guards can't be weaker due to umount_tree() calls.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 09/52] put_mnt_ns(): use guards
  2025-08-25  4:43   ` [PATCH 09/52] put_mnt_ns(): " Al Viro
@ 2025-08-25 12:37     ` Christian Brauner
  2025-08-25 12:40     ` Christian Brauner
  1 sibling, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:12AM +0100, Al Viro wrote:
> clean fit; guards can't be weaker due to umount_tree() call.
> Setting emptied_ns requires namespace_excl, but not anything
> mount_lock-related.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 10/52] mnt_already_visible(): use guards
  2025-08-25  4:43   ` [PATCH 10/52] mnt_already_visible(): " Al Viro
@ 2025-08-25 12:39     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:39 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:13AM +0100, Al Viro wrote:
> clean fit; namespace_shared due to iterating through ns->mounts.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

* Re: [PATCH 09/52] put_mnt_ns(): use guards
  2025-08-25  4:43   ` [PATCH 09/52] put_mnt_ns(): " Al Viro
  2025-08-25 12:37     ` Christian Brauner
@ 2025-08-25 12:40     ` Christian Brauner
  2025-08-25 16:21       ` Al Viro
  1 sibling, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:12AM +0100, Al Viro wrote:
> clean fit; guards can't be weaker due to umount_tree() call.
> Setting emptied_ns requires namespace_excl, but not anything
> mount_lock-related.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namespace.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 898a6b7307e4..86a86be2b0ef 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -6153,12 +6153,10 @@ void put_mnt_ns(struct mnt_namespace *ns)
>  {
>  	if (!refcount_dec_and_test(&ns->ns.count))
>  		return;
> -	namespace_lock();
> +	guard(namespace_excl)();
>  	emptied_ns = ns;

Another thing, did I miss

commit aab771f34e63ef89e195b63d121abcb55eebfde6
Author:     Al Viro <viro@zeniv.linux.org.uk>
AuthorDate: Wed Jun 18 18:23:41 2025 -0400
Commit:     Al Viro <viro@zeniv.linux.org.uk>
CommitDate: Sun Jun 29 19:03:46 2025 -0400

    take freeing of emptied mnt_namespace to namespace_unlock()

on the list somehow? I just saw that "emptied_ns" thing for the first
time and was very confused where that came from. I don't see any lore
link attached to the commit message.

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-25  4:40 [PATCHED][RFC][CFT] mount-related stuff Al Viro
  2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-08-25 12:26 ` [PATCHED][RFC][CFT] mount-related stuff Christian Brauner
@ 2025-08-25 12:43 ` Christian Brauner
  2025-08-25 16:11   ` Al Viro
  2025-08-28 23:07 ` [PATCHES v2][RFC][CFT] " Al Viro
  3 siblings, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara

On Mon, Aug 25, 2025 at 05:40:46AM +0100, Al Viro wrote:
> 	Most of this pile is basically an attempt to see how well do
> cleanup.h-style mechanisms apply in mount handling.  That stuff lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
> Rebased to -rc3 (used to be a bit past -rc2, branched at mount fixes merge)
> Individual patches in followups.
> 
> 	Please, help with review and testing.  It seems to survive the
> local beating and code generation seems to be OK, but more testing
> would be a good thing and I would really like to see comments on that
> stuff.

Btw, I just realized that basically none of your commits have any lore
links in them. That kinda sucks because I very very often just look at a
commit and then use the link to jump to the mailing list discussion for
more context about a change and how it came about.

So pretty please can you start adding lore links to your commits when
applying if it's not fucking up your workflow too much?

> 
> 	This is not all I've got around mount handling, but I'd rather
> get that thing out for review before starting to sort out other local
> mount-related branches.
> 
> 	Series overview:
> 
> 	Part 1: guards.
> 
> 	This part starts with infrastructure, followed by one-by-one
> conversions to the guard/scoped_guard in some of the places that fit
> that well enough.  Note that one of those places turned out to be taking
> mount_lock for no reason whatsoever; I already see places where we do
> write_seqlock when read_seqlock_excl would suffice, etc.
> 
> 	Folks, _please_ don't do any bulk conversions in that area.
> IMO one area where RAII becomes dangerous is locking; usually it's not
> a big deal to delay freeing some object a bit, but delay dropping a
> lock and you risk introducing deadlocks that will be bloody hard to spot.
> It _has_ to be done carefully; we had trouble in that area several times
> over the last year or so in fs/namespace.c alone.  Another fun problem
> is that quite a few comments regarding the locking in there are stale.
> We still have the comments that talk about mount lock as if it had been
> an rwlock-like thing.  It hadn't been that for more than a decade now.
> It needs to be documented sanely; so do the access rules to the data
> structures involved.  I hope to get some of that into the tree this cycle,
> but it's still in progress.
> 
> 1/52)  fs/namespace.c: fix the namespace_sem guard mess
> 	New guards: namespace_excl and namespace_shared.  The former implies
> the latter, as for anything rwsem-like.  No inode locks, no dropping the final
> references, no opening files, etc. in scope of those.
> 2/52)  introduced guards for mount_lock
> 	New guards: mount_writer, mount_locked_reader.  That's write_seqlock
> and read_seqlock_excl on mount_lock; obviously, nothing blocking should be
> done in scope of those.
> 3/52)  fs/namespace.c: allow to drop vfsmount references via __free(mntput)
> 	Missing DEFINE_FREE (for mntput()); local in fs/namespace.c, to be
> used only for keeping shit out of namespace_... and mount_... scopes.
> 4/52)  __detach_mounts(): use guards
> 5/52)  __is_local_mountpoint(): use guards
> 6/52)  do_change_type(): use guards
> 7/52)  do_set_group(): use guards
> 8/52)  mark_mounts_for_expiry(): use guards
> 9/52)  put_mnt_ns(): use guards
> 10/52)  mnt_already_visible(): use guards
> 	a bunch of clear-cut conversions, with explanations of the reasons
> why this or that guard is needed.
> 11/52)  check_for_nsfs_mounts(): no need to take locks
> 	... and here we have one where it turns out that locking had been
> excessive.  Iterating through a subtree in mount_locked_reader scope is
> safe, all right, but (1) mount_writer is not needed here at all and (2)
> namespace_shared + a reference held to the root of subtree is also enough.
> All callers had (2) already.  Documented the locking requirements for
> function, removed {,un}lock_mount_hash() in it...
> 12/52)  propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint()
> 	This one is interesting - existing code had been equivalent to
> scoped_guard(mount_locked_reader), and it's right for that call.  However,
> mnt_set_mountpoint() generally requires mount_writer - the only reason we
> get away with that here is that the mount in question never had been
> reachable from the mounts visible to other threads.
> 13/52)  has_locked_children(): use guards
> 14/52)  mnt_set_expiry(): use guards
> 15/52)  path_is_under(): use guards
> 	more clear-cut conversions with explanations.
> 16/52)  current_chrooted(): don't bother with follow_down_one()
> 17/52)  current_chrooted(): use guards
> 	this pair might be better off with #16 taken to the beginning
> of the series (or to a separate branch merged into this one); no better
> reason to do as I had than wanting to keep the guard infrastructure
> in the very beginning.
> 
> 	Part 2: turning unlock_mount() into __cleanup.
> 
> 	Environment for mounting something on given location consists of:
> 1) namespace_excl scope
> 2) parent mount - the one we'll be attaching things to.
> 3) mountpoint to be, protected from disappearing under us.
> 4) inode of that mountpoint's dentry held exclusive.
> 	Unfortunately, we can't take inode locks in namespace_excl scopes.
> And we want to cope with the possibility that somebody has managed to
> mount something on that place while we'd been taking locks.  "Cope" part
> is simple for finish_automount() ("drop our mount and go away quietly;
> somebody triggered it before we did"), but for everything else it's
> trickier - "use whatever's overmounting that place now (with the right
> locks, please)".
> 	lock_mount() does all of that (do_lock_mount(), actually), with
> unlock_mount() closing the scope.  And it's definitely a good candidate
> for __cleanup()-based approach, except that
> * the damn thing can return an error and conditional variants of that
> infrastructure are too revolting.
> * parent mount is returned in a fucking awful way - we modify the struct
> path passed to us as location to mount on and then its ->mnt is the parent
> to be... except for the "beneath" variant where we play convoluted games
> with "no, here we want the parent of that".  Implementation is also
> vulnerable to umount propagation races.
> * the structure we set up (everything except the parent) is inserted
> into a linked list by lock_mount().  That excludes DEFINE_CLASS() -
> it wants the value formed and then copied to the variable we are
> defining.
> * it contains an implicit namespace_excl scope, so path_put() and its
> ilk *must* be done after the unlock_mount().  And most of the users have
> gotos past that.
> 	The first two problems are solved by adding an explicit pointer
> to parent mount into struct pinned_mountpoint.  Having lock_mount()
> failure reported by setting it to ERR_PTR(-E...) allows to avoid the
> problem with expressing the constructor failure.  The third one is dealt
> with by defining local macros to be used instead of CLASS - I went with
> LOCK_MOUNT(mp, path) which defines struct pinned_mountpoint mp with
> __cleanup(unlock_mount) and sets it up.  If anybody has better suggestions,
> I'll be glad to hear those.
> 	The last one is dealt with by massaging the users into a form that
> would have all post-unlock_mount() stuff done by __free().
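A rough userland sketch of the LOCK_MOUNT(mp, path) idea described above, relying only on the compiler's cleanup attribute.  Everything named lm_* is hypothetical and stands in for the kernel pieces (ERR_PTR()/IS_ERR(), namespace_excl, struct pinned_mountpoint); it is not the code from the series.  It shows two things: failure is reported through the parent pointer, so the constructor never needs a conditional variant, and the structure is set up in place rather than copied, which is what a list-linked object would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userland stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR(); assumption, not kernel code. */
#define LM_MAX_ERRNO 4095
static inline void *lm_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline bool lm_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-LM_MAX_ERRNO;
}
static inline long lm_ptr_err(const void *p) { return (long)(intptr_t)p; }

struct lm_mount { bool overmounted; };

/* Counts how many times the toy "namespace_excl" lock is currently held. */
static int lm_namespace_depth;

/* Hypothetical analogue of struct pinned_mountpoint with an explicit parent. */
struct lm_pinned { struct lm_mount *parent; };

/* Takes the toy lock; on failure drops it and stores the error in ->parent. */
static void lm_lock_mount(struct lm_mount *where, struct lm_pinned *mp)
{
	lm_namespace_depth++;
	if (where->overmounted) {
		lm_namespace_depth--;
		mp->parent = lm_err_ptr(-EBUSY);
		return;
	}
	mp->parent = where;
}

/* Cleanup handler: only undo what a successful lm_lock_mount() did. */
static void lm_unlock_mount(struct lm_pinned *mp)
{
	if (lm_is_err(mp->parent))
		return;
	assert(lm_namespace_depth > 0);
	lm_namespace_depth--;
}

/* Declares mp in place (no copying) and arranges for the unlock at scope exit. */
#define LM_LOCK_MOUNT(mp, where) \
	struct lm_pinned mp __attribute__((cleanup(lm_unlock_mount))) = { NULL }; \
	lm_lock_mount((where), &mp)

int lm_mount_on(struct lm_mount *where)
{
	LM_LOCK_MOUNT(mp, where);
	if (lm_is_err(mp.parent))
		return (int)lm_ptr_err(mp.parent);
	mp.parent->overmounted = true;	/* "attach" something on top */
	return 0;			/* lm_unlock_mount() runs here */
}
```

With this shape the caller has direct returns only; anything that must happen after the unlock would have to be declared before LM_LOCK_MOUNT() so that its cleanup runs later.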
> 
> 	First, several trivial cleanups:
> 18/52)  do_move_mount(): trim local variables
> 19/52)  do_move_mount(): deal with the checks on old_path early
> 20/52)  move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
> 21/52)  finish_automount(): simplify the ELOOP check
> 
> 	Getting rid of post-unlock_mount() stuff:
> 22/52)  do_loopback(): use __free(path_put) to deal with old_path
> 23/52)  pivot_root(2): use __free() to deal with struct path in it
> 24/52)  finish_automount(): take the lock_mount() analogue into a helper
> 	this one turns the open-coded logics into lock_mount_exact() with
> the same kind of calling conventions as lock_mount() and do_lock_mount()
> 25/52)  do_new_mount_fc(): use __free() to deal with dropping mnt on failure
> 26/52)  finish_automount(): use __free() to deal with dropping mnt on failure
> 
> 	This is the main part:
> 27/52)  change calling conventions for lock_mount() et al.
> 
> 	Followups, cleaning up the games with parent mount in the user:
> 28/52)  do_move_mount(): use the parent mount returned by do_lock_mount()
> 29/52)  do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path
> 30/52)  graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint
> 
> 	Part 3: getting rid of mutating struct path there.
> 
> 	do_lock_mount() is still playing silly buggers with struct path it
> had been given - the logics in that thing hadn't changed.  It's not a pretty
> function and it's racy as well; the thing is, by this point its users have
> almost no use for the changed contents of struct path - dentry can be derived
> from struct mountpoint, parent mount to use is provided directly and we
> want that a lot more than modified path->mnt.  There's only one place
> (in can_move_mount_beneath()) where we still want that and it's not hard
> to reconstruct the value by *original* path->mnt value + parent mount to
> be used.
> 
> 	Getting rid of ->dentry uses.
> 31/52)  pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry
> 32/52)  don't bother passing new_path->dentry to can_move_mount_beneath()
> 
> 	A helper, already open-coded in a couple of places; carved out of
> the next patch to keep it reasonably small
> 33/52)  new helper: topmost_overmount()
> 
> 	Rewrite of do_lock_mount() to keep path constant + trivial change
> in do_move_mount() to adjust the argument it passes to can_move_mount_beneath():
> 34/52)  do_lock_mount(): don't modify path.
> 	
> 
> 	Part 5: a bunch of trivial cleanups (mostly constifications)
> 
> 35/52)  constify check_mnt()
> 36/52)  do_mount_setattr(): constify path argument
> 37/52)  do_set_group(): constify path arguments
> 38/52)  drop_collected_paths(): constify arguments
> 39/52)  collect_paths(): constify the return value
> 40/52)  do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s)
> 41/52)  mnt_warn_timestamp_expiry(): constify struct path argument
> 42/52)  do_new_mount{,_fc}(): constify struct path argument
> 43/52)  do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument
> 44/52)  path_mount(): constify struct path argument
> 45/52)  may_copy_tree(), __do_loopback(): constify struct path argument
> 46/52)  path_umount(): constify struct path argument
> 47/52)  constify can_move_mount_beneath() arguments
> 48/52)  do_move_mount_old(): use __free(path_put)
> 49/52)  do_mount(): use __free(path_put)
> 
> 	Part 6: assorted stuff, will grow.
> 
> 50/52)  umount_tree(): take all victims out of propagation graph at once
> [had been earlier]
> 	For each removed mount we need to calculate where the slaves
> will end up.  To avoid duplicating that work, do it for all mounts to be
> removed at once, taking the mounts themselves out of propagation graph as
> we go, then do all transfers; the duplicate work on finding destinations
> is avoided since if we run into a mount that already had destination
> found, we don't need to trace the rest of the way.  That's guaranteed
> O(removed mounts) for finding destinations and removing from propagation
> graph and O(surviving mounts that have master removed) for transfers.
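The "stop tracing once we hit a mount whose destination is already known" part can be illustrated with a much simplified userland sketch.  This is not the fs/pnode.c code: struct prop_node, prop_find_dest() and the step counter are all hypothetical, and the model keeps only a master pointer per node (no peer groups, no slave lists, no actual transfers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a node in the propagation graph. */
struct prop_node {
	struct prop_node *master;	/* NULL if there is no master */
	bool removed;			/* node is among the victims */
	bool dest_known;		/* ->dest has already been computed */
	struct prop_node *dest;		/* surviving master its slaves move to */
};

/* Number of upward steps taken so far; only there to make the cost visible. */
static long prop_steps;

/*
 * Find the nearest surviving master above a removed node.  The walk stops
 * early at any removed node that already has its destination recorded, and
 * the result is recorded on every node passed on the way up.
 */
struct prop_node *prop_find_dest(struct prop_node *n)
{
	struct prop_node *p, *stop, *dest = NULL;

	assert(n->removed);
	if (n->dest_known)
		return n->dest;

	for (p = n->master; p; p = p->master) {
		prop_steps++;
		if (!p->removed) {
			dest = p;
			break;
		}
		if (p->dest_known) {
			dest = p->dest;
			break;
		}
	}

	stop = p;
	for (p = n; p != stop; p = p->master) {
		p->dest_known = true;
		p->dest = dest;
	}
	return dest;
}
```

In this toy model each removed node gets its destination recorded at most once, so resolving a whole chain of victims costs one walk rather than one walk per victim.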
> 
> 51/52)  ecryptfs: get rid of pointless mount references in ecryptfs dentries
> 	->lower_path.mnt has the same value for all dentries on given
> ecryptfs instance and if somebody goes for mountpoint-crossing variant
> where that would not be true, we can deal with that when it happens
> (and _not_ with duplicating these references into each dentry).
> 	As it is, we are better off just sticking a reference into
> ecryptfs-private part of superblock and keeping it pinned until
> ->kill_sb().
> 	That way we can stick a reference to underlying dentry right into
> ->d_fsdata of ecryptfs one, getting rid of indirection through struct
> ecryptfs_dentry_info, along with the entire struct ecryptfs_dentry_info
> machinery.
> 
> 52/52)  fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
> 	Comments regarding "shadow mounts" were stale - no such thing
> anymore.  Document the locking requirements for __lookup_mnt()...
> 
> 
> FWIW, the current diffstat:
> 
>  fs/ecryptfs/dentry.c          |  14 +-
>  fs/ecryptfs/ecryptfs_kernel.h |  27 +-
>  fs/ecryptfs/file.c            |  15 +-
>  fs/ecryptfs/inode.c           |  19 +-
>  fs/ecryptfs/main.c            |  24 +-
>  fs/internal.h                 |   4 +-
>  fs/mount.h                    |  12 +
>  fs/namespace.c                | 775 +++++++++++++++++++-----------------------
>  fs/pnode.c                    |  75 ++--
>  fs/pnode.h                    |   1 +
>  include/linux/mount.h         |   4 +-
>  kernel/audit_tree.c           |  12 +-
>  12 files changed, 464 insertions(+), 518 deletions(-)

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 11/52] check_for_nsfs_mounts(): no need to take locks
  2025-08-25  4:43   ` [PATCH 11/52] check_for_nsfs_mounts(): no need to take locks Al Viro
@ 2025-08-25 12:48     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:48 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:14AM +0100, Al Viro wrote:
> Currently we are taking mount_writer; what that function needs is
> either mount_locked_reader (we are not changing anything, we just
> want to iterate through the subtree) or namespace_shared and
> a reference held by caller on the root of subtree - that's also
> enough to stabilize the topology.
> 
> The thing is, all callers are already holding at least namespace_shared
> as well as a reference to the root of subtree.
> 
> Let's make the callers provide locking warranties - don't mess with
> mount_lock in check_for_nsfs_mounts() itself and document the locking
> requirements.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 12/52] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint()
  2025-08-25  4:43   ` [PATCH 12/52] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
@ 2025-08-25 12:49     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:15AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 13/52] has_locked_children(): use guards
  2025-08-25  4:43   ` [PATCH 13/52] has_locked_children(): use guards Al Viro
  2025-08-25 11:54     ` Linus Torvalds
@ 2025-08-25 12:49     ` Christian Brauner
  1 sibling, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:16AM +0100, Al Viro wrote:
> ... and document the locking requirements of __has_locked_children()
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namespace.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 59948cbf9c47..eabb0d996c6a 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2373,6 +2373,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
>  	}
>  }
>  
> +/* locks: namespace_shared && pinned(mnt) || mount_locked_reader */
>  static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
>  {
>  	struct mount *child;
> @@ -2389,12 +2390,8 @@ static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
>  
>  bool has_locked_children(struct mount *mnt, struct dentry *dentry)
>  {
> -	bool res;
> -
> -	read_seqlock_excl(&mount_lock);
> -	res = __has_locked_children(mnt, dentry);
> -	read_sequnlock_excl(&mount_lock);
> -	return res;
> +	scoped_guard(mount_locked_reader)
> +		return __has_locked_children(mnt, dentry);

Agree with Linus, this should just use a plain guard().
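For illustration, a userland sketch of what the plain-guard form buys in a function this short.  The toy lock, the GUARD_DEMO() macro and the function names are made up for the example; only the compiler's cleanup attribute is real, and the kernel's guard() is built on the same attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: just a hold counter, standing in for mount_lock. */
struct guard_demo_lock { int held; };

static struct guard_demo_lock guard_demo_mount_lock;
static int guard_demo_locked_children;

static struct guard_demo_lock *guard_demo_acquire(struct guard_demo_lock *l)
{
	l->held++;
	return l;
}

/* Cleanup handler: called with a pointer to the guard variable at scope exit. */
static void guard_demo_release(struct guard_demo_lock **l)
{
	assert((*l)->held > 0);
	(*l)->held--;
}

/* Hypothetical analogue of guard(): lock now, unlock when the scope ends. */
#define GUARD_DEMO(lock) \
	struct guard_demo_lock *guard_demo_var \
		__attribute__((cleanup(guard_demo_release), unused)) = \
		guard_demo_acquire(lock)

/* Caller must hold the toy lock. */
static bool guard_demo_check_locked(void)
{
	assert(guard_demo_mount_lock.held > 0);
	return guard_demo_locked_children > 0;
}

bool guard_demo_has_locked_children(void)
{
	GUARD_DEMO(&guard_demo_mount_lock);
	/* The return value is computed first, then the lock is dropped. */
	return guard_demo_check_locked();
}
```

The whole body is the locked region anyway, so the extra nesting level of a scoped form adds nothing here.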

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 14/52] mnt_set_expiry(): use guards
  2025-08-25  4:43   ` [PATCH 14/52] mnt_set_expiry(): " Al Viro
@ 2025-08-25 12:51     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:51 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:17AM +0100, Al Viro wrote:
> The reason why it needs only mount_locked_reader is that there are no
> lockless accesses of expiry lists.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namespace.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index eabb0d996c6a..acacfe767a7c 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3858,9 +3858,8 @@ int finish_automount(struct vfsmount *m, const struct path *path)
>   */
>  void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list)
>  {
> -	read_seqlock_excl(&mount_lock);
> -	list_add_tail(&real_mount(mnt)->mnt_expire, expiry_list);
> -	read_sequnlock_excl(&mount_lock);
> +	scoped_guard(mount_locked_reader)
> +		list_add_tail(&real_mount(mnt)->mnt_expire, expiry_list);

Should also just use a guard(). I don't think religiously sticking to
scoped_guard() out of conceptual aversion to guard() buys us anything.
A plain guard() is clearly cleaner to read in such short functions.

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 15/52] path_is_under(): use guards
  2025-08-25  4:43   ` [PATCH 15/52] path_is_under(): " Al Viro
@ 2025-08-25 12:56     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:56 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:18AM +0100, Al Viro wrote:
> ... and document the locking requirements for is_path_reachable().
> There is one questionable caller in do_listmount() where we are not
> holding mount_lock *and* might not have the first argument mounted.
> However, in that case it will immediately return true without having
> to look at the ancestors.  Might be cleaner to move the check into
> non-LSTM_ROOT case which it really belongs in - there the check is
> not always true and is_mounted() is guaranteed.
> 
> Document the locking environments for is_path_reachable() callers:
> 	get_peer_under_root()
> 	get_dominating_id()
> 	do_statmount()
> 	do_listmount()
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

>  fs/namespace.c | 12 ++++++------
>  fs/pnode.c     |  3 ++-
>  2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index acacfe767a7c..bf9a3a644faa 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -4592,7 +4592,7 @@ SYSCALL_DEFINE5(move_mount,
>  /*
>   * Return true if path is reachable from root
>   *
> - * namespace_sem or mount_lock is held
> + * locks: mount_locked_reader || namespace_shared && is_mounted(mnt)
>   */
>  bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
>  			 const struct path *root)
> @@ -4606,11 +4606,9 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
>  
>  bool path_is_under(const struct path *path1, const struct path *path2)
>  {
> -	bool res;
> -	read_seqlock_excl(&mount_lock);
> -	res = is_path_reachable(real_mount(path1->mnt), path1->dentry, path2);
> -	read_sequnlock_excl(&mount_lock);
> -	return res;
> +	scoped_guard(mount_locked_reader)
> +		return is_path_reachable(real_mount(path1->mnt), path1->dentry,
> +					 path2);

Same thing, no need for this scoped guard eyesore.

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 16/52] current_chrooted(): don't bother with follow_down_one()
  2025-08-25  4:43   ` [PATCH 16/52] current_chrooted(): don't bother with follow_down_one() Al Viro
@ 2025-08-25 12:57     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:57 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:19AM +0100, Al Viro wrote:
> All we need here is to follow ->overmount on root mount of namespace...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 17/52] current_chrooted(): use guards
  2025-08-25  4:43   ` [PATCH 17/52] current_chrooted(): use guards Al Viro
@ 2025-08-25 12:57     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:57 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:20AM +0100, Al Viro wrote:
> here a use of __free(path_put) for dropping fs_root is enough to
> make guard(mount_locked_reader) fit...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>
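The reason __free(path_put) makes the guard fit is ordering: cleanup handlers run in reverse order of declaration, so a reference declared before the guard is dropped after the lock is released.  A small userland sketch of just that property follows; all order_* names are hypothetical and the "path" is a toy structure, not struct path.

```c
#include <assert.h>
#include <stdbool.h>

static int order_lock_depth;		/* >0 while the toy lock is held */
static int order_puts;			/* total number of references dropped */
static int order_puts_under_lock;	/* drops that happened with the lock held */

struct order_path { int id; bool pinned; };

/* Cleanup handler standing in for path_put(). */
static void order_path_put(struct order_path *p)
{
	if (!p->pinned)
		return;
	if (order_lock_depth > 0)
		order_puts_under_lock++;
	p->pinned = false;
	order_puts++;
}

static int *order_lock(int *depth)
{
	(*depth)++;
	return depth;
}

/* Cleanup handler standing in for dropping the lock. */
static void order_unlock(int **depth)
{
	assert(**depth > 0);
	(**depth)--;
}

bool order_current_chrooted(int ns_root_id, int fs_root_id)
{
	/* Declared first, therefore cleaned up last: after the unlock. */
	struct order_path fs_root __attribute__((cleanup(order_path_put))) =
		{ .id = fs_root_id, .pinned = true };
	/* Declared second, therefore cleaned up first. */
	int *guard __attribute__((cleanup(order_unlock), unused)) =
		order_lock(&order_lock_depth);

	return fs_root.id != ns_root_id;
}
```

Swapping the two declarations would move the put inside the locked region, which is exactly the kind of mistake that is easy to miss in review.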

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 18/52] do_move_mount(): trim local variables
  2025-08-25  4:43   ` [PATCH 18/52] do_move_mount(): trim local variables Al Viro
@ 2025-08-25 12:57     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 12:57 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:21AM +0100, Al Viro wrote:
> Both 'parent' and 'ns' are used at most once, no point precalculating those...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

>  fs/namespace.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index a8b586e635d8..1a076aac5d73 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3564,10 +3564,8 @@ static inline bool may_use_mount(struct mount *mnt)
>  static int do_move_mount(struct path *old_path,
>  			 struct path *new_path, enum mnt_tree_flags_t flags)
>  {
> -	struct mnt_namespace *ns;
>  	struct mount *p;
>  	struct mount *old;
> -	struct mount *parent;
>  	struct pinned_mountpoint mp;
>  	int err;
>  	bool beneath = flags & MNT_TREE_BENEATH;
> @@ -3578,8 +3576,6 @@ static int do_move_mount(struct path *old_path,
>  
>  	old = real_mount(old_path->mnt);
>  	p = real_mount(new_path->mnt);
> -	parent = old->mnt_parent;
> -	ns = old->mnt_ns;
>  
>  	err = -EINVAL;
>  
> @@ -3588,12 +3584,12 @@ static int do_move_mount(struct path *old_path,
>  		/* ... it should be detachable from parent */
>  		if (!mnt_has_parent(old) || IS_MNT_LOCKED(old))
>  			goto out;
> +		/* ... which should not be shared */
> +		if (IS_MNT_SHARED(old->mnt_parent))
> +			goto out;
>  		/* ... and the target should be in our namespace */
>  		if (!check_mnt(p))
>  			goto out;
> -		/* parent of the source should not be shared */
> -		if (IS_MNT_SHARED(parent))
> -			goto out;
>  	} else {
>  		/*
>  		 * otherwise the source must be the root of some anon namespace.
> @@ -3605,7 +3601,7 @@ static int do_move_mount(struct path *old_path,
>  		 * subsequent checks would've rejected that, but they lose
>  		 * some corner cases if we check it early.
>  		 */
> -		if (ns == p->mnt_ns)
> +		if (old->mnt_ns == p->mnt_ns)
>  			goto out;
>  		/*
>  		 * Target should be either in our namespace or in an acceptable
> -- 
> 2.47.2
> 

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 19/52] do_move_mount(): deal with the checks on old_path early
  2025-08-25  4:43   ` [PATCH 19/52] do_move_mount(): deal with the checks on old_path early Al Viro
@ 2025-08-25 13:00     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:00 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:22AM +0100, Al Viro wrote:
> 1) checking that location we want to move does point to root of some mount
> can be done before anything else; that property is not going to change
> and having it already verified simplifies the analysis.
> 
> 2) checking the type agreement between what we are trying to move and what
> we are trying to move it onto also belongs in the very beginning -
> do_lock_mount() might end up switching new_path to something that overmounts
> the original location, but... the same type agreement applies to overmounts,
> so we could just as well check against the original location.
> 
> 3) since we know that old_path->dentry is the root of old_path->mnt, there's
> no point bothering with path_is_overmounted() in can_move_mount_beneath();
> it's simply a check for the mount we are trying to move having non-NULL
> ->overmount.  And with that, we can switch can_move_mount_beneath() to
> taking old instead of old_path, leaving no uses of old_path past the original
> checks.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
  2025-08-25  4:43   ` [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
  2025-08-25 12:10     ` Linus Torvalds
@ 2025-08-25 13:02     ` Christian Brauner
  2025-08-25 16:05       ` Al Viro
  1 sibling, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:02 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:23AM +0100, Al Viro wrote:
> We want to mount beneath the given location.  For that operation to
> make sense, location must be the root of some mount that has something
> under it.  Currently we let it proceed if those requirements are not met,
> with rather meaningless results, and have that bogosity caught further
> down the road; let's fail early instead - do_lock_mount() doesn't make
> sense unless those conditions hold, and checking them there makes
> things simpler.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Well, do_lock_mount() was already convoluted enough that I didn't want
that in there as well. But I don't care,

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 21/52] finish_automount(): simplify the ELOOP check
  2025-08-25  4:43   ` [PATCH 21/52] finish_automount(): simplify the ELOOP check Al Viro
@ 2025-08-25 13:02     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:02 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:24AM +0100, Al Viro wrote:
> It's enough to check that dentries match; if path->dentry is equal to
> m->mnt_root, superblocks will match as well.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 22/52] do_loopback(): use __free(path_put) to deal with old_path
  2025-08-25  4:43   ` [PATCH 22/52] do_loopback(): use __free(path_put) to deal with old_path Al Viro
@ 2025-08-25 13:02     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:02 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:25AM +0100, Al Viro wrote:
> preparations for making unlock_mount() a __cleanup();
> can't have path_put() inside mount_lock scope.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 23/52] pivot_root(2): use __free() to deal with struct path in it
  2025-08-25  4:43   ` [PATCH 23/52] pivot_root(2): use __free() to deal with struct path in it Al Viro
@ 2025-08-25 13:03     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:03 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:26AM +0100, Al Viro wrote:
> preparations for making unlock_mount() a __cleanup();
> can't have path_put() inside mount_lock scope.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 24/52] finish_automount(): take the lock_mount() analogue into a helper
  2025-08-25  4:43   ` [PATCH 24/52] finish_automount(): take the lock_mount() analogue into a helper Al Viro
@ 2025-08-25 13:08     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:08 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:27AM +0100, Al Viro wrote:
> finish_automount() can't use lock_mount() - it treats finding something
> already mounted as "quietly drop our mount and return 0", not as
> "mount on top of whatever mounted there".  It's been open-coded;
> let's take it into a helper similar to lock_mount().  "something's
> already mounted" => -EBUSY, finish_automount() needs to distinguish
> it from the normal case and it can't happen in other failure cases.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

>  fs/namespace.c | 42 +++++++++++++++++++++++++-----------------
>  1 file changed, 25 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 892251663419..99757040a39a 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3786,9 +3786,29 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
>  	return err;
>  }
>  
> -int finish_automount(struct vfsmount *m, const struct path *path)
> +static int lock_mount_exact(const struct path *path,
> +			    struct pinned_mountpoint *mp)
>  {
>  	struct dentry *dentry = path->dentry;
> +	int err;
> +
> +	inode_lock(dentry->d_inode);
> +	namespace_lock();
> +	if (unlikely(cant_mount(dentry)))
> +		err = -ENOENT;
> +	else if (path_overmounted(path))
> +		err = -EBUSY;
> +	else
> +		err = get_mountpoint(dentry, mp);
> +	if (unlikely(err)) {
> +		namespace_unlock();
> +		inode_unlock(dentry->d_inode);
> +	}
> +	return err;
> +}
> +
> +int finish_automount(struct vfsmount *m, const struct path *path)
> +{
>  	struct pinned_mountpoint mp = {};
>  	struct mount *mnt;
>  	int err;
> @@ -3810,20 +3830,11 @@ int finish_automount(struct vfsmount *m, const struct path *path)
>  	 * that overmounts our mountpoint to be means "quitely drop what we've
>  	 * got", not "try to mount it on top".
>  	 */
> -	inode_lock(dentry->d_inode);
> -	namespace_lock();
> -	if (unlikely(cant_mount(dentry))) {
> -		err = -ENOENT;
> -		goto discard_locked;
> -	}
> -	if (path_overmounted(path)) {
> -		err = 0;
> -		goto discard_locked;
> +	err = lock_mount_exact(path, &mp);
> +	if (unlikely(err)) {
> +		mntput(m);
> +		return err == -EBUSY ? 0 : err;
>  	}
> -	err = get_mountpoint(dentry, &mp);
> -	if (err)
> -		goto discard_locked;
> -
>  	err = do_add_mount(mnt, mp.mp, path,
>  			   path->mnt->mnt_flags | MNT_SHRINKABLE);
>  	unlock_mount(&mp);
> @@ -3831,9 +3842,6 @@ int finish_automount(struct vfsmount *m, const struct path *path)
>  		goto discard;
>  	return 0;
>  
> -discard_locked:
> -	namespace_unlock();
> -	inode_unlock(dentry->d_inode);
>  discard:
>  	mntput(m);

Can use direct returns if you do:

        struct mount *mnt __free(mntput) = NULL;

and then in the success condition:

        retain_and_null_ptr(mnt);
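Spelled out as a userland sketch, with the assumption that plain "mnt = NULL" plays the role of retain_and_null_ptr() and that the free_demo_* helpers stand in for vfs_create_mount(), mntput() and do_add_mount().  None of these names exist in the kernel; only the cleanup attribute is real.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct free_demo_mnt { int id; };

static int free_demo_live;			/* mounts allocated and not yet dropped */
static struct free_demo_mnt *free_demo_attached;	/* one-slot toy "mount tree" */

static struct free_demo_mnt *free_demo_create(int id)
{
	struct free_demo_mnt *m = malloc(sizeof(*m));

	if (m) {
		m->id = id;
		free_demo_live++;
	}
	return m;
}

/* Cleanup handler: a NULL pointer means ownership has been handed over. */
static void free_demo_put(struct free_demo_mnt **p)
{
	if (*p) {
		free(*p);
		free_demo_live--;
	}
}

/* Consumes the mount on success only, like the helper it imitates. */
static int free_demo_add_mount(struct free_demo_mnt *m)
{
	assert(m != NULL);
	if (free_demo_attached)
		return -EBUSY;
	free_demo_attached = m;
	return 0;
}

int free_demo_finish(int id)
{
	struct free_demo_mnt *mnt __attribute__((cleanup(free_demo_put))) =
		free_demo_create(id);
	int err;

	if (!mnt)
		return -ENOMEM;
	err = free_demo_add_mount(mnt);
	if (err)
		return err;	/* cleanup drops mnt */
	mnt = NULL;		/* consumed on success; cleanup becomes a no-op */
	return 0;
}
```

Every exit is a direct return and there is no discard label left to jump to.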

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 26/52] finish_automount(): use __free() to deal with dropping mnt on failure
  2025-08-25  4:43   ` [PATCH 26/52] finish_automount(): use __free() to deal with dropping mnt on failure Al Viro
@ 2025-08-25 13:09     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:09 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:29AM +0100, Al Viro wrote:
> same story as with do_new_mount_fc().
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Ah right, here is what I suggested earlier,

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure
  2025-08-25  4:43   ` [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure Al Viro
@ 2025-08-25 13:29     ` Christian Brauner
  2025-08-25 16:09       ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:29 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:28AM +0100, Al Viro wrote:
> do_add_mount() consumes vfsmount on success; just follow it with
> conditional retain_and_null_ptr() on success and we can switch
> to __free() for mnt and be done with that - unlock_mount() is
> in the very end.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---


>  fs/namespace.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 99757040a39a..79c87937a7dd 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3694,7 +3694,6 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
>  static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
>  			   unsigned int mnt_flags)
>  {
> -	struct vfsmount *mnt;
>  	struct pinned_mountpoint mp = {};
>  	struct super_block *sb = fc->root->d_sb;
>  	int error;
> @@ -3710,7 +3709,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
>  
>  	up_write(&sb->s_umount);
>  
> -	mnt = vfs_create_mount(fc);
> +	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);

Ugh, can we please not start declaring variables in the middle of a
scope?

>  	if (IS_ERR(mnt))
>  		return PTR_ERR(mnt);
>  
> @@ -3720,10 +3719,10 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
>  	if (!error) {
>  		error = do_add_mount(real_mount(mnt), mp.mp,
>  				     mountpoint, mnt_flags);
> +		if (!error)
> +			retain_and_null_ptr(mnt); // consumed on success
>  		unlock_mount(&mp);
>  	}
> -	if (error < 0)
> -		mntput(mnt);
>  	return error;
>  }
>  
> -- 
> 2.47.2
> 

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 37/52] do_set_group(): constify path arguments
  2025-08-25  4:43   ` [PATCH 37/52] do_set_group(): constify path arguments Al Viro
@ 2025-08-25 13:29     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:29 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:40AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 39/52] collect_paths(): constify the return value
  2025-08-25  4:43   ` [PATCH 39/52] collect_paths(): constify the return value Al Viro
@ 2025-08-25 13:30     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:30 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:42AM +0100, Al Viro wrote:
> callers have no business modifying the paths they get
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 36/52] do_mount_setattr(): constify path argument
  2025-08-25  4:43   ` [PATCH 36/52] do_mount_setattr(): constify path argument Al Viro
@ 2025-08-25 13:30     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:30 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:39AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 42/52] do_new_mount{,_fc}(): constify struct path argument
  2025-08-25  4:43   ` [PATCH 42/52] do_new_mount{,_fc}(): " Al Viro
@ 2025-08-25 13:30     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:30 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:45AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 40/52] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s)
  2025-08-25  4:43   ` [PATCH 40/52] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
@ 2025-08-25 13:30     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:30 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:43AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 38/52] drop_collected_paths(): constify arguments
  2025-08-25  4:43   ` [PATCH 38/52] drop_collected_paths(): constify arguments Al Viro
@ 2025-08-25 13:31     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:41AM +0100, Al Viro wrote:
> ... and use that to constify the pointers in callers
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 43/52] do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument
  2025-08-25  4:43   ` [PATCH 43/52] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
@ 2025-08-25 13:31     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:46AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 44/52] path_mount(): constify struct path argument
  2025-08-25  4:43   ` [PATCH 44/52] path_mount(): " Al Viro
@ 2025-08-25 13:32     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:47AM +0100, Al Viro wrote:
> now it finally can be done.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 41/52] mnt_warn_timestamp_expiry(): constify struct path argument
  2025-08-25  4:43   ` [PATCH 41/52] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
@ 2025-08-25 13:32     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:44AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 49/52] do_mount(): use __free(path_put)
  2025-08-25  4:43   ` [PATCH 49/52] do_mount(): " Al Viro
@ 2025-08-25 13:32     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:32 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:52AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 47/52] constify can_move_mount_beneath() arguments
  2025-08-25  4:43   ` [PATCH 47/52] constify can_move_mount_beneath() arguments Al Viro
@ 2025-08-25 13:39     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:39 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:50AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 48/52] do_move_mount_old(): use __free(path_put)
  2025-08-25  4:43   ` [PATCH 48/52] do_move_mount_old(): use __free(path_put) Al Viro
@ 2025-08-25 13:40     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:51AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 46/52] path_umount(): constify struct path argument
  2025-08-25  4:43   ` [PATCH 46/52] path_umount(): " Al Viro
@ 2025-08-25 13:40     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:49AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 45/52] may_copy_tree(), __do_loopback(): constify struct path argument
  2025-08-25  4:43   ` [PATCH 45/52] may_copy_tree(), __do_loopback(): " Al Viro
@ 2025-08-25 13:40     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:48AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 51/52] ecryptfs: get rid of pointless mount references in ecryptfs dentries
  2025-08-25  4:43   ` [PATCH 51/52] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
@ 2025-08-25 13:41     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:54AM +0100, Al Viro wrote:
> ->lower_path.mnt has the same value for all dentries on given ecryptfs
> instance and if somebody goes for mountpoint-crossing variant where that
> would not be true, we can deal with that when it happens (and _not_
> with duplicating these reference into each dentry).
> 
> As it is, we are better off just sticking a reference into ecryptfs-private
> part of superblock and keeping it pinned until ->kill_sb().

The overlayfs model.

> 
> That way we can stick a reference to underlying dentry right into ->d_fsdata
> of ecryptfs one, getting rid of indirection through struct ecryptfs_dentry_info,
> along with the entire struct ecryptfs_dentry_info machinery.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 52/52] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
  2025-08-25  4:43   ` [PATCH 52/52] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
@ 2025-08-25 13:42     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:42 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:55AM +0100, Al Viro wrote:
> Comments regarding "shadow mounts" were stale - no such thing anymore.
> Document the locking requirements for __lookup_mnt().
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 33/52] new helper: topmost_overmount()
  2025-08-25  4:43   ` [PATCH 33/52] new helper: topmost_overmount() Al Viro
@ 2025-08-25 13:43     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:36AM +0100, Al Viro wrote:
> Returns the final (topmost) mount in the chain of overmounts
> starting at given mount.  Same locking rules as for any mount
> tree traversal - either the spinlock side of mount_lock, or
> rcu + sample the seqcount side of mount_lock before the call
> and recheck afterwards.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 35/52] constify check_mnt()
  2025-08-25  4:43   ` [PATCH 35/52] constify check_mnt() Al Viro
@ 2025-08-25 13:43     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:38AM +0100, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 31/52] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry
  2025-08-25  4:43   ` [PATCH 31/52] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
@ 2025-08-25 13:43     ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-25 13:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:43:34AM +0100, Al Viro wrote:
> That kills the last place where callers of lock_mount(path, &mp)
> used path->dentry.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 02/52] introduced guards for mount_lock
  2025-08-25 12:32     ` Christian Brauner
@ 2025-08-25 13:46       ` Al Viro
  2025-08-25 20:21         ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25 13:46 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 02:32:38PM +0200, Christian Brauner wrote:
> On Mon, Aug 25, 2025 at 05:43:05AM +0100, Al Viro wrote:
> > mount_writer: write_seqlock; that's an equivalent of {un,}lock_mount_hash()
> > mount_locked_reader: read_seqlock_excl; these tend to be open-coded.
> 
> Do we really need the "locked" midfix in there? Doesn't seem to buy any
> clarity. I'd drop it so the naming is nicely consistent.

It's a seqlock.  "Readers" in this context are lockless ones - sample/retry under
rcu_read_lock() kind.  The only difference between writer and locked reader is
that locked reader does not disrupt those sample/retry loops.

Note that for something that is never traversed locklessly (expiry lists,
lists of children, etc.) locked reader is fine for all accesses, including
modifications.

If you have better suggestions re terminology, I'd love to hear those, but
simply "writer"/"reader" is misleadingly similar to rw-semaphores/links/whatnot.

Basically, there are 3 kinds of contexts here:
	1) lockless, must be under RCU, fairly limited in which pointers they
can traverse, read-only access to structures in question.  Must sample
the seqcount side of mount_lock first, then verify that it has not changed
after everything.

	2) hold the spinlock side of mount_lock, _without_ bumping the seqcount
one.  Can be used for reads and writes, as long as the stuff being modified
is not among the things that are traversed locklessly.  Do not disrupt the previous
class, have full exclusion with classes 2 and 3.

	3) hold the spinlock side of mount_lock, and bump the seqcount one on
entry and leave.  Any reads and writes.  Full exclusion with classes 2 and 3,
invalidates the checks for class 1 (i.e. will push it into retries/fallbacks/
whatnot).

I'm used to "lockless reader" for 1, "writer" for 3. "locked reader" kinda
works for 2 - that's what it is wrt things that can be accessed by lockless
readers, but for the things that are *not* traversed without a lock it
can be actually used as a less disruptive form of 3.  Is used that way in
mount locking for some of the data structures.
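
The three classes above can be sketched in userspace.  This is only a
model built on C11 atomics, not the kernel seqlock API; every model_*
name is made up for illustration.  It shows the one property that
separates class 2 from class 3: a locked reader leaves the seqcount
alone, so a lockless sample taken before it still validates, while a
writer bumps the count and forces a retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a seqlock: spinlock side + seqcount side. */
struct model_seqlock {
	atomic_flag locked;	/* the spinlock side */
	atomic_uint seq;	/* the seqcount side; odd while a writer is inside */
};

static struct model_seqlock model_mount_lock = {
	.locked = ATOMIC_FLAG_INIT,
	.seq = 0,
};

static void model_spin_lock(struct model_seqlock *sl)
{
	while (atomic_flag_test_and_set_explicit(&sl->locked, memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void model_spin_unlock(struct model_seqlock *sl)
{
	atomic_flag_clear_explicit(&sl->locked, memory_order_release);
}

/* Class 1: lockless reader.  Sample first ... */
static unsigned model_read_begin(struct model_seqlock *sl)
{
	unsigned seq;

	/* an odd count means a writer is in progress; wait it out */
	while ((seq = atomic_load_explicit(&sl->seq, memory_order_acquire)) & 1)
		;
	return seq;
}

/* ... and validate afterwards; true means "discard and retry". */
static bool model_read_retry(struct model_seqlock *sl, unsigned seq)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->seq, memory_order_relaxed) != seq;
}

/* Class 2: locked reader.  Spinlock only, seqcount untouched. */
static void model_locked_reader_lock(struct model_seqlock *sl)
{
	model_spin_lock(sl);
}

static void model_locked_reader_unlock(struct model_seqlock *sl)
{
	model_spin_unlock(sl);
}

/* Class 3: writer.  Spinlock plus a seqcount bump on entry and on exit. */
static void model_writer_lock(struct model_seqlock *sl)
{
	model_spin_lock(sl);
	atomic_fetch_add_explicit(&sl->seq, 1, memory_order_release);
}

static void model_writer_unlock(struct model_seqlock *sl)
{
	atomic_fetch_add_explicit(&sl->seq, 1, memory_order_release);
	model_spin_unlock(sl);
}

/* Does a lockless sample survive a class 2 section that ran in between? */
static bool model_sample_survives_locked_reader(void)
{
	unsigned seq = model_read_begin(&model_mount_lock);

	model_locked_reader_lock(&model_mount_lock);
	model_locked_reader_unlock(&model_mount_lock);
	return !model_read_retry(&model_mount_lock, seq);
}

/* Does a lockless sample survive a class 3 section that ran in between? */
static bool model_sample_survives_writer(void)
{
	unsigned seq = model_read_begin(&model_mount_lock);

	model_writer_lock(&model_mount_lock);
	model_writer_unlock(&model_mount_lock);
	return !model_read_retry(&model_mount_lock, seq);
}
```

Called from a main(), the first helper reports that the sample is still
valid and the second reports that it is not.  That is the sense in
which class 2 is the less disruptive form of class 3.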

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
  2025-08-25 13:02     ` Christian Brauner
@ 2025-08-25 16:05       ` Al Viro
  0 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25 16:05 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 03:02:09PM +0200, Christian Brauner wrote:
> On Mon, Aug 25, 2025 at 05:43:23AM +0100, Al Viro wrote:
> > We want to mount beneath the given location.  For that operation to
> > make sense, location must be the root of some mount that has something
> > under it.  Currently we let it proceed if those requirements are not met,
> > with rather meaningless results, and have that bogosity caught further
> > down the road; let's fail early instead - do_lock_mount() doesn't make
> > sense unless those conditions hold, and checking them there makes
> > things simpler.
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> 
> Well, do_lock_mount() was already convoluted enough that I didn't want
> that in there as well. But I don't care,

It helps when it comes to cleaning it up - look at the condition it's in
after 34/52...

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure
  2025-08-25 13:29     ` Christian Brauner
@ 2025-08-25 16:09       ` Al Viro
  2025-08-26  8:27         ` Christian Brauner
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25 16:09 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 03:29:33PM +0200, Christian Brauner wrote:
> > -	mnt = vfs_create_mount(fc);
> > +	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
> 
> Ugh, can we please not start declaring variables in the middle of a
> scope.

Seeing that it *is* the beginning of its scope, what do you suggest?
Declaring it above, initializing with NULL and reassigning here?
That's actually just as wrong, if not more so - any assignment added
to it at earlier point and you've got a silent leak, so verifying
correctness would be harder that way.
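
A userspace sketch of that hazard, assuming only the GCC/Clang
cleanup attribute that __free() is built on.  All model_* names are
hypothetical and nothing here is the real vfs_create_mount()/mntput()
code.  Variant A declares the variable where it is initialized;
variant B declares it early as NULL and picks up an extra assignment
before the original one, which leaks an object without any warning.

```c
#include <assert.h>
#include <stdlib.h>

static int model_live_objects;	/* allocated and not yet released */

struct model_mnt {
	int id;
};

static struct model_mnt *model_create(int id)
{
	struct model_mnt *m = malloc(sizeof(*m));

	if (m) {
		m->id = id;
		model_live_objects++;
	}
	return m;
}

static void model_put(struct model_mnt *m)
{
	if (m) {
		model_live_objects--;
		free(m);
	}
}

/* A cleanup handler gets a pointer to the variable leaving scope. */
static void model_put_cleanup(struct model_mnt **p)
{
	model_put(*p);
}

#define model_free_mnt __attribute__((cleanup(model_put_cleanup)))

/* Variant A: declared at the point of initialization. */
static int model_declare_at_init(void)
{
	/* earlier statements cannot assign to 'mnt': it does not exist yet */
	struct model_mnt *mnt model_free_mnt = model_create(1);

	if (!mnt)
		return -1;
	return 0;	/* cleanup releases the only object */
}

/* Variant B: declared early as NULL, assigned later. */
static int model_declare_early(void)
{
	struct model_mnt *mnt model_free_mnt = NULL;

	mnt = model_create(1);	/* an assignment somebody added later on */
	mnt = model_create(2);	/* the original one: overwrites object 1 */
	if (!mnt)
		return -1;
	return 0;	/* cleanup releases object 2 only; object 1 leaks */
}
```

After model_declare_at_init() the live-object counter is back at
zero; after model_declare_early() it stays at one, and the compiler
has nothing to complain about in either variant.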

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-25 12:43 ` Christian Brauner
@ 2025-08-25 16:11   ` Al Viro
  2025-08-25 17:43     ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25 16:11 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara

On Mon, Aug 25, 2025 at 02:43:43PM +0200, Christian Brauner wrote:
> On Mon, Aug 25, 2025 at 05:40:46AM +0100, Al Viro wrote:
> > 	Most of this pile is basically an attempt to see how well do
> > cleanup.h-style mechanisms apply in mount handling.  That stuff lives in
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
> > Rebased to -rc3 (used to be a bit past -rc2, branched at mount fixes merge)
> > Individual patches in followups.
> > 
> > 	Please, help with review and testing.  It seems to survive the
> > local beating and code generation seems to be OK, but more testing
> > would be a good thing and I would really like to see comments on that
> > stuff.
> 
> Btw, I just realized that basically none of your commits have any lore
> links in them. That kinda sucks because I very very often just look at a
> commit and then use the link to jump to the mailing list discussion for
> more context about a change and how it came about.
> 
> So pretty please can you start adding lore links to your commits when
> applying if it's not fucking up your workflow too much?

Links to what, at the first posting?  Confused...

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 09/52] put_mnt_ns(): use guards
  2025-08-25 12:40     ` Christian Brauner
@ 2025-08-25 16:21       ` Al Viro
  0 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25 16:21 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 02:40:53PM +0200, Christian Brauner wrote:

> Another thing, did I miss
> 
> commit aab771f34e63ef89e195b63d121abcb55eebfde6
> Author:     Al Viro <viro@zeniv.linux.org.uk>
> AuthorDate: Wed Jun 18 18:23:41 2025 -0400
> Commit:     Al Viro <viro@zeniv.linux.org.uk>
> CommitDate: Sun Jun 29 19:03:46 2025 -0400
> 
>     take freeing of emptied mnt_namespace to namespace_unlock()
> 
> on the list somehow? I just saw that "emptied_ns" thing for the first
> time and was very confused where that came from. I don't see any lore
> link attached to the commit message.

https://lore.kernel.org/all/20250623045428.1271612-35-viro@zeniv.linux.org.uk/

and

https://lore.kernel.org/all/20250630025255.1387419-45-viro@zeniv.linux.org.uk/

in the next iteration of the same patchset, both Cc'd to you.

As for the reasons, there are nasty hidden constraints caused by mount notifications;
even though all mounts are out of that namespace, we can't free it until the calls
of mnt_notify(), which come from notify_mnt_list(), from namespace_unlock().

Better handle it that way than have a recurring headache; besides, it helps with
cleaning post-unlock_mount() stuff.

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 13/52] has_locked_children(): use guards
  2025-08-25 11:54     ` Linus Torvalds
@ 2025-08-25 17:33       ` Al Viro
  0 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25 17:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

On Mon, Aug 25, 2025 at 07:54:45AM -0400, Linus Torvalds wrote:
> [ diff edited to be just the end result ]
> 
> On Mon, 25 Aug 2025 at 00:44, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> >  bool has_locked_children(struct mount *mnt, struct dentry *dentry)
> >  {
> > +       scoped_guard(mount_locked_reader)
> > +               return __has_locked_children(mnt, dentry);
> >  }
> 
> So the use of scoped_guard() looks a bit odd to me. Why create a new
> scope for when the existing scope is identical? It would seem to be
> more straightforward to just do
> 
>         guard(mount_locked_reader);
>         return __has_locked_children(mnt, dentry);
> 
> instead. Was there some code generation issue or other thing that made
> you go the 'scoped' way?
> 
> There was at least one other patch that did the same pattern (but I
> haven't gone through the whole series, maybe there are explanations
> later).

TBH, the main reason is that my mental model for that is
	with_lock: lock -> m X -> m X
pardon the pseudo-Haskell.  IOW, "wrap that sequence of operations into
this lock".

Oh, well - I can live with open-ended scope in a function that small and
that unlikely to grow more stuff in it...
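
For comparison, a userspace sketch of the two forms, again built on
the compiler's cleanup attribute.  The model_* macros are hypothetical
stand-ins, not the cleanup.h definitions.  Both functions return with
the lock dropped; the only difference is whether the locked region is
the single statement that follows or the rest of the enclosing scope.

```c
#include <assert.h>

static int model_lock_depth;	/* 1 while the model lock is held */

static void model_lock(void)
{
	model_lock_depth++;
}

/* Cleanup handler: gets a pointer to the token variable leaving scope. */
static void model_unlock(int *token)
{
	(void)token;
	model_lock_depth--;
}

/* Open-ended form: held until the end of the enclosing scope. */
#define model_guard() \
	int model_guard_token \
		__attribute__((cleanup(model_unlock), unused)) = (model_lock(), 0)

/* Scoped form: held for exactly the statement that follows. */
#define model_scoped_guard() \
	for (int model_done = 0, model_scope_token \
		__attribute__((cleanup(model_unlock), unused)) = (model_lock(), 0); \
	     !model_done; model_done = 1)

static int model_is_held(void)
{
	return model_lock_depth == 1;
}

/* "Wrap that sequence of operations into this lock". */
static int model_with_scoped_guard(void)
{
	model_scoped_guard()
		return model_is_held();
	return -1;	/* not reached; the loop body always returns */
}

/* Same effect when the locked region is the whole function body. */
static int model_with_guard(void)
{
	model_guard();
	return model_is_held();
}
```

In both cases the return value is computed while the lock is held and
the cleanup runs on the way out, so the two behave the same in a
function this small.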

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-25 16:11   ` Al Viro
@ 2025-08-25 17:43     ` Al Viro
  2025-08-25 20:18       ` Theodore Ts'o
  2025-08-26  8:56       ` Christian Brauner
  0 siblings, 2 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25 17:43 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara

On Mon, Aug 25, 2025 at 05:11:14PM +0100, Al Viro wrote:
> On Mon, Aug 25, 2025 at 02:43:43PM +0200, Christian Brauner wrote:
> > On Mon, Aug 25, 2025 at 05:40:46AM +0100, Al Viro wrote:
> > > 	Most of this pile is basically an attempt to see how well do
> > > cleanup.h-style mechanisms apply in mount handling.  That stuff lives in
> > > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
> > > Rebased to -rc3 (used to be a bit past -rc2, branched at mount fixes merge)
> > > Individual patches in followups.
> > > 
> > > 	Please, help with review and testing.  It seems to survive the
> > > local beating and code generation seems to be OK, but more testing
> > > would be a good thing and I would really like to see comments on that
> > > stuff.
> > 
> > Btw, I just realized that basically none of your commits have any lore
> > links in them. That kinda sucks because I very very often just look at a
> > commit and then use the link to jump to the mailing list discussion for
> > more context about a change and how it came about.
> > 
> > So pretty please can you start adding lore links to your commits when
> > applying if it's not fucking up your workflow too much?
> 
> Links to what, at the first posting?  Confused...

I mean, this _is_ what I hope would be a discussion of that stuff -
that's what request for comments stands for, after all.  How is that
supposed to work?  Going back through the queue and slapping lore links
at the same time as the reviewed-by etc. are applied?  I honestly have
no idea what practice you have in mind - ~95% of the time I'm sitting
in nvi - it serves as IDE for me; mutt takes a large part of the rest.
Browser is something that gets used occasionally when I have to...

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-25 17:43     ` Al Viro
@ 2025-08-25 20:18       ` Theodore Ts'o
  2025-08-26  8:56       ` Christian Brauner
  1 sibling, 0 replies; 320+ messages in thread
From: Theodore Ts'o @ 2025-08-25 20:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Christian Brauner, linux-fsdevel, Linus Torvalds, Jan Kara

On Mon, Aug 25, 2025 at 06:43:12PM +0100, Al Viro wrote:
> I mean, this _is_ what I hope would be a discussion of that stuff -
> that's what request for comments stands for, after all.  How is that
> supposed to work?  Going back through the queue and slapping lore links
> at the same time as the reviewed-by etc. are applied?

Lore links are useful when a maintainer is applying someone else's
patches into their git tree.  I think that's what Christian was
thinking about.  In this case, however, where the maintainer is the
one authoring/sending the patches, there is the
chicken-and-egg problem that you've described, and so I don't
understand why Christian has made that request.

Usually I just construct the lore URL from the Message ID from the
patch series, but what I've seen other folks do for very large patch
sets is that they'll also publish the patches on git, for example from
Darrick's recent fuse/iomap patches, he included a link in the
patchset cover letter to:

https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=fuse-iomap-cache

							- Ted

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 02/52] introduced guards for mount_lock
  2025-08-25 13:46       ` Al Viro
@ 2025-08-25 20:21         ` Al Viro
  2025-08-25 23:44           ` Al Viro
  2025-08-26 15:17           ` Askar Safin
  0 siblings, 2 replies; 320+ messages in thread
From: Al Viro @ 2025-08-25 20:21 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 02:46:04PM +0100, Al Viro wrote:

> Basically, there are 3 kinds of contexts here:
> 	1) lockless, must be under RCU, fairly limited in which pointers they
> can traverse, read-only access to structures in question.  Must sample
> the seqcount side of mount_lock first, then verify that it has not changed
> after everything.
> 
> 	2) hold the spinlock side of mount_lock, _without_ bumping the seqcount
> one.  Can be used for reads and writes, as long as the stuff being modified
> is not among the things that are traversed locklessly.  Do not disrupt the previous
> class, have full exclusion with classes 2 and 3.
> 
> 	3) hold the spinlock side of mount_lock, and bump the seqcount one on
> entry and leave.  Any reads and writes.  Full exclusion with classes 2 and 3,
> invalidates the checks for class 1 (i.e. will push it into retries/fallbacks/
> whatnot).

FWIW, partial dump from what I hope to push out as docs:

	* all modifications of mount hash chains must be mount_writer.
	* only one function is allowed to traverse hash chains - __lookup_mnt().
Important part here is reachability - hash is a shared data structure, but
a struct mount instance can be reached that way only if it has parent equal
to the argument you've been able to pass to __lookup_mnt().
	* callers of __lookup_mnt() must either be at least mount_locked_reader
OR hold rcu_read_lock through the entire thing, sample the seqcount side of
mount_lock before the call, validate it afterwards and discard the attempt
entirely if validation fails.  Note that __legitimize_mnt() contains validation.
	* being hashed contributes 1 to refcount.

	* (sub)tree topology (encoded in ->mnt_parent, ->mnt_mounts/->mnt_child,
->mnt_mp, ->mnt_mountpoint and ->overmount) is stabilized by either mount_locked_reader
OR by namespace_shared + positive refcount for root of subtree.
	namespace_shared by itself is *NOT* enough.  When the last reference to
mount past the umount_tree() (i.e. already with NULL ->mnt_ns) goes away, any
subtree stuck to it will be detached from it and have its root unhashed and dropped.
In other words, such tree (e.g. result of umount -l) decays from root to leaves -
once all references to root are gone, it's cut off and all pieces are left
to decay.  That is done with mount_writer (has to be - there are mount hash changes
and for those mount_writer is a hard requirement) and only after the final reference
to root has been dropped.
	All other topology changes happen with namespace_excl and, at least,
mount_locked_reader.  Normally - with mount_writer; the only exception is that
setting parent for a newly allocated subtree is fine with mount_locked_reader;
we are not hashing it yet (that's done only in commit_tree()), so there's no
need to disrupt the lockless readers; note that RCU pathwalk *is* such, so
blind use of mount_writer has an effect on performance.
	->mnt_mounts/->mnt_child is never traversed unless the tree is stabilized
by either lock (note that list modifications there are not with ..._rcu() primitives).
->overmount, ->mnt_parent and ->mnt_mountpoint can be; those need sample/validate
on the seqcount side; it *would* require mount_writer from those who modify them,
except that for the ones that had never been reachable yet we don't need to bother.
In practice, ->overmount is changed along with the mount hash, so we need mount_writer
anyway; ->mnt_parent/->mnt_mountpoint/->mnt_mp need it only for reachable mounts.
[[
	FWIW, I'm considering the possibility of having copy_tree() delay
hashing all nodes in the copy and having them hashed all at once; fewer disruptions
for lockless readers that way.  All nodes in the copy are reachable only for the
caller; we do need mount_locked_reader for attaching a new node to copy (it has
to be inserted into the per-mountpoint lists of mounts), but we don't need to
bump the seqcount every time - and we can't hold a spinlock over allocations.
It's not even that hard; all we'd need is a bit of a change in commit_tree()
and in a couple of places where we create a namespace with more than one node -
we have the loops in those places already where we insert the mounts into
per-namespace rbtrees; same loops could handle hashing them.
]]

	* propagation graph (->mnt_share, ->mnt_slave/->mnt_slave_list,
->mnt_master, ->mnt_group_id, IS_MNT_SHARED()) is modified only under
namespace_excl; all accesses are under at least namespace_shared.
Only mounts that belong to a namespace may be reached via those;
umount_tree() removes all victims from the graph before it returns
and it's impossible to include something that isn't a part of some
namespace into the graph afterwards.

	* ->mnt_expire is accessed (both traversals and modifications)
under mount_locked_reader.  No lockless traversals there.

	* per-namespace rbtree (->mnt_node linkage) is modified only
under namespace_excl and all traversals are at least namespace_shared.
Mount leaving a namespace is removed from that before the end of
namespace_excl scope.

	* ->mnt_root and ->mnt_sb are assign-once; never changed.  So's
->mnt_devname, ->mnt_id and ->mnt_id_unique.

	* per-mountpoint mount lists (->mnt_mp_list) are mount_locked_reader
for all accesses (modification and traversal alike).

	* ->prev_ns is a fucking mess.

	* ->mnt_umount has only transient uses; umount_tree() uses it
to link the victims to be dropped at namespace_unlock(), final mntput
links the stuck children into a list stashed into ->mnt_stuck_children,
also for eventual dropping (by cleanup_mnt()).  mount_writer for gathering
them into those, nothing for "dissolve and drop everything on the list" -
in both cases the lists are visible only to a single thread by that point.
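
The root-to-leaves decay of a detached tree described above can be
illustrated with a toy refcount model.  This is a userspace sketch
under two assumptions taken from the notes: being hashed contributes 1
to the refcount, and children are unhashed and dropped only when the
final reference to their parent goes away.  The struct and all model_*
names are hypothetical; locking is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_MAX_CHILDREN 4

struct model_mount {
	int refcount;
	int hashed;	/* attached to a parent; worth 1 in refcount */
	int nr_children;
	struct model_mount *children[MODEL_MAX_CHILDREN];
};

static int model_freed;	/* nodes released so far */

static struct model_mount *model_alloc(void)
{
	struct model_mount *m = calloc(1, sizeof(*m));

	if (m)
		m->refcount = 1;	/* the caller's reference */
	return m;
}

static int model_attach(struct model_mount *parent, struct model_mount *child)
{
	if (parent->nr_children == MODEL_MAX_CHILDREN)
		return -1;
	parent->children[parent->nr_children++] = child;
	child->hashed = 1;
	child->refcount++;	/* being hashed contributes 1 */
	return 0;
}

static void model_put(struct model_mount *m)
{
	if (--m->refcount)
		return;
	/* final reference gone: cut off everything stuck to this node */
	for (int i = 0; i < m->nr_children; i++) {
		m->children[i]->hashed = 0;
		model_put(m->children[i]);	/* drop the hash reference */
	}
	free(m);
	model_freed++;
}

struct model_result {
	int freed_after_root;
	int a_hashed;
	int b_hashed;
	int freed_after_a;
};

/* root -> a -> b, with an extra reference pinning a (say, an opened file) */
static struct model_result model_decay_demo(void)
{
	struct model_result res = { 0 };
	struct model_mount *root = model_alloc();
	struct model_mount *a = model_alloc();
	struct model_mount *b = model_alloc();

	assert(root && a && b);
	model_freed = 0;
	model_attach(root, a);
	model_attach(a, b);
	model_put(b);		/* only the hash reference keeps b alive */

	model_put(root);	/* last reference to the root goes away */
	res.freed_after_root = model_freed;
	res.a_hashed = a->hashed;
	res.b_hashed = b->hashed;

	model_put(a);		/* now the pinned piece decays as well */
	res.freed_after_a = model_freed;
	return res;
}
```

In this model, dropping the root frees only the root: a is cut off
(no longer hashed) but survives on its extra reference, and b stays
attached to a.  Dropping a then releases both remaining nodes.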

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH 02/52] introduced guards for mount_lock
  2025-08-25 20:21         ` Al Viro
@ 2025-08-25 23:44           ` Al Viro
  2025-08-26  1:44             ` Al Viro
  2025-08-26 15:17           ` Askar Safin
  1 sibling, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-25 23:44 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 09:21:41PM +0100, Al Viro wrote:

> 	FWIW, I'm considering the possibility of having copy_tree() delay
> hashing all nodes in the copy and having them hashed all at once; fewer disruptions
> for lockless readers that way.  All nodes in the copy are reachable only for the
> caller; we do need mount_locked_reader for attaching a new node to copy (it has
> to be inserted into the per-mountpoint lists of mounts), but we don't need to
> bump the seqcount every time - and we can't hold a spinlock over allocations.
> It's not even that hard; all we'd need is a bit of a change in commit_tree()
> and in a couple of places where we create a namespace with more than one node -
> we have the loops in those places already where we insert the mounts into
> per-namespace rbtrees; same loops could handle hashing them.

The main issue I'm having with that is that currently "in list of children" implies
"hashed"; equivalent, even, except for a transient state seen only in mount_writer.
OTOH, having that not true for unreachable mounts...  I'm trying to find anything
that might care, but I don't see any candidates.

It would be nice to have regardless of doing fewer mount_lock seqcount bumps -
better isolation from shared data structures until we glue them in place would
make for simpler correctness proofs...
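
[A single-threaded userspace model of the seqcount argument, not kernel code.
It assumes every write section bumps the counter twice (entry and exit), so
hashing N nodes one at a time costs 2N bumps while hashing them in one section
costs 2.  All names here (write_begin(), hash_one_at_a_time(), etc.) are
hypothetical stand-ins, not the mount_lock API.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

static unsigned int seq;		/* even: no writer active */
static int hash_table[MAX_NODES];	/* stand-in for the mount hash */
static size_t nr_hashed;

/* Each write section changes the counter on entry and on exit. */
static void write_begin(void) { seq++; }
static void write_end(void)   { seq++; }

/* A lockless reader samples the counter and retries if it has moved. */
static unsigned int read_begin(void) { return seq; }
static int read_retry(unsigned int start) { return seq != start; }

static void hash_node(int node)
{
	if (nr_hashed < MAX_NODES)
		hash_table[nr_hashed++] = node;
}

/* One write section per node: 2 * n counter bumps. */
static void hash_one_at_a_time(const int *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		write_begin();
		hash_node(nodes[i]);
		write_end();
	}
}

/* One write section for the whole copy: 2 counter bumps. */
static void hash_all_at_once(const int *nodes, size_t n)
{
	write_begin();
	for (size_t i = 0; i < n; i++)
		hash_node(nodes[i]);
	write_end();
}
```

[In this model a reader that sampled the counter before the batch sees one
disruption instead of one per node; the allocation constraint mentioned above
is not modelled.]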

Anyway,
	copy_tree() call chains:
1.  copy_tree() <- propagate_mnt() <- attach_recursive_mnt(), with the call
chain prior to that point being one of
		<- graft_tree() <- do_loopback()
		<- graft_tree() <- do_add_mount() <- do_new_mount_fc()
		<- graft_tree() <- do_add_mount() <- finish_automount()
		<- do_move_mount().
All of those start inside a lock_mount scope.
Result gets passed (prior to return from attach_recursive_mnt(), within
an mnt_writer scope there) either to commit_tree() or to umount_tree(),
without having been visible to others prior to that.
	That's creation of secondary copies from mount propagation, for
various pathways to mounting stuff.

2.  copy_tree() <- __do_loopback() <- do_loopback().  Inside a lock_mount scope.
Result gets passed into graft_tree() -> attach_recursive_mnt().  In the latter
either it gets passed to commit_tree() (within mount_writer scope, without
having been visible to others prior to that), in which case success is reported,
or it is left alone and error gets reported; in that case back in do_loopback()
it gets passed to umount_tree(), again in mount_writer scope and without having
been visible to others prior to that.
	That's MS_BIND|MS_REC mount(2).

3.  copy_tree() <- __do_loopback() <- open_detached_copy().  In namespace_excl
scope.  Result is fed through a loop that inserts those mounts into rbtree
of new namespace (in mount_writer scope) and its root is stored as ->root
of that new namespace.  Once out of namespace_excl scope, the tree becomes
visible (and an extra reference is attached to the file we are opening).
	That's open_tree(2)/open_tree_attr(2) with OPEN_TREE_CLONE.
	BTW, a bit of mystery there: insertions into rbtree don't need to be in
mount_writer - we do have places where it's done without that, all readers are
in namespace_shared scopes *and* the namespace, along with its rbtree, is not
visible to anyone yet to start with.  If we delay hashing until there it will
need mount_writer, though.

4.  copy_tree() <- copy_mnt_ns().  In namespace_excl scope.  Somewhat similar
to the previous, but the namespace is not an anonymous one and we have a couple
of extra passes - one might do lock_mnt_tree() (under mount_writer, almost
certainly excessive - mount_locked_reader would do just fine) and another
(combined with rbtree insertions) finds the counterparts of root and pwd of
the caller and flips over to those.  Old ones get dropped after we leave
the scope.

Looks like we should be able to unify quite a bit of logic in populating
a new namespace and yes, delaying hash insertions past copy_tree() looks
plausible...

	Incidentally, destruction of new namespace on copy_tree() failure
is another mystery: here we do
                ns_free_inum(&new_ns->ns);
		dec_mnt_namespaces(new_ns->ucounts);
		mnt_ns_release(new_ns);
and in open_detached_copy() it's
	free_mnt_ns(ns);

They are similar - free_mnt_ns() is
	if (!is_anon_ns(ns))
		ns_free_inum(&ns->ns);
	dec_mnt_namespaces(ns->ucounts);
	mnt_ns_tree_remove(ns);
and mnt_ns_tree_remove() is a bunch of !is_anon_ns() code, followed by
an rcu-delayed mnt_ns_release().  So in case of open_detached_copy(),
where the namespace is anonymous, it boils down to an RCU-delayed
call of mnt_ns_release()...

AFAICS the only possible reasons not to use free_mnt_ns() here are
	1) avoiding an RCU-delayed call and
	2) conditional removal of ns from mnt_ns_tree.

As for the second, couldn't we simply use !list_empty(&ns->mnt_ns_list)
as a condition?  And avoiding an RCU delay... nice, in principle, but
the case when that would've saved us anything is CLONE_NEWNS clone(2) or
unshare(2) failing due to severe OOM.  Do we give a damn about one extra
call_rcu() for each of such failures?

mnt_ns_tree handling is your code; do you see any problems with

static void mnt_ns_tree_remove(struct mnt_namespace *ns)
{
	/* remove from global mount namespace list */
	if (!list_empty(&ns->mnt_ns_list)) {
		mnt_ns_tree_write_lock();
		rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree);
		list_bidir_del_rcu(&ns->mnt_ns_list);
		mnt_ns_tree_write_unlock();
	}

	call_rcu(&ns->mnt_ns_rcu, mnt_ns_release_rcu);
}
and
	mnt = __do_loopback(path, recursive);
	if (IS_ERR(mnt)) {
		emptied_ns = ns;
		namespace_unlock();
		return ERR_CAST(mnt);
	}
in open_detached_copy() and
	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
	if (IS_ERR(new)) {
		emptied_ns = new_ns;
		namespace_unlock();
		return ERR_CAST(new);
	}
in copy_mnt_ns()?
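
[A userspace sketch of the "!list_empty() as the condition" idea, under the
assumption that the list node is initialized to point at itself when the
object is allocated.  The list helpers and struct ns below are hypothetical
stand-ins; unlike list_bidir_del_rcu(), the removal here reinitializes the
node, and there is no locking or RCU.]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_node(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

static void list_del_init_node(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

struct ns { struct list_head link; };

static struct list_head all_ns = { &all_ns, &all_ns };
static int removals;	/* counts actual unlink operations */

static void ns_init(struct ns *ns)
{
	init_list_head(&ns->link);	/* "not on any list" state */
}

static void ns_publish(struct ns *ns)
{
	list_add_node(&ns->link, &all_ns);
}

/* Safe on both published and never-published objects. */
static void ns_tree_remove(struct ns *ns)
{
	if (!list_is_empty(&ns->link)) {
		list_del_init_node(&ns->link);
		removals++;
	}
}
```

[With that, the teardown path does not need to know whether the object ever
made it into the global list; a self-pointing node answers the question.]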


* Re: [PATCH 02/52] introduced guards for mount_lock
  2025-08-25 23:44           ` Al Viro
@ 2025-08-26  1:44             ` Al Viro
  0 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-26  1:44 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Tue, Aug 26, 2025 at 12:44:13AM +0100, Al Viro wrote:

> As for the second, couldn't we simply use !list_empty(&ns->mnt_ns_list)
> as a condition?  And avoiding an RCU delay... nice, in principle, but
> the case when that would've saved us anything is CLONE_NEWNS clone(2) or
> unshare(2) failing due to severe OOM.  Do we give a damn about one extra
> call_rcu() for each of such failures?
> 
> mnt_ns_tree handling is your code; do you see any problems with

... this (on top of the posted series, needs to be carved into several parts -
dropping pointless lock_mount_hash() in open_detached_copy(), making
mnt_ns_tree_remove() and thus free_mnt_ns() safe to use on ns not in mnt_ns_tree
yet, then dealing with open_detached_copy() and copy_mnt_ns() separately):

diff --git a/fs/namespace.c b/fs/namespace.c
index 63b74d7384fd..b77469789f82 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -195,7 +195,7 @@ static void mnt_ns_release_rcu(struct rcu_head *rcu)
 static void mnt_ns_tree_remove(struct mnt_namespace *ns)
 {
 	/* remove from global mount namespace list */
-	if (!is_anon_ns(ns)) {
+	if (!list_empty(&ns->mnt_ns_list)) {
 		mnt_ns_tree_write_lock();
 		rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree);
 		list_bidir_del_rcu(&ns->mnt_ns_list);
@@ -3053,18 +3053,17 @@ static int do_loopback(const struct path *path, const char *old_name,
 	return err;
 }
 
-static struct file *open_detached_copy(struct path *path, bool recursive)
+static struct mnt_namespace *get_detached_copy(const struct path *path, bool recursive)
 {
 	struct mnt_namespace *ns, *mnt_ns = current->nsproxy->mnt_ns, *src_mnt_ns;
 	struct user_namespace *user_ns = mnt_ns->user_ns;
 	struct mount *mnt, *p;
-	struct file *file;
 
 	ns = alloc_mnt_ns(user_ns, true);
 	if (IS_ERR(ns))
-		return ERR_CAST(ns);
+		return ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 
 	/*
 	 * Record the sequence number of the source mount namespace.
@@ -3081,23 +3080,28 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 
 	mnt = __do_loopback(path, recursive);
 	if (IS_ERR(mnt)) {
-		namespace_unlock();
-		free_mnt_ns(ns);
+		emptied_ns = ns;
 		return ERR_CAST(mnt);
 	}
 
-	lock_mount_hash();
 	for (p = mnt; p; p = next_mnt(p, mnt)) {
 		mnt_add_to_ns(ns, p);
 		ns->nr_mounts++;
 	}
 	ns->root = mnt;
-	mntget(&mnt->mnt);
-	unlock_mount_hash();
-	namespace_unlock();
+	return ns;
+}
+
+static struct file *open_detached_copy(struct path *path, bool recursive)
+{
+	struct mnt_namespace *ns = get_detached_copy(path, recursive);
+	struct file *file;
+
+	if (IS_ERR(ns))
+		return ERR_CAST(ns);
 
 	mntput(path->mnt);
-	path->mnt = &mnt->mnt;
+	path->mnt = mntget(&ns->root->mnt);
 	file = dentry_open(path, O_PATH, current_cred());
 	if (IS_ERR(file))
 		dissolve_on_fput(path->mnt);
@@ -4165,7 +4169,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		struct user_namespace *user_ns, struct fs_struct *new_fs)
 {
 	struct mnt_namespace *new_ns;
-	struct vfsmount *rootmnt = NULL, *pwdmnt = NULL;
+	struct vfsmount *rootmnt __free(mntput)= NULL;
+	struct vfsmount *pwdmnt __free(mntput) = NULL;
 	struct mount *p, *q;
 	struct mount *old;
 	struct mount *new;
@@ -4184,23 +4189,20 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	if (IS_ERR(new_ns))
 		return new_ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
 	if (user_ns != ns->user_ns)
 		copy_flags |= CL_SLAVE;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
-		namespace_unlock();
-		ns_free_inum(&new_ns->ns);
-		dec_mnt_namespaces(new_ns->ucounts);
-		mnt_ns_release(new_ns);
+		emptied_ns = new_ns;
 		return ERR_CAST(new);
 	}
+
 	if (user_ns != ns->user_ns) {
-		lock_mount_hash();
-		lock_mnt_tree(new);
-		unlock_mount_hash();
+		scoped_guard(mount_writer)
+			lock_mnt_tree(new);
 	}
 	new_ns->root = new;
 
@@ -4232,12 +4234,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		while (p->mnt.mnt_root != q->mnt.mnt_root)
 			p = next_mnt(skip_mnt_tree(p), old);
 	}
-	namespace_unlock();
-
-	if (rootmnt)
-		mntput(rootmnt);
-	if (pwdmnt)
-		mntput(pwdmnt);
 
 	mnt_ns_tree_add(new_ns);
 	return new_ns;


* Re: [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure
  2025-08-25 16:09       ` Al Viro
@ 2025-08-26  8:27         ` Christian Brauner
  2025-08-26 17:00           ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-26  8:27 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Mon, Aug 25, 2025 at 05:09:39PM +0100, Al Viro wrote:
> On Mon, Aug 25, 2025 at 03:29:33PM +0200, Christian Brauner wrote:
> > > -	mnt = vfs_create_mount(fc);
> > > +	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
> > 
> > Ugh, can we please not start declaring variables in the middle of a
> > scope.
> 
> Seeing that it *is* the beginning of its scope, what do you suggest?

What? Did I miss earlier or later changes because:

static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
			   unsigned int mnt_flags)
{
	struct vfsmount *mnt;
	struct pinned_mountpoint mp = {};
	struct super_block *sb = fc->root->d_sb;
	int error;

	error = security_sb_kern_mount(sb);
	if (!error && mount_too_revealing(sb, &mnt_flags))
		error = -EPERM;

	if (unlikely(error)) {
		fc_drop_locked(fc);
		return error;
	}

	up_write(&sb->s_umount);

	mnt = vfs_create_mount(fc);
	if (IS_ERR(mnt))
		return PTR_ERR(mnt);

How does up_write() create a new scope?

	mnt_warn_timestamp_expiry(mountpoint, mnt);

	error = lock_mount(mountpoint, &mp);
	if (!error) {
		error = do_add_mount(real_mount(mnt), mp.mp,
				     mountpoint, mnt_flags);
		unlock_mount(&mp);
	}
	if (error < 0)
		mntput(mnt);
	return error;
}

> Declaring it above, initializing with NULL and reassigning here?
> That's actually just as wrong, if not more so - any assignment added

I disagree. I do very much prefer having cleanups at the top of the
function or e.g.,:

if (foo) {
	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
}

Because it is really easy to figure out visually. But just doing it
somewhere in the middle is just confusing.

static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
			   unsigned int mnt_flags)
{
	struct pinned_mountpoint mp = {};
	struct super_block *sb = fc->root->d_sb;
	int error;

	error = security_sb_kern_mount(sb);
	if (!error && mount_too_revealing(sb, &mnt_flags))
		error = -EPERM;

	if (unlikely(error)) {
		fc_drop_locked(fc);
		return error;
	}

	up_write(&sb->s_umount);

	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
	if (IS_ERR(mnt))
		return PTR_ERR(mnt);


* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-25 17:43     ` Al Viro
  2025-08-25 20:18       ` Theodore Ts'o
@ 2025-08-26  8:56       ` Christian Brauner
  2025-08-27 17:19         ` Linus Torvalds
  1 sibling, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-26  8:56 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara

On Mon, Aug 25, 2025 at 06:43:12PM +0100, Al Viro wrote:
> On Mon, Aug 25, 2025 at 05:11:14PM +0100, Al Viro wrote:
> > On Mon, Aug 25, 2025 at 02:43:43PM +0200, Christian Brauner wrote:
> > > On Mon, Aug 25, 2025 at 05:40:46AM +0100, Al Viro wrote:
> > > > 	Most of this pile is basically an attempt to see how well do
> > > > cleanup.h-style mechanisms apply in mount handling.  That stuff lives in
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
> > > > Rebased to -rc3 (used to be a bit past -rc2, branched at mount fixes merge)
> > > > Individual patches in followups.
> > > > 
> > > > 	Please, help with review and testing.  It seems to survive the
> > > > local beating and code generation seems to be OK, but more testing
> > > > would be a good thing and I would really like to see comments on that
> > > > stuff.
> > > 
> > > Btw, I just realized that basically none of your commits have any lore
> > > links in them. That kinda sucks because I very very often just look at a
> > > commit and then use the link to jump to the mailing list discussion for
> > > more context about a change and how it came about.
> > > 
> > > So pretty please can you start adding lore links to your commits when
> > > applying if it's not fucking up your workflow too much?
> > 
> > Links to what, at the first posting?  Confused...
> 
> I mean, this _is_ what I hope would be a discussion of that stuff -
> that's what request for comments stands for, after all.  How is that
> supposed to work?  Going back through the queue and slapping lore links
> at the same time as the reviewed-by etc. are applied?  I honestly have
> no idea what practice do you have in mind - ~95% of the time I'm sitting
> in nvi - it serves as IDE for me; mutt takes a large part of the rest.
> Browser is something that gets used occasionally when I have to...

You misunderstand.
Once you apply your series to the tree that you intend to merge simply
add the lore links to the patches of the last version. I don't give a
single damn whether someone _sends_ patches with lore links. That is not
what this is about. I care that I can git log at mainline and figure out
where that patch was discussed, pull down the discussion via b4 or other
tooling, without having to search lore.

IOW, what I asked you about is once the patches end up in mainline they
please have links to the discussion where they came from. I do it for
all patches no matter if I pick them up from someone else or if I'm
applying my own:

commit c237aa9884f238e1480897463ca034877ca7530b
Author:     Christian Brauner <brauner@kernel.org>

    kernfs: don't fail listing extended attributes

<snip>

    Link: https://lore.kernel.org/20250819-ahndung-abgaben-524a535f8101@brauner

^^^^^^^^^^^^^^^^^
    Signed-off-by: Christian Brauner <brauner@kernel.org>

I'm not doing that for my own personal wellness cure but for every other
poor bastard (granted, including me because one year later it's all
swapped out) who looks at commits in the git tree and wants to either
jump to a link in the browser or wants to use tooling to just pull the
whole discussion from the list.


* Re: [PATCH 34/52] do_lock_mount(): don't modify path.
  2025-08-25  4:43   ` [PATCH 34/52] do_lock_mount(): don't modify path Al Viro
@ 2025-08-26 14:14     ` Askar Safin
  0 siblings, 0 replies; 320+ messages in thread
From: Askar Safin @ 2025-08-26 14:14 UTC (permalink / raw)
  To: viro; +Cc: brauner, jack, linux-fsdevel, torvalds

> +		m = __lookup_mnt(path->mnt, *dentry = path->dentry);

I don't like this.

Someone may think you meant "*dentry == path->dentry" here.

Please, write this:

  *dentry = path->dentry;
  m = __lookup_mnt(path->mnt, *dentry);

-- 
Askar Safin


* Re: [PATCH 02/52] introduced guards for mount_lock
  2025-08-25 20:21         ` Al Viro
  2025-08-25 23:44           ` Al Viro
@ 2025-08-26 15:17           ` Askar Safin
  2025-08-26 15:45             ` Al Viro
  1 sibling, 1 reply; 320+ messages in thread
From: Askar Safin @ 2025-08-26 15:17 UTC (permalink / raw)
  To: viro; +Cc: brauner, jack, linux-fsdevel, torvalds

Al Viro <viro@zeniv.linux.org.uk>:
> When the last reference to
> mount past the umount_tree() (i.e. already with NULL ->mnt_ns) goes away, any
> subtree stuck to it will be detached from it and have its root unhashed and dropped.
> In other words, such tree (e.g. result of umount -l) decays from root to leaves -
> once all references to root are gone, it's cut off and all pieces are left
> to decay.  That is done with mount_writer (has to be - there are mount hash changes
> and for those mount_writer is a hard requirement) and only after the final reference
> to root has been dropped.

I'm unable to understand this.

As far as I understand your text, when you unmount some directory /a using "umount -l /a", then /a and
all its children will stay as long as there are references to /a. This contradicts reality.

Consider this:

# mount -t tmpfs tmpfs /a
# mkdir /a/b
# mount -t tmpfs tmpfs /a/b
# mkdir /a/b/c
# cd /a
# umount -l /a

According to your text, both /a and /a/b will stay, because we have reference to /a (via our cwd).

But in reality /a/b disappears immediately (i.e. "ls b" shows nothing, as opposed to "c").

This happens even if I test with your patches applied.

So, your explanation seems to be wrong.

-- 
Askar Safin


* Re: [PATCH 02/52] introduced guards for mount_lock
  2025-08-26 15:17           ` Askar Safin
@ 2025-08-26 15:45             ` Al Viro
  0 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-26 15:45 UTC (permalink / raw)
  To: Askar Safin; +Cc: brauner, jack, linux-fsdevel, torvalds

On Tue, Aug 26, 2025 at 06:17:45PM +0300, Askar Safin wrote:
> Al Viro <viro@zeniv.linux.org.uk>:
> > When the last reference to
> > mount past the umount_tree() (i.e. already with NULL ->mnt_ns) goes away, any
> > subtree stuck to it will be detached from it and have its root unhashed and dropped.
> > In other words, such tree (e.g. result of umount -l) decays from root to leaves -
> > once all references to root are gone, it's cut off and all pieces are left
> > to decay.  That is done with mount_writer (has to be - there are mount hash changes
> > and for those mount_writer is a hard requirement) and only after the final reference
> > to root has been dropped.
> 
> I'm unable to understand this.
> 
> As far as I understand your text, when you unmount some directory /a using "umount -l /a", then /a and
> all its children will stay as long as there are references to /a. This contradicts reality.
> 
> Consider this:
> 
> # mount -t tmpfs tmpfs /a
> # mkdir /a/b
> # mount -t tmpfs tmpfs /a/b
> # mkdir /a/b/c
> # cd /a
> # umount -l /a
> 
> According to your text, both /a and /a/b will stay, because we have reference to /a (via our cwd).
> 
> But in reality /a/b disappears immediately (i.e. "ls b" shows nothing, as opposed to "c").
> 
> This happens even if I test with your patches applied.
> 
> So, your explanation seems to be wrong.

Take a look at disconnect_mount().  For example, if mount is locked (== propagated across the userns
boundary), it will remain stuck to its parent.


* Re: [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure
  2025-08-26  8:27         ` Christian Brauner
@ 2025-08-26 17:00           ` Al Viro
  2025-08-26 17:55             ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-26 17:00 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Tue, Aug 26, 2025 at 10:27:56AM +0200, Christian Brauner wrote:

> > Declaring it above, initializing with NULL and reassigning here?
> > That's actually just as wrong, if not more so - any assignment added
> 
> I disagree. I do very much prefer having cleanups at the top of the
> function or e.g.,:
> 
> if (foo) {
> 	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
> }
> 
> Because it is really easy to figure out visually. But just doing it
> somewhere in the middle is just confusing.

So basically you treat __free() simply as a syntax sugar for "call this
on exits from this block", rather than an approximation for "here's an
auto object we've created, this should be called to destroy it at the
end of its scope/lifetime"?

IMO it's a bad practice - it makes life much harder when you are tracing
callchains, etc.
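
[A userspace sketch of the two readings, assuming the GCC/Clang cleanup
attribute that __free() is built on.  struct obj, obj_create() and the
auto_obj macro are hypothetical; nothing here is the kernel's cleanup.h.
In the first function the declaration sits at the point of creation, so the
variable's scope matches the object's lifetime; in the second the handler is
armed from the top of the function and also runs, on NULL, on the early exit.]

```c
#include <assert.h>
#include <stdlib.h>

static int live_objects;	/* created and not yet destroyed */

struct obj { int id; };

static struct obj *obj_create(int id)
{
	struct obj *o = malloc(sizeof(*o));

	if (o) {
		o->id = id;
		live_objects++;
	}
	return o;
}

static void obj_destroy(struct obj *o)
{
	if (o) {
		live_objects--;
		free(o);
	}
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void obj_cleanup(struct obj **p)
{
	obj_destroy(*p);
}
#define auto_obj __attribute__((cleanup(obj_cleanup)))

/* Declared where the object is created: scope == lifetime. */
static int use_declared_at_init(int fail_early)
{
	if (fail_early)
		return -1;	/* no object exists, no handler is armed */

	struct obj *o auto_obj = obj_create(1);
	if (!o)
		return -2;
	return o->id;		/* value is read, then obj_destroy() runs */
}

/* Declared at the top as NULL and assigned later. */
static int use_declared_at_top(int fail_early)
{
	struct obj *o auto_obj = NULL;

	if (fail_early)
		return -1;	/* handler still runs, with o == NULL */

	o = obj_create(2);
	if (!o)
		return -2;
	return o->id;
}
```

[Both variants release the object on every path; the difference is only in
where a reader has to look to find out what the handler will be called on.]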

FWIW, I wonder if the things would be cleaner if we did security_sb_kern_mount()
and mount_too_revealing() *after* unlocking the superblock and getting a vfsmount.
The latter definitely doesn't give a damn about superblock being locked and
AFAICS neither does the only in-tree instance of ->sb_kern_mount().
That way we have the real initialization reasonably close to __free() and
control flow is easier to follow...

Folks, how about something like the delta below (on top of the posted queue)?

diff --git a/fs/namespace.c b/fs/namespace.c
index 63b74d7384fd..191e7f776de5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3689,24 +3689,22 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
+	struct vfsmount *mnt __free(mntput) = NULL;
 	struct super_block *sb = fc->root->d_sb;
 	int error;
 
-	error = security_sb_kern_mount(sb);
-	if (!error && mount_too_revealing(sb, &mnt_flags))
-		error = -EPERM;
-
-	if (unlikely(error)) {
-		fc_drop_locked(fc);
-		return error;
-	}
-
 	up_write(&sb->s_umount);
-
-	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
+	mnt = vfs_create_mount(fc);
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
+	error = security_sb_kern_mount(sb);
+	if (unlikely(error))
+		return error;
+
+	if (mount_too_revealing(sb, &mnt_flags))
+		return -EPERM;
+
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
 	LOCK_MOUNT(mp, mountpoint);


* Re: [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure
  2025-08-26 17:00           ` Al Viro
@ 2025-08-26 17:55             ` Al Viro
  2025-08-26 18:21               ` [RFC][PATCH] switch do_new_mount_fc() to using fc_mount() Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-26 17:55 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel, jack, torvalds

On Tue, Aug 26, 2025 at 06:00:44PM +0100, Al Viro wrote:

> FWIW, I wonder if the things would be cleaner if we did security_sb_kern_mount()
> and mount_too_revealing() *after* unlocking the superblock and getting a vfsmount.
> The latter definitely doesn't give a damn about superblock being locked and
> AFAICS neither does the only in-tree instance of ->sb_kern_mount().
> That way we have the real initialization reasonably close to __free() and
> control flow is easier to follow...

Or, better yet, take vfs_get_tree() from do_new_mount() to do_new_mount_fc()
and collapse it with "unlock ->s_umount and call vfs_create_mount()" into
a call of fc_mount(), like the delta below (on top of posted queue, would get reordered
earlier in it and pick the bits of #25 along the way).

Does anyone have objections here?  The only real change is that security_sb_kern_mount()
gets called outside of ->s_umount exclusive scope; no in-tree instances care, but I'd
Cc that to LSM list...

diff --git a/fs/namespace.c b/fs/namespace.c
index 63b74d7384fd..6f062dc7f9bf 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3690,22 +3690,18 @@ static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
 	struct super_block *sb = fc->root->d_sb;
+	struct vfsmount *mnt __free(mntput) = fc_mount(fc);
 	int error;
 
-	error = security_sb_kern_mount(sb);
-	if (!error && mount_too_revealing(sb, &mnt_flags))
-		error = -EPERM;
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
 
-	if (unlikely(error)) {
-		fc_drop_locked(fc);
+	error = security_sb_kern_mount(sb);
+	if (unlikely(error))
 		return error;
-	}
 
-	up_write(&sb->s_umount);
-
-	struct vfsmount *mnt __free(mntput) = vfs_create_mount(fc);
-	if (IS_ERR(mnt))
-		return PTR_ERR(mnt);
+	if (mount_too_revealing(sb, &mnt_flags))
+		return -EPERM;
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
@@ -3767,8 +3763,6 @@ static int do_new_mount(const struct path *path, const char *fstype,
 		err = parse_monolithic_mount_data(fc, data);
 	if (!err && !mount_capable(fc))
 		err = -EPERM;
-	if (!err)
-		err = vfs_get_tree(fc);
 	if (!err)
 		err = do_new_mount_fc(fc, path, mnt_flags);
 


* [RFC][PATCH] switch do_new_mount_fc() to using fc_mount()
  2025-08-26 17:55             ` Al Viro
@ 2025-08-26 18:21               ` Al Viro
  2025-08-27 15:38                 ` Paul Moore
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-26 18:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, jack, Christian Brauner, linux-security-module,
	Paul Moore

[
This is on top of -rc3; if nobody objects, I'll insert that early in series
in viro/vfs.git#work.mount.  It has an impact for LSM folks - ->sb_kern_mount()
would be called without ->s_umount; nothing in-tree cares, but if you have
objections, yell now.
]

Prior to the call of do_new_mount_fc() the caller has just done successful
vfs_get_tree().  Then do_new_mount_fc() does several checks on resulting
superblock, and either does fc_drop_locked() and returns an error or
proceeds to unlock the superblock and call vfs_create_mount().
    
The thing is, there's no reason to delay that unlock + vfs_create_mount() -
the tests do not rely upon the state of ->s_umount and
        fc_drop_locked()
        put_fs_context()
is equivalent to
        unlock ->s_umount
        put_fs_context()

Doing vfs_create_mount() before the checks allows us to move vfs_get_tree()
from caller to do_new_mount_fc() and collapse it with vfs_create_mount()
into an fc_mount() call.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/namespace.c b/fs/namespace.c
index ae6d1312b184..9e1b7319532c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3721,25 +3721,19 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
-	struct vfsmount *mnt;
 	struct pinned_mountpoint mp = {};
 	struct super_block *sb = fc->root->d_sb;
+	struct vfsmount *mnt = fc_mount(fc);
 	int error;
 
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
+
 	error = security_sb_kern_mount(sb);
 	if (!error && mount_too_revealing(sb, &mnt_flags))
 		error = -EPERM;
-
-	if (unlikely(error)) {
-		fc_drop_locked(fc);
-		return error;
-	}
-
-	up_write(&sb->s_umount);
-
-	mnt = vfs_create_mount(fc);
-	if (IS_ERR(mnt))
-		return PTR_ERR(mnt);
+	if (unlikely(error))
+		goto out;
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
@@ -3747,10 +3741,12 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 	if (!error) {
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
+		if (!error)
+			mnt = NULL;	// consumed on success
 		unlock_mount(&mp);
 	}
-	if (error < 0)
-		mntput(mnt);
+out:
+	mntput(mnt);
 	return error;
 }
 
@@ -3804,8 +3800,6 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 		err = parse_monolithic_mount_data(fc, data);
 	if (!err && !mount_capable(fc))
 		err = -EPERM;
-	if (!err)
-		err = vfs_get_tree(fc);
 	if (!err)
 		err = do_new_mount_fc(fc, path, mnt_flags);
 


* Re: [RFC][PATCH] switch do_new_mount_fc() to using fc_mount()
  2025-08-26 18:21               ` [RFC][PATCH] switch do_new_mount_fc() to using fc_mount() Al Viro
@ 2025-08-27 15:38                 ` Paul Moore
  0 siblings, 0 replies; 320+ messages in thread
From: Paul Moore @ 2025-08-27 15:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, linux-fsdevel, jack, Christian Brauner,
	linux-security-module

On Tue, Aug 26, 2025 at 2:21 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> [
> This is on top of -rc3; if nobody objects, I'll insert that early in series
> in viro/vfs.git#work.mount.  It has an impact for LSM folks - ->sb_kern_mount()
> would be called without ->s_umount; nothing in-tree cares, but if you have
> objections, yell now.
> ]

Thanks for the heads-up, I'm not aware of anyone currently
posting/working-on patches that would be dependent on this.

> Prior to the call of do_new_mount_fc() the caller has just done successful
> vfs_get_tree().  Then do_new_mount_fc() does several checks on resulting
> superblock, and either does fc_drop_locked() and returns an error or
> proceeds to unlock the superblock and call vfs_create_mount().
>
> The thing is, there's no reason to delay that unlock + vfs_create_mount() -
> the tests do not rely upon the state of ->s_umount and
>         fc_drop_locked()
>         put_fs_context()
> is equivalent to
>         unlock ->s_umount
>         put_fs_context()
>
> Doing vfs_create_mount() before the checks allows us to move vfs_get_tree()
> from caller to do_new_mount_fc() and collapse it with vfs_create_mount()
> into an fc_mount() call.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

-- 
paul-moore.com


* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-26  8:56       ` Christian Brauner
@ 2025-08-27 17:19         ` Linus Torvalds
  2025-08-27 17:49           ` Linus Torvalds
  0 siblings, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-27 17:19 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Al Viro, linux-fsdevel, Jan Kara

On Tue, 26 Aug 2025 at 01:56, Christian Brauner <brauner@kernel.org> wrote:
>
> I'm not doing that for my own personal wellness cure

Please only do this for things that were actually discussed.

Because for *my* wellness cure, I get really damn annoyed when I
wonder about some context of a commit, and follow a link to look at
the background, and all I see is that SAME DAMN PATCH that I already
looked at, and wondered about, then that link damn well wasted my
time.

It's annoying as hell.

And no, some "maybe people add acks or context later" is not a valid
reason to add a link. If there was no discussion about it at the time
it was committed, a link to some mailing list posting by definition
doesn't explain why the commit exists.

                    Linus


* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-27 17:19         ` Linus Torvalds
@ 2025-08-27 17:49           ` Linus Torvalds
  2025-08-27 22:49             ` Konstantin Ryabitsev
  0 siblings, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-27 17:49 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Al Viro, linux-fsdevel, Jan Kara

On Wed, 27 Aug 2025 at 10:19, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And no, some "maybe people add acks or context later" is not a valid
> reason to add a link. If there was no discussion about it at the time
> it was committed, a link to some mailing list posting by definition
> doesn't explain why the commit exists.

Side note: relevant later discussion of patches obviously does happen,
but it's actually more likely to be independent of the mailing list
posting, and instead refer to the commit ID - and the shortlog of the
commit - than to the original posting.

Yes, some bots do obviously traverse the mailing list for patch series
to look at and test, but those bots are the ones that the developer /
maintainer should have reacted to *before* the commit goes upstream,
so finding them after-the-fact is simply not a high priority.

A much more common thing is that the "context added later" is a result
of people and bots reporting problems with a commit that has hit the
git trees, and they do *not* generally reply to the original posting.

So instead those much more relevant reports will typically make an
entirely new thread, mentioning the commit ID and the subject line.

Which is why I think it is so bass-ackwards to add a link to the
posting in the commit. That literally is useless garbage unless the
posting generated discussion. The link to the posting is not likely to
be the most relevant thing: it tends to be *much* more productive to
instead search lore for the commit ID and the subject line of the
commit.

That will obviously find the original posting of the patch too, but it
will *also* find those much more relevant and likely reports about
people/bots reporting issues with a commit in the git tree.

This is why I hate those pointless links so much. They are worthless
garbage. And the "but maybe somebody adds context later" is
intellectually dishonest, since that later context is likely *not*
found behind that link, but through other means entirely.

               Linus

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-27 17:49           ` Linus Torvalds
@ 2025-08-27 22:49             ` Konstantin Ryabitsev
  2025-08-27 23:40               ` Linus Torvalds
  0 siblings, 1 reply; 320+ messages in thread
From: Konstantin Ryabitsev @ 2025-08-27 22:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christian Brauner, Al Viro, linux-fsdevel, Jan Kara

On Wed, Aug 27, 2025 at 10:49:21AM -0700, Linus Torvalds wrote:
> Which is why I think it is so bass-ackwards to add a link to the
> posting in the commit. That literally is useless garbage unless the
> posting generated discussion. The link to the posting is not likely to
> be the most relevant thing: it tends to be *much* more productive to
> instead search lore for the commit ID and the subject line of the
> commit.

Main trouble is that we can't always reliably arrive at the source of the
patch in lore based on the commit. The subject line can be tricky to search
for if it uses quotes, brackets, or other characters that aren't reliably
tokenized. Furthermore, there can be situations where the results can be
ambiguous. For example, a [PATCH v7] could have been posted after the
maintainer had already accepted [PATCH v6], in which case the maintainer will
ask for a new bugfix series to be sent instead.

Similarly, we can't reliably go from the commit to the patch-id that we can
use to search the archives:

- the maintainer may have rebased the patch series, resulting in a different
  patch-id
- the original submission may have been generated with a different diff
  algorithm (histogram vs. myers is the usual culprit)
- the maintainer may have tweaked the patch for cosmetic reasons

All of the above may result in a different git-patch-id that no longer matches
the original submission.

I have recommended that Link: trailers indicating the provenance of the series
should use a dedicated domain name: patch.msgid.link. This should clearly
indicate to you that following this link will take you to the original
submission, not to any other discussion. I haven't yet made this the default
in b4, but I should probably do that.

Anyone can already make this their default by setting the following in their
.gitconfig:

    [b4]
        linkmask = https://patch.msgid.link/%s

-K

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-27 22:49             ` Konstantin Ryabitsev
@ 2025-08-27 23:40               ` Linus Torvalds
  2025-08-28  0:41                 ` Konstantin Ryabitsev
  0 siblings, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-27 23:40 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Christian Brauner, Al Viro, linux-fsdevel, Jan Kara

On Wed, 27 Aug 2025 at 15:49, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> I have recommended that Link: trailers indicating the provenance of the series
> should use a dedicated domain name: patch.msgid.link. This should clearly
> indicate to you that following this link will take you to the original
> submission, not to any other discussion.

That doesn't fix anything. It only reinforces the basic stupidity of
marking the WRONG DIRECTION.

The fact is, YOU CANNOT SANELY MARK THE COMMIT. Dammit, why do people
ignore this *fundamental* issue? You literally cannot add information
to the commit that doesn't exist yet, and the threads that refer to
bugs etc quite fundamentally WILL NOT EXIST YET when the commit is
posted.

The actual *useful* information about a commit is the discussions it
resulted in, not the posting of the patch.

And those will almost invariably be unrelated to the patch submission,
since they either talked about the problems that the patch *fixed*, or
talk about the problems that the patch *caused* (ie the thread starts
with some random "My machine no longer boots", and then goes on from
there as people try to figure out what caused it).

So the *relevant* links are pretty much by definition not the link to
the posting of the patch.

Is it really so hard to understand and accept this fundamental issue?

It's the *message* that should be indexed and marked, not the commit.

What you want to find is messages on the mailing list that mention the
commit, not the other way around. The other way around is completely
pointless and CANNOT BE AUTOMATED. Any automation by definition will
only add noise, not "information".

Really. The only valid link is a link to *pre-existing* discussion,
not to some stupid "this is where I posted this patch", which is
entirely and utterly immaterial.

And dammit, lore could do this. Here's one suggested model that at
least gets the direction of indexing right (I'm not claiming it's the
only model, or the best model, but it sure as hell beats getting the
fundamentals completely wrong):

 (a) messages with patches can be indexed by the patch-id of said patch

This might well be useful in its own right ("search for this patch"),
and would be good for the series where the same patch ends up being
re-posted because the whole series was re-posted.

IOW, just that trivial thing would already allow the lore web
interface to link to "this patch has been posted before", which is
useful information on its own, totally aside from any future
archeology.

But it's not the end goal, it's only a small step to *get* to the end goal:

 (b) messages that mention a commit ID (or a subject line) could then
have referrals to the patch-id of said commit.

No, you don't want to do a whole-text search every time you look for a
commit. That's fine for manual stuff, but it's much too expensive for
any sane automation. But you *can* (and lore already does) scan
messages at message posting time, and find when people refer to a
commit, and then index that message *once* by the patch ID of the
commit.

Now, this *is* fundamentally useful in a very different way: if you
have somebody who bisected something and mentions a commit as a
result, you'd now *find* that kind of message, and the history leading
up to it.

So when people read threads on lore about bugs being bisected, think
how useful it would be if that thread would basically auto-populate
with "this message refers to this patch".

And the final step is

 (c) have some 'b4' infrastructure to look up emails pertaining to a
commit - by doing the patch ID and then looking up the indexing above

Look, now you have a "open web browser with the history of not just
where the patch was originally posted, but where that commit was
*mentioned*".

Notice how fundamentally more useful this is than some link to where
the patch was posted? And absolutely nothing in the above implies
tagging the commit with useless information.

I look at the "Link:" tags quite regularly, and I can tell you that
when it's a posting tag, it almost invariably is completely and
totally useless. We *have* people who add those, and they only add
noise and very little value.

Do not add more of those useless garbage links in the name of
"automation". It's not automating anything useful, it's only
automating garbage.

Because the *commit* already has all the information that is relevant
- it's not the commit that is missing a link. It's the other side.

Which is why those links to lore patch submission events are so
STUPID. They add nothing. Doing them in the name of "automation" is
crazy. It's entirely pointless. It's garbage and it's mis-designed,
because it's not understanding the problem.

                   Linus

^ permalink raw reply	[flat|nested] 320+ messages in thread
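Step (b) of the model above depends on spotting commit references once, when a message is posted, instead of running a full-text search later. A minimal userspace sketch of that scanning step follows. It assumes a commit reference is any standalone lowercase hex token of 12 to 40 characters. The function name, the constants and that heuristic are illustrative assumptions, not how lore actually indexes messages.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_MIN_ID 12	/* shortest abbreviated ID we accept (assumption) */
#define TOY_MAX_ID 40	/* full SHA-1 length in hex */

/*
 * Scan text for standalone hex tokens of TOY_MIN_ID..TOY_MAX_ID characters
 * and copy up to max_ids of them into ids[].  Returns the number stored.
 * A real indexer would go on to map each ID to the patch-id of the commit.
 */
static size_t toy_find_commit_ids(const char *text,
				  char ids[][TOY_MAX_ID + 1], size_t max_ids)
{
	size_t found = 0;
	const char *p = text;

	while (*p && found < max_ids) {
		const char *start;
		size_t len;

		/* skip separators, then take one alphanumeric token */
		while (*p && !isalnum((unsigned char)*p))
			p++;
		start = p;
		while (isalnum((unsigned char)*p))
			p++;
		len = (size_t)(p - start);

		/* keep it only if every character of the token is hex */
		if (len >= TOY_MIN_ID && len <= TOY_MAX_ID &&
		    strspn(start, "0123456789abcdef") == len) {
			memcpy(ids[found], start, len);
			ids[found][len] = '\0';
			found++;
		}
	}
	return found;
}
```

Run once per incoming message, the returned IDs could be stored as index keys, so a later "which threads mention this commit" query becomes a lookup and not a search.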

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-27 23:40               ` Linus Torvalds
@ 2025-08-28  0:41                 ` Konstantin Ryabitsev
  2025-08-28  1:00                   ` Al Viro
  2025-08-28  1:29                   ` Linus Torvalds
  0 siblings, 2 replies; 320+ messages in thread
From: Konstantin Ryabitsev @ 2025-08-28  0:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Christian Brauner, Al Viro, linux-fsdevel, Jan Kara

On Wed, Aug 27, 2025 at 04:40:58PM -0700, Linus Torvalds wrote:
> On Wed, 27 Aug 2025 at 15:49, Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
> >
> > I have recommended that Link: trailers indicating the provenance of the series
> > should use a dedicated domain name: patch.msgid.link. This should clearly
> > indicate to you that following this link will take you to the original
> > submission, not to any other discussion.
> 
> That doesn't fix anything. It only reinforces the basic stupidity of
> marking the WRONG DIRECTION.
> 
> The fact is, YOU CANNOT SANELY MARK THE COMMIT. Dammit, why do people
> ignore this *fundamental* issue? You literally cannot add information
> to the commit that doesn't exist yet, and the threads that refer to
> bugs etc quite fundamentally WILL NOT EXIST YET when the commit is
> posted.

I'm not sure what you mean. The Link: trailer is added when the maintainer
pulls in the series into their tree. It's not put there by the submitter. The
maintainer marks a reliable mapping of "this commit came from this thread" and
we then use this info for multiple purposes:

1. letting the submitter know when their series is accepted into the
   maintainer's tree
2. marking the series as "mainlined" when we find that commit in your tree
3. it reliably marks provenance for tools like cregit, which largely have to
   guess this info

It serves a real purpose.

> It's the *message* that should be indexed and marked, not the commit.

We cannot *reliably* map commits to patches. A commit can be represented as
any number of patches, all resulting in different patch-id's -- it can be
generated with a different number of context lines, with a different diff
algorithm, it could have been rebased, etc. Maintainers do edit patches they
receive, including the subject lines. I know, because attempting to automate
things without a provenance Link: results in false-positives for projects like
netdev.

> Really. The only valid link is a link to *pre-existing* discussion,
> not to some stupid "this is where I posted this patch", which is
> entirely and utterly immaterial.
> 
> And dammit, lore could do this. Here's one suggested model that at
> least gets the direction of indexing right (I'm not claiming it's the
> only model, or the best model, but it sure as hell beats getting the
> fundamentals completely wrong):
> 
>  (a) messages with patches can be indexed by the patch-id of said patch

They already do, it's been there for a long time now. Here's a random one:
https://lore.kernel.org/lkml/?q=patchid%3A09b124c33929efcffe0ce8df0a805f54d5962f60

> This might well be useful in its own right ("search for this patch"),
> and would be good for the series where the same patch ends up being
> re-posted because the whole series was re-posted.

This is how we are able to pull in trailers sent to previous series, if the
patch-id hasn't changed.

> IOW, just that trivial thing would already allow the lore web
> interface to link to "this patch has been posted before", which is
> useful information on its own, totally aside from any future
> archeology.
> 
> But it's not the end goal, it's only a small step to *get* to the end goal:
> 
>  (b) messages that mention a commit ID (or a subject line) could then
> have referrals to the patch-id of said commit.

To reiterate, a commit is not a patch, so *we cannot reliably arrive from a
commit to always the same patch-id*. We've discovered it the hard way when you
recommended that people send you patches with --histogram and we suddenly
could no longer reliably map commits to patches, because on our end we
generated patches with the default (myers) and they did not match the patches
generated with --histogram, so our automation broke.

This is what I am trying to convey -- commits don't reliably map to patches,
because the same commit can generate any number of perfectly valid patches,
all with different patch-id's.

-K

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-28  0:41                 ` Konstantin Ryabitsev
@ 2025-08-28  1:00                   ` Al Viro
  2025-08-28  1:15                     ` Konstantin Ryabitsev
  2025-08-28  1:29                   ` Linus Torvalds
  1 sibling, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28  1:00 UTC (permalink / raw)
  To: Konstantin Ryabitsev
  Cc: Linus Torvalds, Christian Brauner, linux-fsdevel, Jan Kara

On Wed, Aug 27, 2025 at 08:41:02PM -0400, Konstantin Ryabitsev wrote:

> I'm not sure what you mean. The Link: trailer is added when the maintainer
> pulls in the series into their tree. It's not put there by the submitter. The
> maintainer marks a reliable mapping of "this commit came from this thread" and
> we then use this info for multiple purposes:

You are overloading the terms here - "pull" as in (basically) git am and "pull"
as in git pull and its ilk...

And I still don't understand how is that supposed to apply when patches are
_developed_ in git branches.  In situation when submitter == maintainer.

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-28  1:00                   ` Al Viro
@ 2025-08-28  1:15                     ` Konstantin Ryabitsev
  0 siblings, 0 replies; 320+ messages in thread
From: Konstantin Ryabitsev @ 2025-08-28  1:15 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, Christian Brauner, linux-fsdevel, Jan Kara

On Thu, Aug 28, 2025 at 02:00:17AM +0100, Al Viro wrote:
> > I'm not sure what you mean. The Link: trailer is added when the maintainer
> > pulls in the series into their tree. It's not put there by the submitter. The
> > maintainer marks a reliable mapping of "this commit came from this thread" and
> > we then use this info for multiple purposes:
> 
> You are overloading the terms here - "pull" as in (basically) git am and "pull"
> as in git pull and its ilk...
> 
> And I still don't understand how is that supposed to apply when patches are
> _developed_ in git branches.  In situation when submitter == maintainer.

Then there's no external provenance, so there is no need for this kind of
mapping. You will submit your changes as a pull request and you'll get
notified when it's merged (via the PR tracker bot).

There is a hybrid workflow as well:

- maintainer develops a patch series
- maintainer sends it to the list for review
- maintainer pulls in the trailers

In that case, we don't automatically put provenance trailers into patches, but
you can still achieve the same result if instead of merging your local branch
you merge the series from the list, but this is more of a corner case
scenario.

-K

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-28  0:41                 ` Konstantin Ryabitsev
  2025-08-28  1:00                   ` Al Viro
@ 2025-08-28  1:29                   ` Linus Torvalds
  2025-08-29 12:30                     ` Theodore Ts'o
  1 sibling, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-28  1:29 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: Christian Brauner, Al Viro, linux-fsdevel, Jan Kara

On Wed, 27 Aug 2025 at 17:41, Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
>
> I'm not sure what you mean. The Link: trailer is added when the maintainer
> pulls in the series into their tree.

That's my point. Adding it to the commit at that point is entirely
useless, because

 (a) that email doesn't have the *reason* for the patch (or rather, if
it does, then the link to the email is pointless, since the *real*
reason was mentioned already)

 (b) at that point clearly it doesn't have any *problems* associated
with it either, since if it did, it shouldn't have been included in
the first place.

So there is absolutely zero information in the link.

It's pure pointless noise.

> maintainer marks a reliable mapping of "this commit came from this thread" and
>
> It serves a real purpose.

It damn well does not serve any purpose at all, because there is
nothing useful there.

Your logic isn't logic - it's just empty words.

I can come up with tons of "reliable mappings".  How about we make the
automation add the weather.com report for the weather in Kuala Lumpur
when b4 downloads the series? We could do that reliably too.

Notice how the reliability of something is entirely irrelevant. Just
because you can reliably automate it doesn't make it relevant
information.

And dammit, it's WORSE than worthless information. I _constantly_ end
up being disappointed by those useless links, and I've wasted time
following them in the hope of finding something useful.

So it's actually reliably NEGATIVE information that wastes peoples time.

> We cannot *reliably* map commits to patches.

What we care about is about things being *USEFUL*.

"Reliable" is entirely irrelevant if it's not useful.

Because reliable but useless is still useless.

And always will be.

So I'll take "Useful information that you might not always have",
every single time over "Useless, but always there".

Get it?

                    Linus

^ permalink raw reply	[flat|nested] 320+ messages in thread

* [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess
  2025-08-28 23:07 ` [PATCHES v2][RFC][CFT] " Al Viro
@ 2025-08-28 23:07   ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 02/63] introduced guards for mount_lock Al Viro
                       ` (61 more replies)
  2025-09-03  4:54   ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
  1 sibling, 62 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

If anything, namespace_lock should be DEFINE_LOCK_GUARD_0, not DEFINE_GUARD.
That way we
	* do not need to feed it a bogus argument
	* do not get gcc trying to compare the address of a file-static
variable with -4097 - and, if we are unlucky, trying to keep
it in a register, with spills and all such.

The same problems apply to grabbing namespace_sem shared.

Rename it to namespace_excl, add namespace_shared, convert the existing users:

    guard(namespace_lock, &namespace_sem) => guard(namespace_excl)()
    guard(rwsem_read, &namespace_sem) => guard(namespace_shared)()
    scoped_guard(namespace_lock, &namespace_sem) => scoped_guard(namespace_excl)
    scoped_guard(rwsem_read, &namespace_sem) => scoped_guard(namespace_shared)

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ae6d1312b184..fcea65587ff9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -82,6 +82,12 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
 static struct mnt_namespace *emptied_ns; /* protected by namespace_sem */
 static DEFINE_SEQLOCK(mnt_ns_tree_lock);
 
+static inline void namespace_lock(void);
+static void namespace_unlock(void);
+DEFINE_LOCK_GUARD_0(namespace_excl, namespace_lock(), namespace_unlock())
+DEFINE_LOCK_GUARD_0(namespace_shared, down_read(&namespace_sem),
+				      up_read(&namespace_sem))
+
 #ifdef CONFIG_FSNOTIFY
 LIST_HEAD(notify_list); /* protected by namespace_sem */
 #endif
@@ -1776,8 +1782,6 @@ static inline void namespace_lock(void)
 	down_write(&namespace_sem);
 }
 
-DEFINE_GUARD(namespace_lock, struct rw_semaphore *, namespace_lock(), namespace_unlock())
-
 enum umount_tree_flags {
 	UMOUNT_SYNC = 1,
 	UMOUNT_PROPAGATE = 2,
@@ -2306,7 +2310,7 @@ struct path *collect_paths(const struct path *path,
 	struct path *res = prealloc, *to_free = NULL;
 	unsigned n = 0;
 
-	guard(rwsem_read)(&namespace_sem);
+	guard(namespace_shared)();
 
 	if (!check_mnt(root))
 		return ERR_PTR(-EINVAL);
@@ -2361,7 +2365,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 			return;
 	}
 
-	scoped_guard(namespace_lock, &namespace_sem) {
+	scoped_guard(namespace_excl) {
 		if (!anon_ns_root(m))
 			return;
 
@@ -2435,7 +2439,7 @@ struct vfsmount *clone_private_mount(const struct path *path)
 	struct mount *old_mnt = real_mount(path->mnt);
 	struct mount *new_mnt;
 
-	guard(rwsem_read)(&namespace_sem);
+	guard(namespace_shared)();
 
 	if (IS_MNT_UNBINDABLE(old_mnt))
 		return ERR_PTR(-EINVAL);
@@ -5957,7 +5961,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
 	if (ret)
 		return ret;
 
-	scoped_guard(rwsem_read, &namespace_sem)
+	scoped_guard(namespace_shared)
 		ret = do_statmount(ks, kreq.mnt_id, kreq.mnt_ns_id, ns);
 
 	if (!ret)
@@ -6079,7 +6083,7 @@ SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
 	 * We only need to guard against mount topology changes as
 	 * listmount() doesn't care about any mount properties.
 	 */
-	scoped_guard(rwsem_read, &namespace_sem)
+	scoped_guard(namespace_shared)
 		ret = do_listmount(ns, kreq.mnt_id, last_mnt_id, kmnt_ids,
 				   nr_mnt_ids, (flags & LISTMOUNT_REVERSE));
 	if (ret <= 0)
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread
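The argument-free guard described in the patch above can be approximated in userspace with the GCC/Clang cleanup attribute, which is what the kernel's cleanup.h builds on. This is a rough sketch and not the kernel macros. Every toy_* name is hypothetical, and a plain counter stands in for namespace_sem.

```c
#include <assert.h>

/* Toy stand-in for namespace_sem: counts how many times it is held. */
static int toy_sem_held;

static void toy_namespace_lock(void)   { toy_sem_held++; }
static void toy_namespace_unlock(void) { toy_sem_held--; }

/* The guard object carries no pointer; it exists only to trigger cleanup. */
struct toy_guard { char unused; };

static inline struct toy_guard toy_guard_enter(void)
{
	toy_namespace_lock();
	return (struct toy_guard){ 0 };
}

static inline void toy_guard_exit(struct toy_guard *g)
{
	(void)g;		/* nothing to inspect, nothing to compare */
	toy_namespace_unlock();
}

/* Hypothetical macro, loosely analogous to guard(namespace_excl)(). */
#define TOY_GUARD_EXCL()						\
	struct toy_guard toy_scope_guard				\
		__attribute__((cleanup(toy_guard_exit), unused)) =	\
		toy_guard_enter()

/* Returns -1 on the early exit, else the lock depth seen while locked. */
static int toy_locked_work(int bail_out_early)
{
	TOY_GUARD_EXCL();

	if (bail_out_early)
		return -1;	/* the unlock still runs on this path */

	return toy_sem_held;
}
```

Since the guard variable holds no lock address, the exit path has no pointer to test before unlocking. That is the property the commit message asks for, here in toy form.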

* [PATCH v2 02/63] introduced guards for mount_lock
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-29  9:49       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 03/63] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
                       ` (60 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

mount_writer: write_seqlock; that's an equivalent of {un,}lock_mount_hash()
mount_locked_reader: read_seqlock_excl; these tend to be open-coded.

No bulk conversions, please - if nothing else, quite a few places use
the mount_writer form when mount_locked_reader is sufficient.  It needs
to be dealt with carefully.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/mount.h b/fs/mount.h
index 97737051a8b9..ed8c83ba836a 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -154,6 +154,11 @@ static inline void get_mnt_ns(struct mnt_namespace *ns)
 
 extern seqlock_t mount_lock;
 
+DEFINE_LOCK_GUARD_0(mount_writer, write_seqlock(&mount_lock),
+		    write_sequnlock(&mount_lock))
+DEFINE_LOCK_GUARD_0(mount_locked_reader, read_seqlock_excl(&mount_lock),
+		    read_sequnlock_excl(&mount_lock))
+
 struct proc_mounts {
 	struct mnt_namespace *ns;
 	struct path root;
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread
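The difference between the two guards in the patch above is whether the sequence count moves. Below is a single-threaded toy model with hypothetical toy_* names, where a flag stands in for the spinlock. It is only meant to show that a writer section changes the count that lockless readers sample, while an exclusive-reader section leaves it alone.

```c
#include <assert.h>

/* Toy single-threaded model of a seqlock: a lock flag plus a sequence count. */
struct toy_seqlock {
	unsigned int seq;	/* even whenever no writer is active */
	int locked;		/* stands in for the spinlock */
};

static struct toy_seqlock toy_mount_lock;

static void toy_write_seqlock(void)
{
	toy_mount_lock.locked = 1;
	toy_mount_lock.seq++;	/* odd: lockless readers would retry */
}

static void toy_read_seqlock_excl(void)
{
	toy_mount_lock.locked = 1;	/* excludes writers, seq untouched */
}

struct toy_guard { char unused; };

static void toy_writer_exit(struct toy_guard *g)
{
	(void)g;
	toy_mount_lock.seq++;	/* even again, but different from before */
	toy_mount_lock.locked = 0;
}

static void toy_reader_exit(struct toy_guard *g)
{
	(void)g;
	toy_mount_lock.locked = 0;
}

/* Hypothetical helper: take the lock now, release it at end of scope. */
#define TOY_GUARD(name, lock_expr, exit_fn)				\
	struct toy_guard name						\
		__attribute__((cleanup(exit_fn), unused)) =		\
		((lock_expr), (struct toy_guard){ 0 })

/* Changes something lockless readers depend on: needs the writer form. */
static unsigned int toy_modify_tree(void)
{
	TOY_GUARD(g, toy_write_seqlock(), toy_writer_exit);
	return toy_mount_lock.seq;
}

/* Only needs a stable view: the exclusive-reader form is enough. */
static unsigned int toy_walk_tree_locked(void)
{
	TOY_GUARD(g, toy_read_seqlock_excl(), toy_reader_exit);
	return toy_mount_lock.seq;
}
```

In this model a section that needs only mutual exclusion but takes the writer form bumps the count for nothing, so concurrent lockless readers retry without cause. That is why each conversion has to be looked at individually.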

* [PATCHES v2][RFC][CFT] mount-related stuff
  2025-08-25  4:40 [PATCHED][RFC][CFT] mount-related stuff Al Viro
                   ` (2 preceding siblings ...)
  2025-08-25 12:43 ` Christian Brauner
@ 2025-08-28 23:07 ` Al Viro
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-09-03  4:54   ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
  3 siblings, 2 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, Jan Kara

Branch force-pushed into
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
(also visible as #v2.mount, #v1.mount being the previous version)
Individual patches in followups.

Still -rc3-based, seems to survive local beating.  Please, help with
review and testing.

Note: no links in commits, I still don't understand what kind of use is
expected in this situation.

Changes since v1 (aside of reviewed-by applied):

	In #13, #14 and #15 scoped_guard replaced with guard.  I don't like
it, but I can live with it.

	Between old #18 and #19: do_new_mount_fc() switched to use of fc_mount().
vfs_get_tree() call moved from the caller into the function itself, unlock +
vfs_create_mount() reordered to before the checks in there and collapsed with
vfs_get_tree() into a call of fc_mount().  Cleanup aside, that avoids the
difference between the lexical scope of mnt and the actual lifetime of that
reference.
	Differs from the variant posted in https://lore.kernel.org/all/20250826182124.GV39973@ZenIV/
only by fixing an obvious braino - fetching fc->root->d_sb should be done after
successful fc_mount(), not before it.
	That change modifies old #25 (now #26) "do_new_mount_fc(): use __free()
to deal with dropping mnt on failure".

	Added to the end of queue: cleanup of populating a new namespace with
a tree (open_detached_copy() and copy_mnt_ns()); both end up using guards, BTW. 
	5 commits, #54..#58
	* open_detached_copy(): don't bother with mount_lock_hash()
It's useless there right now - namespace_excl is quite enough.
	* open_detached_copy(): separate creation of namespace into helper
Creation of namespace and opening that FMODE_NEED_UNMOUNT file are better
off separated - cleaner that way.
	* mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list
Currently it (and free_mnt_ns()) can't be used with non-anon namespace before
the insertion into mnt_ns_tree; very easy to make it work in such situation as
well - in fact, the old "is it non-anonymous" check is not needed anymore.
	* copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
Use the previous patch to avoid weird open-coding of free_mnt_ns().
	* copy_mnt_ns(): use guards
... and __free(mntput) for rootmnt/pwdmnt.

	Added to the end of queue: handling of ->s_mounts/->mnt_instance and
mnt_hold_writers().
	Each mount is associated with the same dentry (sub)tree of the same
filesystem through its entire lifetime.  They are allocated empty, then (in the
same function that had called allocator) attached to dentry tree and stay like
that all the way to destructor (cleanup_mnt()).
	Unfortunately, as soon as they are attached to a tree, they become
reachable from shared data structures - we maintain the set of all mounts
associated with given superblock.  Having to worry about that while we are
still setting them up is inconvenient.  Thankfully, the accesses via that set
are *very* limited - only sb_prepare_remount_readonly() goes there and the
only thing it does to a mount is setting/clearing MNT_WRITE_HOLD and checking
the write count (guaranteed to be zero during setup, since there's nobody
who could've asked for write access by that point).
	Turns out it's easy to take MNT_WRITE_HOLD out of ->mnt_flags and
basically move it into the same thing that establishes linkage in per-superblock
set of mounts.  That makes accesses via that set isolated from the rest of
struct mount; as far as we are concerned, this set is no longer a way to reach
the mount from shared data structures and mount remains private to caller
until it is explicitly made reachable (by mounting, attaching to overlayfs as
a layer, etc.).
	FWIW, I think we should get rid of the "empty" state of struct mount
and have allocator take the root dentry as additional argument.  Hadn't done
that yet; this series removes the need to delay attaching a partially set up
mount to filesystem - we can do that from the very beginning now.
	5 commits, #59..#63
	* setup_mnt(): primitive for connecting a mount to filesystem
Identical logic in clone_mnt() and vfs_create_mount() => common helper
	* preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
Change the representation of set from list_head list to something equivalent
to hlist one, with forward linkage going to the entire struct mount rather
than embedded hlist_node.
	* struct mount: relocate MNT_WRITE_HOLD bit
Steal the LSB of back links in the set representation to store it.  We only
traverse the list forwards and all changes are under mount_lock, same as
for all mnt_hold_writers()/mnt_unhold_writers() pairs, so it's pretty
uncomplicated.
	* simplify the callers of mnt_unhold_writers()
	* WRITE_HOLD machinery: no need for to bump mount_lock seqcount
The last part is another group of "we only need mount_locked_reader" cases

Diffstat:
 fs/ecryptfs/dentry.c          |  14 +-
 fs/ecryptfs/ecryptfs_kernel.h |  27 +-
 fs/ecryptfs/file.c            |  15 +-
 fs/ecryptfs/inode.c           |  19 +-
 fs/ecryptfs/main.c            |  24 +-
 fs/internal.h                 |   4 +-
 fs/mount.h                    |  16 +-
 fs/namespace.c                | 989 +++++++++++++++++++-----------------------
 fs/pnode.c                    |  75 +++-
 fs/pnode.h                    |   1 +
 fs/super.c                    |   3 +-
 include/linux/fs.h            |   2 +-
 include/linux/mount.h         |   7 +-
 kernel/audit_tree.c           |  12 +-
 14 files changed, 573 insertions(+), 635 deletions(-)

^ permalink raw reply	[flat|nested] 320+ messages in thread
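The "steal the LSB of back links" idea from the cover letter above can be sketched in userspace as follows. This is one guess at how such a representation could look, and the series itself may do it differently. All toy_* names are hypothetical. The sketch relies on pointer slots being at least 2-byte aligned, so bit 0 of a back link is always free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WRITE_HOLD ((uintptr_t)1)	/* flag kept in bit 0 of the back link */

struct toy_mount {
	struct toy_mount *next;	/* forward link to the whole struct */
	uintptr_t pprev_bits;	/* address of the slot pointing at us, plus flag */
};

static struct toy_mount **toy_pprev(const struct toy_mount *m)
{
	return (struct toy_mount **)(m->pprev_bits & ~TOY_WRITE_HOLD);
}

static int toy_test_hold(const struct toy_mount *m)
{
	return (m->pprev_bits & TOY_WRITE_HOLD) != 0;
}

static void toy_set_hold(struct toy_mount *m)   { m->pprev_bits |= TOY_WRITE_HOLD; }
static void toy_clear_hold(struct toy_mount *m) { m->pprev_bits &= ~TOY_WRITE_HOLD; }

/* Insert at the head; the old first entry keeps its flag bit. */
static void toy_add_head(struct toy_mount **head, struct toy_mount *m)
{
	struct toy_mount *first = *head;

	m->next = first;
	if (first)
		first->pprev_bits = (uintptr_t)&m->next |
				    (first->pprev_bits & TOY_WRITE_HOLD);
	m->pprev_bits = (uintptr_t)head;	/* new entry starts unflagged */
	*head = m;
}

/* Unlink m; the successor keeps its own flag bit. */
static void toy_del(struct toy_mount *m)
{
	struct toy_mount **pprev = toy_pprev(m);
	struct toy_mount *next = m->next;

	*pprev = next;
	if (next)
		next->pprev_bits = (uintptr_t)pprev |
				   (next->pprev_bits & TOY_WRITE_HOLD);
	m->next = NULL;
	m->pprev_bits = 0;
}
```

Forward traversal only ever follows ->next and never has to mask anything. Only code that rewrites a back link needs to preserve the flag, which matches the cover letter's remark that the list is only walked forwards.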

* [PATCH v2 03/63] fs/namespace.c: allow to drop vfsmount references via __free(mntput)
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-08-28 23:07     ` [PATCH v2 02/63] introduced guards for mount_lock Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 04/63] __detach_mounts(): use guards Al Viro
                       ` (59 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Note that, just as with path_put(), it should never be done in scope of
namespace_sem, be it shared or exclusive.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index fcea65587ff9..767ab751ee2a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -88,6 +88,8 @@ DEFINE_LOCK_GUARD_0(namespace_excl, namespace_lock(), namespace_unlock())
 DEFINE_LOCK_GUARD_0(namespace_shared, down_read(&namespace_sem),
 				      up_read(&namespace_sem))
 
+DEFINE_FREE(mntput, struct vfsmount *, if (!IS_ERR(_T)) mntput(_T))
+
 #ifdef CONFIG_FSNOTIFY
 LIST_HEAD(notify_list); /* protected by namespace_sem */
 #endif
-- 
2.47.2



* [PATCH v2 04/63] __detach_mounts(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-08-28 23:07     ` [PATCH v2 02/63] introduced guards for mount_lock Al Viro
  2025-08-28 23:07     ` [PATCH v2 03/63] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-29  9:48       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 05/63] __is_local_mountpoint(): " Al Viro
                       ` (58 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Clean fit for guards use; guards can't be weaker due to umount_tree() calls.
---
 fs/namespace.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 767ab751ee2a..1ae1ab8815c9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2032,10 +2032,11 @@ void __detach_mounts(struct dentry *dentry)
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 
-	namespace_lock();
-	lock_mount_hash();
+	guard(namespace_excl)();
+	guard(mount_writer)();
+
 	if (!lookup_mountpoint(dentry, &mp))
-		goto out_unlock;
+		return;
 
 	event++;
 	while (mp.node.next) {
@@ -2047,9 +2048,6 @@ void __detach_mounts(struct dentry *dentry)
 		else umount_tree(mnt, UMOUNT_CONNECTED);
 	}
 	unpin_mountpoint(&mp);
-out_unlock:
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 /*
-- 
2.47.2



* [PATCH v2 05/63] __is_local_mountpoint(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (2 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 04/63] __detach_mounts(): use guards Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 06/63] do_change_type(): " Al Viro
                       ` (57 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_shared due to iterating through ns->mounts.
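
A userspace sketch of why the early returns below are safe under a guard.
It assumes GCC or Clang; the reader counter, fake_guard_shared() and
is_local_mountpoint_sketch() are hypothetical stand-ins for namespace_sem,
guard(namespace_shared)() and the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for namespace_sem: just counts active readers. */
static int readers;

static void fake_down_read(void)
{
	readers++;
}

static void fake_up_read(void)
{
	assert(readers > 0);
	readers--;
}

/* Guard object: the constructor takes the lock, the handler drops it. */
struct fake_guard { int unused; };

static struct fake_guard fake_guard_ctor(void)
{
	fake_down_read();
	return (struct fake_guard){ 0 };
}

static void fake_guard_dtor(struct fake_guard *g)
{
	(void)g;
	fake_up_read();
}

#define fake_guard_shared() \
	struct fake_guard guard_obj \
		__attribute__((cleanup(fake_guard_dtor), unused)) = \
		fake_guard_ctor()

/* An array of ints plays the role of the mount tree. */
bool is_local_mountpoint_sketch(const int *mountpoints, size_t n, int dentry)
{
	fake_guard_shared();

	for (size_t i = 0; i < n; i++)
		if (mountpoints[i] == dentry)
			return true;	/* lock dropped by the handler */
	return false;			/* ... and here as well */
}
```

Whichever return is taken, the reader count is back to zero afterwards, so
the result variable and the break in the open-coded version are not needed.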

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1ae1ab8815c9..f1460ddd1486 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -906,17 +906,14 @@ bool __is_local_mountpoint(const struct dentry *dentry)
 {
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	struct mount *mnt, *n;
-	bool is_covered = false;
 
-	down_read(&namespace_sem);
-	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
-		is_covered = (mnt->mnt_mountpoint == dentry);
-		if (is_covered)
-			break;
-	}
-	up_read(&namespace_sem);
+	guard(namespace_shared)();
+
+	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node)
+		if (mnt->mnt_mountpoint == dentry)
+			return true;
 
-	return is_covered;
+	return false;
 }
 
 struct pinned_mountpoint {
-- 
2.47.2



* [PATCH v2 06/63] do_change_type(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (3 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 05/63] __is_local_mountpoint(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 07/63] do_set_group(): " Al Viro
                       ` (56 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_excl to modify propagation graph

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f1460ddd1486..a6a7b068770a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2899,7 +2899,7 @@ static int do_change_type(struct path *path, int ms_flags)
 	struct mount *mnt = real_mount(path->mnt);
 	int recurse = ms_flags & MS_REC;
 	int type;
-	int err = 0;
+	int err;
 
 	if (!path_mounted(path))
 		return -EINVAL;
@@ -2908,23 +2908,22 @@ static int do_change_type(struct path *path, int ms_flags)
 	if (!type)
 		return -EINVAL;
 
-	namespace_lock();
+	guard(namespace_excl)();
+
 	err = may_change_propagation(mnt);
 	if (err)
-		goto out_unlock;
+		return err;
 
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
 		if (err)
-			goto out_unlock;
+			return err;
 	}
 
 	for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
 		change_mnt_propagation(m, type);
 
- out_unlock:
-	namespace_unlock();
-	return err;
+	return 0;
 }
 
 /* may_copy_tree() - check if a mount tree can be copied
-- 
2.47.2



* [PATCH v2 07/63] do_set_group(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (4 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 06/63] do_change_type(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 08/63] mark_mounts_for_expiry(): " Al Viro
                       ` (55 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_excl to modify propagation graph

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a6a7b068770a..13e2f3837a26 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3349,47 +3349,44 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 
 static int do_set_group(struct path *from_path, struct path *to_path)
 {
-	struct mount *from, *to;
+	struct mount *from = real_mount(from_path->mnt);
+	struct mount *to = real_mount(to_path->mnt);
 	int err;
 
-	from = real_mount(from_path->mnt);
-	to = real_mount(to_path->mnt);
-
-	namespace_lock();
+	guard(namespace_excl)();
 
 	err = may_change_propagation(from);
 	if (err)
-		goto out;
+		return err;
 	err = may_change_propagation(to);
 	if (err)
-		goto out;
+		return err;
 
-	err = -EINVAL;
 	/* To and From paths should be mount roots */
 	if (!path_mounted(from_path))
-		goto out;
+		return -EINVAL;
 	if (!path_mounted(to_path))
-		goto out;
+		return -EINVAL;
 
 	/* Setting sharing groups is only allowed across same superblock */
 	if (from->mnt.mnt_sb != to->mnt.mnt_sb)
-		goto out;
+		return -EINVAL;
 
 	/* From mount root should be wider than To mount root */
 	if (!is_subdir(to->mnt.mnt_root, from->mnt.mnt_root))
-		goto out;
+		return -EINVAL;
 
 	/* From mount should not have locked children in place of To's root */
 	if (__has_locked_children(from, to->mnt.mnt_root))
-		goto out;
+		return -EINVAL;
 
 	/* Setting sharing groups is only allowed on private mounts */
 	if (IS_MNT_SHARED(to) || IS_MNT_SLAVE(to))
-		goto out;
+		return -EINVAL;
 
 	/* From should not be private */
 	if (!IS_MNT_SHARED(from) && !IS_MNT_SLAVE(from))
-		goto out;
+		return -EINVAL;
 
 	if (IS_MNT_SLAVE(from)) {
 		hlist_add_behind(&to->mnt_slave, &from->mnt_slave);
@@ -3401,11 +3398,7 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 		list_add(&to->mnt_share, &from->mnt_share);
 		set_mnt_shared(to);
 	}
-
-	err = 0;
-out:
-	namespace_unlock();
-	return err;
+	return 0;
 }
 
 /**
-- 
2.47.2



* [PATCH v2 08/63] mark_mounts_for_expiry(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (5 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 07/63] do_set_group(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 09/63] put_mnt_ns(): " Al Viro
                       ` (54 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Clean fit; guards can't be weaker due to umount_tree() calls.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 13e2f3837a26..898a6b7307e4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3886,8 +3886,8 @@ void mark_mounts_for_expiry(struct list_head *mounts)
 	if (list_empty(mounts))
 		return;
 
-	namespace_lock();
-	lock_mount_hash();
+	guard(namespace_excl)();
+	guard(mount_writer)();
 
 	/* extract from the expiration list every vfsmount that matches the
 	 * following criteria:
@@ -3909,8 +3909,6 @@ void mark_mounts_for_expiry(struct list_head *mounts)
 		touch_mnt_namespace(mnt->mnt_ns);
 		umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
 	}
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
-- 
2.47.2



* [PATCH v2 09/63] put_mnt_ns(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (6 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 08/63] mark_mounts_for_expiry(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 10/63] mnt_already_visible(): " Al Viro
                       ` (53 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; guards can't be weaker due to umount_tree() call.
Setting emptied_ns requires namespace_excl, but not anything
mount_lock-related.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 898a6b7307e4..86a86be2b0ef 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6153,12 +6153,10 @@ void put_mnt_ns(struct mnt_namespace *ns)
 {
 	if (!refcount_dec_and_test(&ns->ns.count))
 		return;
-	namespace_lock();
+	guard(namespace_excl)();
 	emptied_ns = ns;
-	lock_mount_hash();
+	guard(mount_writer)();
 	umount_tree(ns->root, 0);
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 struct vfsmount *kern_mount(struct file_system_type *type)
-- 
2.47.2



* [PATCH v2 10/63] mnt_already_visible(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (7 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 09/63] put_mnt_ns(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 11/63] check_for_nsfs_mounts(): no need to take locks Al Viro
                       ` (52 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_shared due to iterating through ns->mounts.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 86a86be2b0ef..a5d37b97088f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6232,9 +6232,8 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 {
 	int new_flags = *new_mnt_flags;
 	struct mount *mnt, *n;
-	bool visible = false;
 
-	down_read(&namespace_sem);
+	guard(namespace_shared)();
 	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
 		struct mount *child;
 		int mnt_flags;
@@ -6281,13 +6280,10 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 		/* Preserve the locked attributes */
 		*new_mnt_flags |= mnt_flags & (MNT_LOCK_READONLY | \
 					       MNT_LOCK_ATIME);
-		visible = true;
-		goto found;
+		return true;
 	next:	;
 	}
-found:
-	up_read(&namespace_sem);
-	return visible;
+	return false;
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
-- 
2.47.2



* [PATCH v2 11/63] check_for_nsfs_mounts(): no need to take locks
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (8 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 10/63] mnt_already_visible(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 12/63] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
                       ` (51 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Currently we are taking mount_writer; what that function needs is
either mount_locked_reader (we are not changing anything, we just
want to iterate through the subtree) or namespace_shared and
a reference held by caller on the root of subtree - that's also
enough to stabilize the topology.

The thing is, all callers are already holding at least namespace_shared
as well as a reference to the root of subtree.

Let's make the callers provide locking warranties - don't mess with
mount_lock in check_for_nsfs_mounts() itself and document the locking
requirements.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a5d37b97088f..59948cbf9c47 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2402,21 +2402,15 @@ bool has_locked_children(struct mount *mnt, struct dentry *dentry)
  * specified subtree.  Such references can act as pins for mount namespaces
  * that aren't checked by the mount-cycle checking code, thereby allowing
  * cycles to be made.
+ *
+ * locks: mount_locked_reader || namespace_shared && pinned(subtree)
  */
 static bool check_for_nsfs_mounts(struct mount *subtree)
 {
-	struct mount *p;
-	bool ret = false;
-
-	lock_mount_hash();
-	for (p = subtree; p; p = next_mnt(p, subtree))
+	for (struct mount *p = subtree; p; p = next_mnt(p, subtree))
 		if (mnt_ns_loop(p->mnt.mnt_root))
-			goto out;
-
-	ret = true;
-out:
-	unlock_mount_hash();
-	return ret;
+			return false;
+	return true;
 }
 
 /**
-- 
2.47.2



* [PATCH v2 12/63] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (9 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 11/63] check_for_nsfs_mounts(): no need to take locks Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 13/63] has_locked_children(): use guards Al Viro
                       ` (50 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/pnode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/pnode.c b/fs/pnode.c
index 6f7d02f3fa98..0702d45d856d 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -304,9 +304,8 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
 				err = PTR_ERR(this);
 				break;
 			}
-			read_seqlock_excl(&mount_lock);
-			mnt_set_mountpoint(n, dest_mp, this);
-			read_sequnlock_excl(&mount_lock);
+			scoped_guard(mount_locked_reader)
+				mnt_set_mountpoint(n, dest_mp, this);
 			if (n->mnt_master)
 				SET_MNT_MARK(n->mnt_master);
 			copy = this;
-- 
2.47.2



* [PATCH v2 13/63] has_locked_children(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (10 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 12/63] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-29  9:49       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 14/63] mnt_set_expiry(): " Al Viro
                       ` (49 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and document the locking requirements of __has_locked_children()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 59948cbf9c47..2cb3cb8307ca 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2373,6 +2373,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 	}
 }
 
+/* locks: namespace_shared && pinned(mnt) || mount_locked_reader */
 static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
 {
 	struct mount *child;
@@ -2389,12 +2390,8 @@ static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
 
 bool has_locked_children(struct mount *mnt, struct dentry *dentry)
 {
-	bool res;
-
-	read_seqlock_excl(&mount_lock);
-	res = __has_locked_children(mnt, dentry);
-	read_sequnlock_excl(&mount_lock);
-	return res;
+	guard(mount_locked_reader)();
+	return __has_locked_children(mnt, dentry);
 }
 
 /*
-- 
2.47.2



* [PATCH v2 14/63] mnt_set_expiry(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (11 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 13/63] has_locked_children(): use guards Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-29  9:49       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 15/63] path_is_under(): " Al Viro
                       ` (48 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

It needs only mount_locked_reader because there are no lockless
accesses to expiry lists.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2cb3cb8307ca..db25c81d7f68 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3858,9 +3858,8 @@ int finish_automount(struct vfsmount *m, const struct path *path)
  */
 void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list)
 {
-	read_seqlock_excl(&mount_lock);
+	guard(mount_locked_reader)();
 	list_add_tail(&real_mount(mnt)->mnt_expire, expiry_list);
-	read_sequnlock_excl(&mount_lock);
 }
 EXPORT_SYMBOL(mnt_set_expiry);
 
-- 
2.47.2



* [PATCH v2 15/63] path_is_under(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (12 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 14/63] mnt_set_expiry(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 16/63] current_chrooted(): don't bother with follow_down_one() Al Viro
                       ` (47 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and document the locking requirements for is_path_reachable().
There is one questionable caller in do_listmount() where we are not
holding mount_lock *and* might not have the first argument mounted.
However, in that case it will immediately return true without having
to look at the ancestors.  Might be cleaner to move the check into the
non-LSMT_ROOT case, which is where it really belongs - there the check
is not always true and is_mounted() is guaranteed.

Document the locking environments for is_path_reachable() callers:
	get_peer_under_root()
	get_dominating_id()
	do_statmount()
	do_listmount()

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 11 +++++------
 fs/pnode.c     |  3 ++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index db25c81d7f68..6aabf0045389 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4592,7 +4592,7 @@ SYSCALL_DEFINE5(move_mount,
 /*
  * Return true if path is reachable from root
  *
- * namespace_sem or mount_lock is held
+ * locks: mount_locked_reader || namespace_shared && is_mounted(mnt)
  */
 bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
 			 const struct path *root)
@@ -4606,11 +4606,8 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
 
 bool path_is_under(const struct path *path1, const struct path *path2)
 {
-	bool res;
-	read_seqlock_excl(&mount_lock);
-	res = is_path_reachable(real_mount(path1->mnt), path1->dentry, path2);
-	read_sequnlock_excl(&mount_lock);
-	return res;
+	guard(mount_locked_reader)();
+	return is_path_reachable(real_mount(path1->mnt), path1->dentry, path2);
 }
 EXPORT_SYMBOL(path_is_under);
 
@@ -5689,6 +5686,7 @@ static int grab_requested_root(struct mnt_namespace *ns, struct path *root)
 			     STATMOUNT_MNT_UIDMAP | \
 			     STATMOUNT_MNT_GIDMAP)
 
+/* locks: namespace_shared */
 static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
 			struct mnt_namespace *ns)
 {
@@ -5949,6 +5947,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
 	return ret;
 }
 
+/* locks: namespace_shared */
 static ssize_t do_listmount(struct mnt_namespace *ns, u64 mnt_parent_id,
 			    u64 last_mnt_id, u64 *mnt_ids, size_t nr_mnt_ids,
 			    bool reverse)
diff --git a/fs/pnode.c b/fs/pnode.c
index 0702d45d856d..edaf9d9d0eaf 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -29,6 +29,7 @@ static inline struct mount *next_slave(struct mount *p)
 	return hlist_entry(p->mnt_slave.next, struct mount, mnt_slave);
 }
 
+/* locks: namespace_shared && is_mounted(mnt) */
 static struct mount *get_peer_under_root(struct mount *mnt,
 					 struct mnt_namespace *ns,
 					 const struct path *root)
@@ -50,7 +51,7 @@ static struct mount *get_peer_under_root(struct mount *mnt,
  * Get ID of closest dominating peer group having a representative
  * under the given root.
  *
- * Caller must hold namespace_sem
+ * locks: namespace_shared
  */
 int get_dominating_id(struct mount *mnt, const struct path *root)
 {
-- 
2.47.2



* [PATCH v2 16/63] current_chrooted(): don't bother with follow_down_one()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (13 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 15/63] path_is_under(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 17/63] current_chrooted(): use guards Al Viro
                       ` (46 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

All we need here is to follow ->overmount on root mount of namespace...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6aabf0045389..cf680fbf015e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6194,24 +6194,22 @@ bool our_mnt(struct vfsmount *mnt)
 bool current_chrooted(void)
 {
 	/* Does the current process have a non-standard root */
-	struct path ns_root;
+	struct mount *root = current->nsproxy->mnt_ns->root;
 	struct path fs_root;
 	bool chrooted;
 
+	get_fs_root(current->fs, &fs_root);
+
 	/* Find the namespace root */
-	ns_root.mnt = &current->nsproxy->mnt_ns->root->mnt;
-	ns_root.dentry = ns_root.mnt->mnt_root;
-	path_get(&ns_root);
-	while (d_mountpoint(ns_root.dentry) && follow_down_one(&ns_root))
-		;
+	read_seqlock_excl(&mount_lock);
 
-	get_fs_root(current->fs, &fs_root);
+	while (unlikely(root->overmount))
+		root = root->overmount;
 
-	chrooted = !path_equal(&fs_root, &ns_root);
+	chrooted = fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 
+	read_sequnlock_excl(&mount_lock);
 	path_put(&fs_root);
-	path_put(&ns_root);
-
 	return chrooted;
 }
 
-- 
2.47.2



* [PATCH v2 17/63] current_chrooted(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (14 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 16/63] current_chrooted(): don't bother with follow_down_one() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 18/63] switch do_new_mount_fc() to fc_mount() Al Viro
                       ` (45 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

here a use of __free(path_put) for dropping fs_root is enough to
make guard(mount_locked_reader) fit...
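
The ordering this relies on can be sketched in userspace (GCC or Clang
assumed; fake_path, fake_lock() and chrooted_sketch() are hypothetical
names).  Cleanup handlers run in reverse order of declaration, so a
variable declared before the guard is released after the lock is dropped,
matching the open-coded order of unlock followed by path_put().

```c
#include <assert.h>
#include <stddef.h>

static char events[8];	/* 'L' lock, 'U' unlock, 'P' path put */
static size_t nr_events;
static int lock_held;

static void record(char c)
{
	if (nr_events < sizeof(events) - 1)
		events[nr_events++] = c;
}

/* Hypothetical stand-in for struct path holding a reference. */
struct fake_path { int pinned; };

static void fake_path_put(struct fake_path *p)
{
	/* the open-coded version dropped the path only after unlocking */
	assert(!lock_held);
	if (p->pinned) {
		p->pinned = 0;
		record('P');
	}
}

static int fake_lock(void)
{
	lock_held = 1;
	record('L');
	return 0;
}

static void fake_unlock(int *token)
{
	(void)token;
	lock_held = 0;
	record('U');
}

int chrooted_sketch(void)
{
	struct fake_path fs_root __attribute__((cleanup(fake_path_put))) = { 0 };

	fs_root.pinned = 1;	/* stands in for get_fs_root() */

	int token __attribute__((cleanup(fake_unlock), unused)) = fake_lock();

	return fs_root.pinned;	/* evaluated while the lock is still held */
}
```

The recorded sequence is lock, unlock, path put; swapping the two
declarations would trip the assertion in fake_path_put().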

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index cf680fbf015e..0474b3a93dbf 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6194,23 +6194,20 @@ bool our_mnt(struct vfsmount *mnt)
 bool current_chrooted(void)
 {
 	/* Does the current process have a non-standard root */
-	struct mount *root = current->nsproxy->mnt_ns->root;
-	struct path fs_root;
-	bool chrooted;
+	struct path fs_root __free(path_put) = {};
+	struct mount *root;
 
 	get_fs_root(current->fs, &fs_root);
 
 	/* Find the namespace root */
-	read_seqlock_excl(&mount_lock);
 
+	guard(mount_locked_reader)();
+
+	root = current->nsproxy->mnt_ns->root;
 	while (unlikely(root->overmount))
 		root = root->overmount;
 
-	chrooted = fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
-
-	read_sequnlock_excl(&mount_lock);
-	path_put(&fs_root);
-	return chrooted;
+	return fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 }
 
 static bool mnt_already_visible(struct mnt_namespace *ns,
-- 
2.47.2



* [PATCH v2 18/63] switch do_new_mount_fc() to fc_mount()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (15 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 17/63] current_chrooted(): use guards Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-29  9:53       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 19/63] do_move_mount(): trim local variables Al Viro
                       ` (44 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Prior to the call of do_new_mount_fc() the caller has just done successful
vfs_get_tree().  Then do_new_mount_fc() does several checks on resulting
superblock, and either does fc_drop_locked() and returns an error or
proceeds to unlock the superblock and call vfs_create_mount().

The thing is, there's no reason to delay that unlock + vfs_create_mount() -
the tests do not rely upon the state of ->s_umount and
	fc_drop_locked()
	put_fs_context()
is equivalent to
	unlock ->s_umount
	put_fs_context()

Doing vfs_create_mount() before the checks allows us to move vfs_get_tree()
from caller to do_new_mount_fc() and collapse it with vfs_create_mount()
into an fc_mount() call.
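
The resulting exit path (clear the pointer once the reference has been
consumed, then do an unconditional put) can be sketched in plain C.
fake_mnt, fake_attach() and new_mount_sketch() are hypothetical stand-ins;
the only property borrowed from the kernel is that the put helper ignores
NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a mount: only a reference count. */
struct fake_mnt { int refs; };

/* NULL-tolerant put, which is what makes the single exit path work. */
static void fake_mntput(struct fake_mnt *m)
{
	if (m) {
		assert(m->refs > 0);
		m->refs--;
	}
}

/* On success the tree takes over the caller's reference. */
static int fake_attach(struct fake_mnt *m, int err)
{
	(void)m;
	return err;
}

int new_mount_sketch(struct fake_mnt *mnt, int check_err, int attach_err)
{
	int error = check_err;

	if (error)
		goto out;

	error = fake_attach(mnt, attach_err);
	if (!error)
		mnt = NULL;	/* consumed on success */
out:
	fake_mntput(mnt);	/* no-op if the reference was consumed */
	return error;
}
```

On success the reference count is untouched; on either failure the
reference obtained earlier is dropped exactly once.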

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0474b3a93dbf..9b575c9eee0b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3705,25 +3705,20 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
-	struct vfsmount *mnt;
 	struct pinned_mountpoint mp = {};
-	struct super_block *sb = fc->root->d_sb;
+	struct super_block *sb;
+	struct vfsmount *mnt = fc_mount(fc);
 	int error;
 
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
+
+	sb = fc->root->d_sb;
 	error = security_sb_kern_mount(sb);
 	if (!error && mount_too_revealing(sb, &mnt_flags))
 		error = -EPERM;
-
-	if (unlikely(error)) {
-		fc_drop_locked(fc);
-		return error;
-	}
-
-	up_write(&sb->s_umount);
-
-	mnt = vfs_create_mount(fc);
-	if (IS_ERR(mnt))
-		return PTR_ERR(mnt);
+	if (unlikely(error))
+		goto out;
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
@@ -3731,10 +3726,12 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 	if (!error) {
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
+		if (!error)
+			mnt = NULL;	// consumed on success
 		unlock_mount(&mp);
 	}
-	if (error < 0)
-		mntput(mnt);
+out:
+	mntput(mnt);
 	return error;
 }
 
@@ -3788,8 +3785,6 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 		err = parse_monolithic_mount_data(fc, data);
 	if (!err && !mount_capable(fc))
 		err = -EPERM;
-	if (!err)
-		err = vfs_get_tree(fc);
 	if (!err)
 		err = do_new_mount_fc(fc, path, mnt_flags);
 
-- 
2.47.2



* [PATCH v2 19/63] do_move_mount(): trim local variables
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (16 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 18/63] switch do_new_mount_fc() to fc_mount() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 20/63] do_move_mount(): deal with the checks on old_path early Al Viro
                       ` (43 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Both 'parent' and 'ns' are used at most once, no point precalculating those...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9b575c9eee0b..ad9b5687ff15 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3564,10 +3564,8 @@ static inline bool may_use_mount(struct mount *mnt)
 static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
-	struct mnt_namespace *ns;
 	struct mount *p;
 	struct mount *old;
-	struct mount *parent;
 	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
@@ -3578,8 +3576,6 @@ static int do_move_mount(struct path *old_path,
 
 	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
-	parent = old->mnt_parent;
-	ns = old->mnt_ns;
 
 	err = -EINVAL;
 
@@ -3588,12 +3584,12 @@ static int do_move_mount(struct path *old_path,
 		/* ... it should be detachable from parent */
 		if (!mnt_has_parent(old) || IS_MNT_LOCKED(old))
 			goto out;
+		/* ... which should not be shared */
+		if (IS_MNT_SHARED(old->mnt_parent))
+			goto out;
 		/* ... and the target should be in our namespace */
 		if (!check_mnt(p))
 			goto out;
-		/* parent of the source should not be shared */
-		if (IS_MNT_SHARED(parent))
-			goto out;
 	} else {
 		/*
 		 * otherwise the source must be the root of some anon namespace.
@@ -3605,7 +3601,7 @@ static int do_move_mount(struct path *old_path,
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
-		if (ns == p->mnt_ns)
+		if (old->mnt_ns == p->mnt_ns)
 			goto out;
 		/*
 		 * Target should be either in our namespace or in an acceptable
-- 
2.47.2



* [PATCH v2 20/63] do_move_mount(): deal with the checks on old_path early
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (17 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 19/63] do_move_mount(): trim local variables Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 21/63] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
                       ` (42 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

1) checking that location we want to move does point to root of some mount
can be done before anything else; that property is not going to change
and having it already verified simplifies the analysis.

2) checking the type agreement between what we are trying to move and what
we are trying to move it onto also belongs in the very beginning -
do_lock_mount() might end up switching new_path to something that overmounts
the original location, but... the same type agreement applies to overmounts,
so we could just as well check against the original location.

3) since we know that old_path->dentry is the root of old_path->mnt, there's
no point bothering with path_overmounted() in can_move_mount_beneath();
it's simply a check for the mount we are trying to move having non-NULL
->overmount.  And with that, we can switch can_move_mount_beneath() to
taking old instead of old_path, leaving no uses of old_path past the original
checks.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ad9b5687ff15..74c67ea1b5a8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3433,7 +3433,7 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
 
 /**
  * can_move_mount_beneath - check that we can mount beneath the top mount
- * @from: mount to mount beneath
+ * @mnt_from: mount we are trying to move
  * @to:   mount under which to mount
  * @mp:   mountpoint of @to
  *
@@ -3443,7 +3443,7 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  *   root or the rootfs of the namespace.
  * - Make sure that the caller can unmount the topmost mount ensuring
  *   that the caller could reveal the underlying mountpoint.
- * - Ensure that nothing has been mounted on top of @from before we
+ * - Ensure that nothing has been mounted on top of @mnt_from before we
  *   grabbed @namespace_sem to avoid creating pointless shadow mounts.
  * - Prevent mounting beneath a mount if the propagation relationship
  *   between the source mount, parent mount, and top mount would lead to
@@ -3452,12 +3452,11 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Context: This function expects namespace_lock() to be held.
  * Return: On success 0, and on error a negative error code is returned.
  */
-static int can_move_mount_beneath(const struct path *from,
+static int can_move_mount_beneath(struct mount *mnt_from,
 				  const struct path *to,
 				  const struct mountpoint *mp)
 {
-	struct mount *mnt_from = real_mount(from->mnt),
-		     *mnt_to = real_mount(to->mnt),
+	struct mount *mnt_to = real_mount(to->mnt),
 		     *parent_mnt_to = mnt_to->mnt_parent;
 
 	if (!mnt_has_parent(mnt_to))
@@ -3470,7 +3469,7 @@ static int can_move_mount_beneath(const struct path *from,
 		return -EINVAL;
 
 	/* Avoid creating shadow mounts during mount propagation. */
-	if (path_overmounted(from))
+	if (mnt_from->overmount)
 		return -EINVAL;
 
 	/*
@@ -3565,16 +3564,21 @@ static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
 	struct mount *p;
-	struct mount *old;
+	struct mount *old = real_mount(old_path->mnt);
 	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
 
+	if (!path_mounted(old_path))
+		return -EINVAL;
+
+	if (d_is_dir(new_path->dentry) != d_is_dir(old_path->dentry))
+		return -EINVAL;
+
 	err = do_lock_mount(new_path, &mp, beneath);
 	if (err)
 		return err;
 
-	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
 
 	err = -EINVAL;
@@ -3611,15 +3615,8 @@ static int do_move_mount(struct path *old_path,
 			goto out;
 	}
 
-	if (!path_mounted(old_path))
-		goto out;
-
-	if (d_is_dir(new_path->dentry) !=
-	    d_is_dir(old_path->dentry))
-		goto out;
-
 	if (beneath) {
-		err = can_move_mount_beneath(old_path, new_path, mp.mp);
+		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
 			goto out;
 
-- 
2.47.2



* [PATCH v2 21/63] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (18 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 20/63] do_move_mount(): deal with the checks on old_path early Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 22/63] finish_automount(): simplify the ELOOP check Al Viro
                       ` (41 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We want to mount beneath the given location.  For that operation to
make sense, location must be the root of some mount that has something
under it.  Currently we let it proceed if those requirements are not met,
with rather meaningless results, and have that bogosity caught further
down the road; let's fail early instead - do_lock_mount() doesn't make
sense unless those conditions hold, and checking them there makes
things simpler.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 74c67ea1b5a8..86c6dd432b13 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2768,12 +2768,19 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 	struct path under = {};
 	int err = -ENOENT;
 
+	if (unlikely(beneath) && !path_mounted(path))
+		return -EINVAL;
+
 	for (;;) {
 		struct mount *m = real_mount(mnt);
 
 		if (beneath) {
 			path_put(&under);
 			read_seqlock_excl(&mount_lock);
+			if (unlikely(!mnt_has_parent(m))) {
+				read_sequnlock_excl(&mount_lock);
+				return -EINVAL;
+			}
 			under.mnt = mntget(&m->mnt_parent->mnt);
 			under.dentry = dget(m->mnt_mountpoint);
 			read_sequnlock_excl(&mount_lock);
@@ -3437,8 +3444,6 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * @to:   mount under which to mount
  * @mp:   mountpoint of @to
  *
- * - Make sure that @to->dentry is actually the root of a mount under
- *   which we can mount another mount.
  * - Make sure that nothing can be mounted beneath the caller's current
  *   root or the rootfs of the namespace.
  * - Make sure that the caller can unmount the topmost mount ensuring
@@ -3459,12 +3464,6 @@ static int can_move_mount_beneath(struct mount *mnt_from,
 	struct mount *mnt_to = real_mount(to->mnt),
 		     *parent_mnt_to = mnt_to->mnt_parent;
 
-	if (!mnt_has_parent(mnt_to))
-		return -EINVAL;
-
-	if (!path_mounted(to))
-		return -EINVAL;
-
 	if (IS_MNT_LOCKED(mnt_to))
 		return -EINVAL;
 
-- 
2.47.2



* [PATCH v2 22/63] finish_automount(): simplify the ELOOP check
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (19 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 21/63] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 23/63] do_loopback(): use __free(path_put) to deal with old_path Al Viro
                       ` (40 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

It's enough to check that dentries match; if path->dentry is equal to
m->mnt_root, superblocks will match as well.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
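Not for the commit log: a toy userspace model of why the superblock
comparison is redundant.  All toy_* names are made up for illustration.
It assumes two invariants that are not spelled out in the message above:
a dentry belongs to exactly one superblock, and path->dentry lives on the
superblock of path->mnt.  Under those assumptions the old and the new
check agree on every input.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_sb { int id; };
struct toy_dentry { struct toy_sb *d_sb; };	/* one superblock per dentry */
struct toy_vfsmount {
	struct toy_dentry *mnt_root;
	struct toy_sb *mnt_sb;		/* always mnt_root->d_sb in this model */
};
struct toy_path { struct toy_vfsmount *mnt; struct toy_dentry *dentry; };

/* build a mount rooted at @root; mnt_sb is derived, never set separately */
static struct toy_vfsmount toy_mount_of(struct toy_dentry *root)
{
	struct toy_vfsmount m = { root, root->d_sb };
	return m;
}

/* the check as it was: compare superblocks, then root against dentry */
static bool toy_loop_old(const struct toy_vfsmount *m,
			 const struct toy_path *path)
{
	return m->mnt_sb == path->mnt->mnt_sb && m->mnt_root == path->dentry;
}

/* the check as it is now: the dentry comparison alone */
static bool toy_loop_new(const struct toy_vfsmount *m,
			 const struct toy_path *path)
{
	/* assumed invariant: path->dentry lives on path->mnt's superblock */
	assert(path->dentry->d_sb == path->mnt->mnt_sb);
	return m->mnt_root == path->dentry;
}
```

If the dentries are equal, so are their ->d_sb, and those are exactly the
two superblocks the old code compared.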
 fs/namespace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 86c6dd432b13..bdb33270ac6e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3798,8 +3798,7 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 
 	mnt = real_mount(m);
 
-	if (m->mnt_sb == path->mnt->mnt_sb &&
-	    m->mnt_root == dentry) {
+	if (m->mnt_root == path->dentry) {
 		err = -ELOOP;
 		goto discard;
 	}
-- 
2.47.2



* [PATCH v2 23/63] do_loopback(): use __free(path_put) to deal with old_path
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (20 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 22/63] finish_automount(): simplify the ELOOP check Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 24/63] pivot_root(2): use __free() to deal with struct path in it Al Viro
                       ` (39 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

preparations for making unlock_mount() a __cleanup();
can't have path_put() inside mount_lock scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
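Not for the commit log: a minimal userspace sketch of the pattern, for
those who have not played with cleanup.h-style annotations.  It uses the
GCC/Clang cleanup attribute directly; toy_path, toy_path_put() and
toy_free_path are hypothetical stand-ins for struct path, path_put() and
__free(path_put), not the kernel definitions.  The point it tries to show
is that once the variable starts out empty and the put is a no-op on an
empty one, every early return can be a plain return.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct path: a handle on a shared reference count. */
struct toy_path {
	int *refs;	/* NULL means "empty", like a zeroed struct path */
};

/* no-op on an empty path, otherwise drops one reference */
static void toy_path_put(struct toy_path *p)
{
	if (p->refs)
		--*p->refs;
}

/* hypothetical userspace spelling of __free(path_put) */
#define toy_free_path __attribute__((cleanup(toy_path_put)))

/* stand-in for the path lookup: grabs a reference on success only */
static int toy_lookup(int *refs, int fail, struct toy_path *out)
{
	if (fail)
		return -2;	/* think -ENOENT; *out stays empty */
	++*refs;
	out->refs = refs;
	return 0;
}

/* every return drops the reference, with no "out:" label needed */
static int toy_loopback(int *refs, int fail_lookup, int reject)
{
	struct toy_path old toy_free_path = { NULL };
	int err = toy_lookup(refs, fail_lookup, &old);

	if (err)
		return err;
	if (reject)
		return -22;	/* think -EINVAL */
	assert(*refs == 1);	/* reference is held for the rest of the body */
	return 0;
}
```

Cleanups run in reverse order of declaration, so a variable declared
before the lock context is released after the unlock.  That ordering is
what the "can't have path_put() inside mount_lock scope" remark above is
after.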
 fs/namespace.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index bdb33270ac6e..245cf2d19a6b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3014,7 +3014,7 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 static int do_loopback(struct path *path, const char *old_name,
 				int recurse)
 {
-	struct path old_path;
+	struct path old_path __free(path_put) = {};
 	struct mount *mnt = NULL, *parent;
 	struct pinned_mountpoint mp = {};
 	int err;
@@ -3024,13 +3024,12 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (err)
 		return err;
 
-	err = -EINVAL;
 	if (mnt_ns_loop(old_path.dentry))
-		goto out;
+		return -EINVAL;
 
 	err = lock_mount(path, &mp);
 	if (err)
-		goto out;
+		return err;
 
 	parent = real_mount(path->mnt);
 	if (!check_mnt(parent))
@@ -3050,8 +3049,6 @@ static int do_loopback(struct path *path, const char *old_name,
 	}
 out2:
 	unlock_mount(&mp);
-out:
-	path_put(&old_path);
 	return err;
 }
 
-- 
2.47.2



* [PATCH v2 24/63] pivot_root(2): use __free() to deal with struct path in it
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (21 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 23/63] do_loopback(): use __free(path_put) to deal with old_path Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 25/63] finish_automount(): take the lock_mount() analogue into a helper Al Viro
                       ` (38 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

preparations for making unlock_mount() a __cleanup();
can't have path_put() inside mount_lock scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 245cf2d19a6b..90b62ee882da 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4622,7 +4622,9 @@ EXPORT_SYMBOL(path_is_under);
 SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 		const char __user *, put_old)
 {
-	struct path new, old, root;
+	struct path new __free(path_put) = {};
+	struct path old __free(path_put) = {};
+	struct path root __free(path_put) = {};
 	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
 	struct pinned_mountpoint old_mp = {};
 	int error;
@@ -4633,21 +4635,21 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	error = user_path_at(AT_FDCWD, new_root,
 			     LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new);
 	if (error)
-		goto out0;
+		return error;
 
 	error = user_path_at(AT_FDCWD, put_old,
 			     LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &old);
 	if (error)
-		goto out1;
+		return error;
 
 	error = security_sb_pivotroot(&old, &new);
 	if (error)
-		goto out2;
+		return error;
 
 	get_fs_root(current->fs, &root);
 	error = lock_mount(&old, &old_mp);
 	if (error)
-		goto out3;
+		return error;
 
 	error = -EINVAL;
 	new_mnt = real_mount(new.mnt);
@@ -4705,13 +4707,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	error = 0;
 out4:
 	unlock_mount(&old_mp);
-out3:
-	path_put(&root);
-out2:
-	path_put(&old);
-out1:
-	path_put(&new);
-out0:
 	return error;
 }
 
-- 
2.47.2



* [PATCH v2 25/63] finish_automount(): take the lock_mount() analogue into a helper
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (22 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 24/63] pivot_root(2): use __free() to deal with struct path in it Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 26/63] do_new_mount_fc(): use __free() to deal with dropping mnt on failure Al Viro
                       ` (37 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

finish_automount() can't use lock_mount() - it treats finding something
already mounted as "quietly drop our mount and return 0", not as
"mount on top of whatever is mounted there".  It's been open-coded;
let's take it into a helper similar to lock_mount().  "something's
already mounted" => -EBUSY; finish_automount() needs to distinguish
it from the normal case and it can't happen in other failure cases.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
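Not for the commit log: a toy model of the calling convention, in case
the -EBUSY trick is not obvious.  The toy_* names and the three flags
are invented for illustration; real locking and struct mountpoint
handling are left out entirely.  The helper reports "already mounted" as
-EBUSY and returns unlocked on any failure; the caller maps that one
value to success and passes everything else through.

```c
#include <assert.h>
#include <errno.h>

struct toy_spot {
	int dead;		/* cant_mount() analogue */
	int overmounted;	/* something is already mounted here */
	int locked;		/* are the locks held? */
};

/* on success returns 0 with the locks held; on failure - unlocked */
static int toy_lock_exact(struct toy_spot *s)
{
	int err = 0;

	s->locked = 1;
	if (s->dead)
		err = -ENOENT;
	else if (s->overmounted)
		err = -EBUSY;
	if (err)
		s->locked = 0;	/* failure: drop the locks before returning */
	return err;
}

static int toy_finish_automount(struct toy_spot *s)
{
	int err = toy_lock_exact(s);

	if (err)
		return err == -EBUSY ? 0 : err;	/* lost the race: not an error */
	assert(s->locked);
	s->overmounted = 1;	/* attach our mount */
	s->locked = 0;		/* unlock */
	return 0;
}
```

This only works because -EBUSY cannot come out of the helper for any
other reason, as noted above.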
 fs/namespace.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 90b62ee882da..6251ee15f5f6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3781,9 +3781,29 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 	return err;
 }
 
-int finish_automount(struct vfsmount *m, const struct path *path)
+static int lock_mount_exact(const struct path *path,
+			    struct pinned_mountpoint *mp)
 {
 	struct dentry *dentry = path->dentry;
+	int err;
+
+	inode_lock(dentry->d_inode);
+	namespace_lock();
+	if (unlikely(cant_mount(dentry)))
+		err = -ENOENT;
+	else if (path_overmounted(path))
+		err = -EBUSY;
+	else
+		err = get_mountpoint(dentry, mp);
+	if (unlikely(err)) {
+		namespace_unlock();
+		inode_unlock(dentry->d_inode);
+	}
+	return err;
+}
+
+int finish_automount(struct vfsmount *m, const struct path *path)
+{
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
@@ -3805,20 +3825,11 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 	 * that overmounts our mountpoint to be means "quitely drop what we've
 	 * got", not "try to mount it on top".
 	 */
-	inode_lock(dentry->d_inode);
-	namespace_lock();
-	if (unlikely(cant_mount(dentry))) {
-		err = -ENOENT;
-		goto discard_locked;
-	}
-	if (path_overmounted(path)) {
-		err = 0;
-		goto discard_locked;
+	err = lock_mount_exact(path, &mp);
+	if (unlikely(err)) {
+		mntput(m);
+		return err == -EBUSY ? 0 : err;
 	}
-	err = get_mountpoint(dentry, &mp);
-	if (err)
-		goto discard_locked;
-
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
 	unlock_mount(&mp);
@@ -3826,9 +3837,6 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 		goto discard;
 	return 0;
 
-discard_locked:
-	namespace_unlock();
-	inode_unlock(dentry->d_inode);
 discard:
 	mntput(m);
 	return err;
-- 
2.47.2



* [PATCH v2 26/63] do_new_mount_fc(): use __free() to deal with dropping mnt on failure
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (23 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 25/63] finish_automount(): take the lock_mount() analogue into a helper Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-01 11:34       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 27/63] finish_automount(): " Al Viro
                       ` (36 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

do_add_mount() consumes vfsmount on success; just follow it with
conditional retain_and_null_ptr() on success and we can switch
to __free() for mnt and be done with that - unlock_mount() is
in the very end.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
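Not for the commit log: a userspace sketch of "consumed on success" with
a cleanup-annotated pointer.  toy_mnt, toy_free_mnt and
toy_retain_and_null() are hypothetical analogues of struct vfsmount,
__free(mntput) and retain_and_null_ptr(); the real macros live in the
kernel headers and are not reproduced here.  What it tries to show: the
callee takes over the reference only when it succeeds, so the caller
forgets its pointer in exactly that case and lets the cleanup handle
every other exit.

```c
#include <assert.h>
#include <stddef.h>

struct toy_mnt { int refs; };

/* drop one reference; NULL is fine, like mntput(NULL) */
static void toy_mntput(struct toy_mnt *m)
{
	if (m) {
		assert(m->refs > 0);
		m->refs--;
	}
}

/* cleanup handlers get a pointer to the annotated variable */
static void toy_mntput_cleanup(struct toy_mnt **p)
{
	toy_mntput(*p);
}

#define toy_free_mnt __attribute__((cleanup(toy_mntput_cleanup)))

/* hypothetical analogue of retain_and_null_ptr(): somebody else owns it now */
#define toy_retain_and_null(p) ((void)((p) = NULL))

/* stand-in for do_add_mount(): takes over the reference on success only */
static int toy_add_mount(struct toy_mnt *m, struct toy_mnt **tree, int fail)
{
	if (fail)
		return -22;	/* think -EINVAL; caller still owns m */
	*tree = m;
	return 0;
}

static int toy_new_mount(struct toy_mnt *fresh, struct toy_mnt **tree,
			 int too_revealing, int fail_add)
{
	struct toy_mnt *mnt toy_free_mnt = fresh;	/* we own one reference */
	int error;

	if (too_revealing)
		return -1;	/* think -EPERM; cleanup drops the reference */

	error = toy_add_mount(mnt, tree, fail_add);
	if (!error)
		toy_retain_and_null(mnt);	/* consumed on success */
	return error;
}
```

The reference count stays at 1 only on the success path, where the tree
holds it; both failure paths end with it dropped.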
 fs/namespace.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6251ee15f5f6..3551e51461a2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3696,7 +3696,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 {
 	struct pinned_mountpoint mp = {};
 	struct super_block *sb;
-	struct vfsmount *mnt = fc_mount(fc);
+	struct vfsmount *mnt __free(mntput) = fc_mount(fc);
 	int error;
 
 	if (IS_ERR(mnt))
@@ -3704,10 +3704,11 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	sb = fc->root->d_sb;
 	error = security_sb_kern_mount(sb);
-	if (!error && mount_too_revealing(sb, &mnt_flags))
-		error = -EPERM;
 	if (unlikely(error))
-		goto out;
+		return error;
+
+	if (unlikely(mount_too_revealing(sb, &mnt_flags)))
+		return -EPERM;
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
@@ -3716,11 +3717,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
 		if (!error)
-			mnt = NULL;	// consumed on success
+			retain_and_null_ptr(mnt); // consumed on success
 		unlock_mount(&mp);
 	}
-out:
-	mntput(mnt);
 	return error;
 }
 
-- 
2.47.2



* [PATCH v2 27/63] finish_automount(): use __free() to deal with dropping mnt on failure
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (24 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 26/63] do_new_mount_fc(): use __free() to deal with dropping mnt on failure Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 28/63] change calling conventions for lock_mount() et.al Al Viro
                       ` (35 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

same story as with do_new_mount_fc().

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3551e51461a2..779cfed04291 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3801,8 +3801,9 @@ static int lock_mount_exact(const struct path *path,
 	return err;
 }
 
-int finish_automount(struct vfsmount *m, const struct path *path)
+int finish_automount(struct vfsmount *__m, const struct path *path)
 {
+	struct vfsmount *m __free(mntput) = __m;
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
@@ -3814,10 +3815,8 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 
 	mnt = real_mount(m);
 
-	if (m->mnt_root == path->dentry) {
-		err = -ELOOP;
-		goto discard;
-	}
+	if (m->mnt_root == path->dentry)
+		return -ELOOP;
 
 	/*
 	 * we don't want to use lock_mount() - in this case finding something
@@ -3825,19 +3824,14 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 	 * got", not "try to mount it on top".
 	 */
 	err = lock_mount_exact(path, &mp);
-	if (unlikely(err)) {
-		mntput(m);
+	if (unlikely(err))
 		return err == -EBUSY ? 0 : err;
-	}
+
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
+	if (likely(!err))
+		retain_and_null_ptr(m);
 	unlock_mount(&mp);
-	if (unlikely(err))
-		goto discard;
-	return 0;
-
-discard:
-	mntput(m);
 	return err;
 }
 
-- 
2.47.2



* [PATCH v2 28/63] change calling conventions for lock_mount() et.al.
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (25 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 27/63] finish_automount(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-01 11:37       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 29/63] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
                       ` (34 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

1) pinned_mountpoint gets a new member - struct mount *parent.
Set only if we locked the sucker; ERR_PTR() - on failed attempt.

2) do_lock_mount() et al. return void and set ->parent to
	* on success with !beneath - mount corresponding to path->mnt
	* on success with beneath - the parent of mount corresponding
to path->mnt
	* in case of error - ERR_PTR(-E...).
IOW, we get the mount we will be actually mounting upon or ERR_PTR().

3) we can't use CLASS, since the pinned_mountpoint is placed on
hlist during initialization, so we define local macros:
	LOCK_MOUNT(mp, path)
	LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath)
	LOCK_MOUNT_EXACT(mp, path)
All of them declare and initialize struct pinned_mountpoint mp,
with unlock_mount done via __cleanup().

Users converted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
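Not for the commit log: a userspace sketch of the convention described
above, with everything prefixed toy_ being an invented stand-in (the
error-pointer helpers imitate ERR_PTR()/PTR_ERR()/IS_ERR(), the counter
imitates the locks).  The macro declares the context with a cleanup
handler and then fills it in place; as far as I can tell that is the
reason a by-value constructor would not do for something whose address
ends up on a list.  The handler unlocks only if ->parent is not an
error pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* userspace imitations of ERR_PTR()/PTR_ERR()/IS_ERR() */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_mount { int id; };

struct toy_pinned {
	struct toy_mount *parent;	/* mount to attach to, or an error pointer */
	int *lock_depth;		/* stands in for the locks being held */
};

/* returns void; the outcome is reported via res->parent */
static void toy_do_lock(struct toy_mount *m, int *depth, struct toy_pinned *res)
{
	res->lock_depth = depth;
	if (!m) {
		res->parent = toy_err_ptr(-22);	/* failed: nothing locked */
		return;
	}
	++*depth;
	res->parent = m;
}

static void toy_unlock(struct toy_pinned *res)
{
	if (!toy_is_err(res->parent))	/* unlock only what we did lock */
		--*res->lock_depth;
}

/* declares the context and locks in one go; deliberately two statements */
#define TOY_LOCK_MOUNT(mp, m, depth) \
	struct toy_pinned mp __attribute__((cleanup(toy_unlock))) = { 0 }; \
	toy_do_lock((m), (depth), &mp)

static int toy_graft(struct toy_mount *target, int *depth, int reject)
{
	TOY_LOCK_MOUNT(mp, target, depth);
	if (toy_is_err(mp.parent))
		return (int)toy_ptr_err(mp.parent);
	assert(*depth == 1);	/* locked from here to the end of scope */
	if (reject)
		return -1;
	return mp.parent->id;
}
```

Since the macro expands to a declaration followed by a call, it can only
be used where a declaration is allowed, and nothing may sit between the
two halves.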
 fs/namespace.c | 219 ++++++++++++++++++++++++-------------------------
 1 file changed, 108 insertions(+), 111 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 779cfed04291..952e66bdb9bb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -919,6 +919,7 @@ bool __is_local_mountpoint(const struct dentry *dentry)
 struct pinned_mountpoint {
 	struct hlist_node node;
 	struct mountpoint *mp;
+	struct mount *parent;
 };
 
 static bool lookup_mountpoint(struct dentry *dentry, struct pinned_mountpoint *m)
@@ -2728,48 +2729,47 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 }
 
 /**
- * do_lock_mount - lock mount and mountpoint
- * @path:    target path
- * @beneath: whether the intention is to mount beneath @path
+ * do_lock_mount - acquire environment for mounting
+ * @path:	target path
+ * @res:	context to set up
+ * @beneath:	whether the intention is to mount beneath @path
  *
- * Follow the mount stack on @path until the top mount @mnt is found. If
- * the initial @path->{mnt,dentry} is a mountpoint lookup the first
- * mount stacked on top of it. Then simply follow @{mnt,mnt->mnt_root}
- * until nothing is stacked on top of it anymore.
+ * To mount something at given location, we need
+ *	namespace_sem locked exclusive
+ *	inode of dentry we are mounting on locked exclusive
+ *	struct mountpoint for that dentry
+ *	struct mount we are mounting on
  *
- * Acquire the inode_lock() on the top mount's ->mnt_root to protect
- * against concurrent removal of the new mountpoint from another mount
- * namespace.
+ * Results are stored in caller-supplied context (pinned_mountpoint);
+ * on success we have res->parent and res->mp pointing to parent and
+ * mountpoint respectively and res->node inserted into the ->m_list
+ * of the mountpoint, making sure the mountpoint won't disappear.
+ * On failure we have res->parent set to ERR_PTR(-E...), res->mp
+ * left NULL, res->node - empty.
+ * In case of success do_lock_mount returns with locks acquired (in
+ * proper order - inode lock nests outside of namespace_sem).
  *
- * If @beneath is requested, acquire inode_lock() on @mnt's mountpoint
- * @mp on @mnt->mnt_parent must be acquired. This protects against a
- * concurrent unlink of @mp->mnt_dentry from another mount namespace
- * where @mnt doesn't have a child mount mounted @mp. A concurrent
- * removal of @mnt->mnt_root doesn't matter as nothing will be mounted
- * on top of it for @beneath.
+ * Request to mount on overmounted location is treated as "mount on
+ * top of whatever's overmounting it"; request to mount beneath
+ * a location - "mount immediately beneath the topmost mount at that
+ * place".
  *
- * In addition, @beneath needs to make sure that @mnt hasn't been
- * unmounted or moved from its current mountpoint in between dropping
- * @mount_lock and acquiring @namespace_sem. For the !@beneath case @mnt
- * being unmounted would be detected later by e.g., calling
- * check_mnt(mnt) in the function it's called from. For the @beneath
- * case however, it's useful to detect it directly in do_lock_mount().
- * If @mnt hasn't been unmounted then @mnt->mnt_mountpoint still points
- * to @mnt->mnt_mp->m_dentry. But if @mnt has been unmounted it will
- * point to @mnt->mnt_root and @mnt->mnt_mp will be NULL.
- *
- * Return: Either the target mountpoint on the top mount or the top
- *         mount's mountpoint.
+ * In all cases the location must not have been unmounted and the
+ * chosen mountpoint must be allowed to be mounted on.  For "beneath"
+ * case we also require the location to be at the root of a mount
+ * that has a parent (i.e. is not a root of some namespace).
  */
-static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bool beneath)
+static void do_lock_mount(struct path *path, struct pinned_mountpoint *res, bool beneath)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct dentry *dentry;
 	struct path under = {};
 	int err = -ENOENT;
 
-	if (unlikely(beneath) && !path_mounted(path))
-		return -EINVAL;
+	if (unlikely(beneath) && !path_mounted(path)) {
+		res->parent = ERR_PTR(-EINVAL);
+		return;
+	}
 
 	for (;;) {
 		struct mount *m = real_mount(mnt);
@@ -2779,7 +2779,8 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			read_seqlock_excl(&mount_lock);
 			if (unlikely(!mnt_has_parent(m))) {
 				read_sequnlock_excl(&mount_lock);
-				return -EINVAL;
+				res->parent = ERR_PTR(-EINVAL);
+				return;
 			}
 			under.mnt = mntget(&m->mnt_parent->mnt);
 			under.dentry = dget(m->mnt_mountpoint);
@@ -2811,7 +2812,7 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			path->dentry = dget(mnt->mnt_root);
 			continue;	// got overmounted
 		}
-		err = get_mountpoint(dentry, pinned);
+		err = get_mountpoint(dentry, res);
 		if (err)
 			break;
 		if (beneath) {
@@ -2822,22 +2823,25 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			 * we are not dropping the final references here).
 			 */
 			path_put(&under);
+			res->parent = real_mount(path->mnt)->mnt_parent;
+			return;
 		}
-		return 0;
+		res->parent = real_mount(path->mnt);
+		return;
 	}
 	namespace_unlock();
 	inode_unlock(dentry->d_inode);
 	if (beneath)
 		path_put(&under);
-	return err;
+	res->parent = ERR_PTR(err);
 }
 
-static inline int lock_mount(struct path *path, struct pinned_mountpoint *m)
+static inline void lock_mount(struct path *path, struct pinned_mountpoint *m)
 {
-	return do_lock_mount(path, m, false);
+	do_lock_mount(path, m, false);
 }
 
-static void unlock_mount(struct pinned_mountpoint *m)
+static void __unlock_mount(struct pinned_mountpoint *m)
 {
 	inode_unlock(m->mp->m_dentry->d_inode);
 	read_seqlock_excl(&mount_lock);
@@ -2846,6 +2850,20 @@ static void unlock_mount(struct pinned_mountpoint *m)
 	namespace_unlock();
 }
 
+static inline void unlock_mount(struct pinned_mountpoint *m)
+{
+	if (!IS_ERR(m->parent))
+		__unlock_mount(m);
+}
+
+#define LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath) \
+	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
+	do_lock_mount((path), &mp, (beneath))
+#define LOCK_MOUNT(mp, path) LOCK_MOUNT_MAYBE_BENEATH(mp, (path), false)
+#define LOCK_MOUNT_EXACT(mp, path) \
+	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
+	lock_mount_exact((path), &mp)
+
 static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
 {
 	if (mnt->mnt.mnt_sb->s_flags & SB_NOUSER)
@@ -3015,8 +3033,7 @@ static int do_loopback(struct path *path, const char *old_name,
 				int recurse)
 {
 	struct path old_path __free(path_put) = {};
-	struct mount *mnt = NULL, *parent;
-	struct pinned_mountpoint mp = {};
+	struct mount *mnt = NULL;
 	int err;
 	if (!old_name || !*old_name)
 		return -EINVAL;
@@ -3027,28 +3044,23 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (mnt_ns_loop(old_path.dentry))
 		return -EINVAL;
 
-	err = lock_mount(path, &mp);
-	if (err)
-		return err;
+	LOCK_MOUNT(mp, path);
+	if (IS_ERR(mp.parent))
+		return PTR_ERR(mp.parent);
 
-	parent = real_mount(path->mnt);
-	if (!check_mnt(parent))
-		goto out2;
+	if (!check_mnt(mp.parent))
+		return -EINVAL;
 
 	mnt = __do_loopback(&old_path, recurse);
-	if (IS_ERR(mnt)) {
-		err = PTR_ERR(mnt);
-		goto out2;
-	}
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
 
-	err = graft_tree(mnt, parent, mp.mp);
+	err = graft_tree(mnt, mp.parent, mp.mp);
 	if (err) {
 		lock_mount_hash();
 		umount_tree(mnt, UMOUNT_SYNC);
 		unlock_mount_hash();
 	}
-out2:
-	unlock_mount(&mp);
 	return err;
 }
 
@@ -3561,7 +3573,6 @@ static int do_move_mount(struct path *old_path,
 {
 	struct mount *p;
 	struct mount *old = real_mount(old_path->mnt);
-	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
 
@@ -3571,52 +3582,49 @@ static int do_move_mount(struct path *old_path,
 	if (d_is_dir(new_path->dentry) != d_is_dir(old_path->dentry))
 		return -EINVAL;
 
-	err = do_lock_mount(new_path, &mp, beneath);
-	if (err)
-		return err;
+	LOCK_MOUNT_MAYBE_BENEATH(mp, new_path, beneath);
+	if (IS_ERR(mp.parent))
+		return PTR_ERR(mp.parent);
 
 	p = real_mount(new_path->mnt);
 
-	err = -EINVAL;
-
 	if (check_mnt(old)) {
 		/* if the source is in our namespace... */
 		/* ... it should be detachable from parent */
 		if (!mnt_has_parent(old) || IS_MNT_LOCKED(old))
-			goto out;
+			return -EINVAL;
 		/* ... which should not be shared */
 		if (IS_MNT_SHARED(old->mnt_parent))
-			goto out;
+			return -EINVAL;
 		/* ... and the target should be in our namespace */
 		if (!check_mnt(p))
-			goto out;
+			return -EINVAL;
 	} else {
 		/*
 		 * otherwise the source must be the root of some anon namespace.
 		 */
 		if (!anon_ns_root(old))
-			goto out;
+			return -EINVAL;
 		/*
 		 * Bail out early if the target is within the same namespace -
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
 		if (old->mnt_ns == p->mnt_ns)
-			goto out;
+			return -EINVAL;
 		/*
 		 * Target should be either in our namespace or in an acceptable
 		 * anon namespace, sensu check_anonymous_mnt().
 		 */
 		if (!may_use_mount(p))
-			goto out;
+			return -EINVAL;
 	}
 
 	if (beneath) {
 		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
-			goto out;
+			return err;
 
-		err = -EINVAL;
 		p = p->mnt_parent;
 	}
 
@@ -3625,17 +3633,13 @@ static int do_move_mount(struct path *old_path,
 	 * mount which is shared.
 	 */
 	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
-		goto out;
-	err = -ELOOP;
+		return -EINVAL;
 	if (!check_for_nsfs_mounts(old))
-		goto out;
+		return -ELOOP;
 	if (mount_is_ancestor(old, p))
-		goto out;
+		return -ELOOP;
 
-	err = attach_recursive_mnt(old, p, mp.mp);
-out:
-	unlock_mount(&mp);
-	return err;
+	return attach_recursive_mnt(old, p, mp.mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
@@ -3694,7 +3698,6 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
-	struct pinned_mountpoint mp = {};
 	struct super_block *sb;
 	struct vfsmount *mnt __free(mntput) = fc_mount(fc);
 	int error;
@@ -3712,13 +3715,14 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
-	error = lock_mount(mountpoint, &mp);
-	if (!error) {
+	LOCK_MOUNT(mp, mountpoint);
+	if (IS_ERR(mp.parent)) {
+		return PTR_ERR(mp.parent);
+	} else {
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
 		if (!error)
 			retain_and_null_ptr(mnt); // consumed on success
-		unlock_mount(&mp);
 	}
 	return error;
 }
@@ -3780,8 +3784,8 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 	return err;
 }
 
-static int lock_mount_exact(const struct path *path,
-			    struct pinned_mountpoint *mp)
+static void lock_mount_exact(const struct path *path,
+			     struct pinned_mountpoint *mp)
 {
 	struct dentry *dentry = path->dentry;
 	int err;
@@ -3797,14 +3801,15 @@ static int lock_mount_exact(const struct path *path,
 	if (unlikely(err)) {
 		namespace_unlock();
 		inode_unlock(dentry->d_inode);
+		mp->parent = ERR_PTR(err);
+	} else {
+		mp->parent = real_mount(path->mnt);
 	}
-	return err;
 }
 
 int finish_automount(struct vfsmount *__m, const struct path *path)
 {
 	struct vfsmount *m __free(mntput) = __m;
-	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
 
@@ -3823,15 +3828,14 @@ int finish_automount(struct vfsmount *__m, const struct path *path)
 	 * that overmounts our mountpoint to be means "quitely drop what we've
 	 * got", not "try to mount it on top".
 	 */
-	err = lock_mount_exact(path, &mp);
-	if (unlikely(err))
-		return err == -EBUSY ? 0 : err;
+	LOCK_MOUNT_EXACT(mp, path);
+	if (IS_ERR(mp.parent))
+		return mp.parent == ERR_PTR(-EBUSY) ? 0 : PTR_ERR(mp.parent);
 
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
 	if (likely(!err))
 		retain_and_null_ptr(m);
-	unlock_mount(&mp);
 	return err;
 }
 
@@ -4627,7 +4631,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	struct path old __free(path_put) = {};
 	struct path root __free(path_put) = {};
 	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
-	struct pinned_mountpoint old_mp = {};
 	int error;
 
 	if (!may_mount())
@@ -4648,45 +4651,42 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 		return error;
 
 	get_fs_root(current->fs, &root);
-	error = lock_mount(&old, &old_mp);
-	if (error)
-		return error;
 
-	error = -EINVAL;
+	LOCK_MOUNT(old_mp, &old);
+	old_mnt = old_mp.parent;
+	if (IS_ERR(old_mnt))
+		return PTR_ERR(old_mnt);
+
 	new_mnt = real_mount(new.mnt);
 	root_mnt = real_mount(root.mnt);
-	old_mnt = real_mount(old.mnt);
 	ex_parent = new_mnt->mnt_parent;
 	root_parent = root_mnt->mnt_parent;
 	if (IS_MNT_SHARED(old_mnt) ||
 		IS_MNT_SHARED(ex_parent) ||
 		IS_MNT_SHARED(root_parent))
-		goto out4;
+		return -EINVAL;
 	if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
-		goto out4;
+		return -EINVAL;
 	if (new_mnt->mnt.mnt_flags & MNT_LOCKED)
-		goto out4;
-	error = -ENOENT;
+		return -EINVAL;
 	if (d_unlinked(new.dentry))
-		goto out4;
-	error = -EBUSY;
+		return -ENOENT;
 	if (new_mnt == root_mnt || old_mnt == root_mnt)
-		goto out4; /* loop, on the same file system  */
-	error = -EINVAL;
+		return -EBUSY; /* loop, on the same file system  */
 	if (!path_mounted(&root))
-		goto out4; /* not a mountpoint */
+		return -EINVAL; /* not a mountpoint */
 	if (!mnt_has_parent(root_mnt))
-		goto out4; /* absolute root */
+		return -EINVAL; /* absolute root */
 	if (!path_mounted(&new))
-		goto out4; /* not a mountpoint */
+		return -EINVAL; /* not a mountpoint */
 	if (!mnt_has_parent(new_mnt))
-		goto out4; /* absolute root */
+		return -EINVAL; /* absolute root */
 	/* make sure we can reach put_old from new_root */
 	if (!is_path_reachable(old_mnt, old.dentry, &new))
-		goto out4;
+		return -EINVAL;
 	/* make certain new is below the root */
 	if (!is_path_reachable(new_mnt, new.dentry, &root))
-		goto out4;
+		return -EINVAL;
 	lock_mount_hash();
 	umount_mnt(new_mnt);
 	if (root_mnt->mnt.mnt_flags & MNT_LOCKED) {
@@ -4705,10 +4705,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	mnt_notify_add(root_mnt);
 	mnt_notify_add(new_mnt);
 	chroot_fs_refs(&root, &new);
-	error = 0;
-out4:
-	unlock_mount(&old_mp);
-	return error;
+	return 0;
 }
 
 static unsigned int recalc_flags(struct mount_kattr *kattr, struct mount *mnt)
-- 
2.47.2
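
Illustration (not part of the patch): the LOCK_MOUNT() family of macros
above relies on the compiler's cleanup attribute, so every return after
the declaration runs the unlock helper.  A minimal userspace sketch of
the same pattern, assuming GCC or Clang; all *_sketch names are
hypothetical, and a plain counter stands in for the real locks:

```c
#include <assert.h>
#include <errno.h>

static int unlock_calls_sketch;	/* counts how often the "lock" was dropped */

struct guard_sketch {
	int err;	/* 0 if "locking" succeeded, errno value otherwise */
};

/* Cleanup handler: runs when the guarded variable goes out of scope. */
static void unlock_sketch(struct guard_sketch *g)
{
	assert(g != 0);
	if (!g->err)	/* nothing to undo if locking failed */
		unlock_calls_sketch++;
}

/* Declare a guard; the cleanup attribute ties unlock to scope exit. */
#define LOCK_SKETCH(name, fail_with) \
	struct guard_sketch name __attribute__((cleanup(unlock_sketch))) = \
		{ .err = (fail_with) }

static int use_sketch(int fail_with, int bail_early)
{
	LOCK_SKETCH(g, fail_with);

	if (g.err)
		return -g.err;		/* cleanup runs, sees failure, no unlock */
	if (bail_early)
		return -EINVAL;		/* cleanup runs and unlocks */
	return 0;			/* same on the normal exit */
}
```

The early returns need no goto label; that is the property the
conversions in this patch depend on.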



* [PATCH v2 29/63] do_move_mount(): use the parent mount returned by do_lock_mount()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (26 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 28/63] change calling conventions for lock_mount() et.al Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-01 11:38       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 30/63] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
                       ` (33 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

After a successful do_lock_mount() call, mp.parent is set to either
real_mount(path->mnt) (for !beneath case) or to ->mnt_parent of that
(for beneath).  p is set to real_mount(path->mnt) and after
several uses it's made equal to mp.parent.  All uses prior to that
care only about p->mnt_ns and since p->mnt_ns == parent->mnt_ns,
we might as well use mp.parent all along.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 952e66bdb9bb..d57e727962da 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3571,7 +3571,6 @@ static inline bool may_use_mount(struct mount *mnt)
 static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
-	struct mount *p;
 	struct mount *old = real_mount(old_path->mnt);
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
@@ -3586,8 +3585,6 @@ static int do_move_mount(struct path *old_path,
 	if (IS_ERR(mp.parent))
 		return PTR_ERR(mp.parent);
 
-	p = real_mount(new_path->mnt);
-
 	if (check_mnt(old)) {
 		/* if the source is in our namespace... */
 		/* ... it should be detachable from parent */
@@ -3597,7 +3594,7 @@ static int do_move_mount(struct path *old_path,
 		if (IS_MNT_SHARED(old->mnt_parent))
 			return -EINVAL;
 		/* ... and the target should be in our namespace */
-		if (!check_mnt(p))
+		if (!check_mnt(mp.parent))
 			return -EINVAL;
 	} else {
 		/*
@@ -3610,13 +3607,13 @@ static int do_move_mount(struct path *old_path,
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
-		if (old->mnt_ns == p->mnt_ns)
+		if (old->mnt_ns == mp.parent->mnt_ns)
 			return -EINVAL;
 		/*
 		 * Target should be either in our namespace or in an acceptable
 		 * anon namespace, sensu check_anonymous_mnt().
 		 */
-		if (!may_use_mount(p))
+		if (!may_use_mount(mp.parent))
 			return -EINVAL;
 	}
 
@@ -3624,22 +3621,20 @@ static int do_move_mount(struct path *old_path,
 		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
 			return err;
-
-		p = p->mnt_parent;
 	}
 
 	/*
 	 * Don't move a mount tree containing unbindable mounts to a destination
 	 * mount which is shared.
 	 */
-	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
+	if (IS_MNT_SHARED(mp.parent) && tree_contains_unbindable(old))
 		return -EINVAL;
 	if (!check_for_nsfs_mounts(old))
 		return -ELOOP;
-	if (mount_is_ancestor(old, p))
+	if (mount_is_ancestor(old, mp.parent))
 		return -ELOOP;
 
-	return attach_recursive_mnt(old, p, mp.mp);
+	return attach_recursive_mnt(old, mp.parent, mp.mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
-- 
2.47.2



* [PATCH v2 30/63] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (27 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 29/63] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-01 11:40       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 31/63] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
                       ` (32 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Both callers pass it a mountpoint reference picked from a
pinned_mountpoint and the path it corresponds to.

First of all, path->dentry is equal to mp.mp->m_dentry.  Furthermore, path->mnt
is &mp.parent->mnt, making struct path contents redundant.

Pass it the address of that pinned_mountpoint instead; what's more, if we
teach it to treat ERR_PTR(error) in ->parent as "bail out with that error"
we can simplify the callers even more - do_add_mount() will do the right
thing even when called after lock_mount() failure.
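
Illustration (not part of the patch): a minimal userspace sketch of the
convention described above, where the locking step reports failure by
storing ERR_PTR(error) in ->parent and the next step bails out on it.
The ERR_PTR()/PTR_ERR()/IS_ERR() definitions are userspace stand-ins
modelled on the kernel helpers, and every *_sketch name is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct mount_sketch { int id; };
struct pinned_sketch { struct mount_sketch *parent; };

/* "Lock" step: returns nothing, reports failure through ->parent. */
static void lock_sketch(struct mount_sketch *target, int fail_with,
			struct pinned_sketch *res)
{
	assert(res != 0);
	res->parent = fail_with ? ERR_PTR(-fail_with) : target;
}

/* "Add" step: bails out with the stored error before doing anything. */
static int add_sketch(const struct pinned_sketch *mp)
{
	if (IS_ERR(mp->parent))
		return (int)PTR_ERR(mp->parent);
	return 0;	/* the real code would graft the mount here */
}

/* The caller needs no IS_ERR() check of its own between the two steps. */
static int caller_sketch(struct mount_sketch *target, int fail_with)
{
	struct pinned_sketch mp = { 0 };

	lock_sketch(target, fail_with, &mp);
	return add_sketch(&mp);
}
```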

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d57e727962da..b236536bbbc9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3657,10 +3657,13 @@ static int do_move_mount_old(struct path *path, const char *old_name)
 /*
  * add a mount into a namespace's mount tree
  */
-static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
-			const struct path *path, int mnt_flags)
+static int do_add_mount(struct mount *newmnt, const struct pinned_mountpoint *mp,
+			int mnt_flags)
 {
-	struct mount *parent = real_mount(path->mnt);
+	struct mount *parent = mp->parent;
+
+	if (IS_ERR(parent))
+		return PTR_ERR(parent);
 
 	mnt_flags &= ~MNT_INTERNAL_FLAGS;
 
@@ -3674,14 +3677,15 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
 	}
 
 	/* Refuse the same filesystem on the same mount point */
-	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb && path_mounted(path))
+	if (parent->mnt.mnt_sb == newmnt->mnt.mnt_sb &&
+	    parent->mnt.mnt_root == mp->mp->m_dentry)
 		return -EBUSY;
 
 	if (d_is_symlink(newmnt->mnt.mnt_root))
 		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	return graft_tree(newmnt, parent, mp);
+	return graft_tree(newmnt, parent, mp->mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
@@ -3711,14 +3715,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
 	LOCK_MOUNT(mp, mountpoint);
-	if (IS_ERR(mp.parent)) {
-		return PTR_ERR(mp.parent);
-	} else {
-		error = do_add_mount(real_mount(mnt), mp.mp,
-				     mountpoint, mnt_flags);
-		if (!error)
-			retain_and_null_ptr(mnt); // consumed on success
-	}
+	error = do_add_mount(real_mount(mnt), &mp, mnt_flags);
+	if (!error)
+		retain_and_null_ptr(mnt); // consumed on success
 	return error;
 }
 
@@ -3824,11 +3823,10 @@ int finish_automount(struct vfsmount *__m, const struct path *path)
 	 * got", not "try to mount it on top".
 	 */
 	LOCK_MOUNT_EXACT(mp, path);
-	if (IS_ERR(mp.parent))
-		return mp.parent == ERR_PTR(-EBUSY) ? 0 : PTR_ERR(mp.parent);
+	if (mp.parent == ERR_PTR(-EBUSY))
+		return 0;
 
-	err = do_add_mount(mnt, mp.mp, path,
-			   path->mnt->mnt_flags | MNT_SHRINKABLE);
+	err = do_add_mount(mnt, &mp, path->mnt->mnt_flags | MNT_SHRINKABLE);
 	if (likely(!err))
 		retain_and_null_ptr(m);
 	return err;
-- 
2.47.2



* [PATCH v2 31/63] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (28 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 30/63] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-01 11:41       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 32/63] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
                       ` (31 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

parent and mountpoint always come from the same struct pinned_mountpoint
now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b236536bbbc9..18d6ad0f4f76 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2549,8 +2549,7 @@ enum mnt_tree_flags_t {
 /**
  * attach_recursive_mnt - attach a source mount tree
  * @source_mnt: mount tree to be attached
- * @dest_mnt:   mount that @source_mnt will be mounted on
- * @dest_mp:    the mountpoint @source_mnt will be mounted at
+ * @dest:	the context for mounting at the place where the tree should go
  *
  *  NOTE: in the table below explains the semantics when a source mount
  *  of a given type is attached to a destination mount of a given type.
@@ -2613,10 +2612,11 @@ enum mnt_tree_flags_t {
  *         Otherwise a negative error code is returned.
  */
 static int attach_recursive_mnt(struct mount *source_mnt,
-				struct mount *dest_mnt,
-				struct mountpoint *dest_mp)
+				const struct pinned_mountpoint *dest)
 {
 	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct mount *dest_mnt = dest->parent;
+	struct mountpoint *dest_mp = dest->mp;
 	HLIST_HEAD(tree_list);
 	struct mnt_namespace *ns = dest_mnt->mnt_ns;
 	struct pinned_mountpoint root = {};
@@ -2864,16 +2864,16 @@ static inline void unlock_mount(struct pinned_mountpoint *m)
 	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
 	lock_mount_exact((path), &mp)
 
-static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
+static int graft_tree(struct mount *mnt, const struct pinned_mountpoint *mp)
 {
 	if (mnt->mnt.mnt_sb->s_flags & SB_NOUSER)
 		return -EINVAL;
 
-	if (d_is_dir(mp->m_dentry) !=
+	if (d_is_dir(mp->mp->m_dentry) !=
 	      d_is_dir(mnt->mnt.mnt_root))
 		return -ENOTDIR;
 
-	return attach_recursive_mnt(mnt, p, mp);
+	return attach_recursive_mnt(mnt, mp);
 }
 
 static int may_change_propagation(const struct mount *m)
@@ -3055,7 +3055,7 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
-	err = graft_tree(mnt, mp.parent, mp.mp);
+	err = graft_tree(mnt, &mp);
 	if (err) {
 		lock_mount_hash();
 		umount_tree(mnt, UMOUNT_SYNC);
@@ -3634,7 +3634,7 @@ static int do_move_mount(struct path *old_path,
 	if (mount_is_ancestor(old, mp.parent))
 		return -ELOOP;
 
-	return attach_recursive_mnt(old, mp.parent, mp.mp);
+	return attach_recursive_mnt(old, &mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
@@ -3685,7 +3685,7 @@ static int do_add_mount(struct mount *newmnt, const struct pinned_mountpoint *mp
 		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	return graft_tree(newmnt, parent, mp->mp);
+	return graft_tree(newmnt, mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
-- 
2.47.2



* [PATCH v2 32/63] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (29 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 31/63] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 33/63] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
                       ` (30 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

That kills the last place where callers of lock_mount(path, &mp)
used path->dentry.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 18d6ad0f4f76..02bc5294071a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4675,7 +4675,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	if (!mnt_has_parent(new_mnt))
 		return -EINVAL; /* absolute root */
 	/* make sure we can reach put_old from new_root */
-	if (!is_path_reachable(old_mnt, old.dentry, &new))
+	if (!is_path_reachable(old_mnt, old_mp.mp->m_dentry, &new))
 		return -EINVAL;
 	/* make certain new is below the root */
 	if (!is_path_reachable(new_mnt, new.dentry, &root))
-- 
2.47.2



* [PATCH v2 33/63] don't bother passing new_path->dentry to can_move_mount_beneath()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (30 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 32/63] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:20       ` Linus Torvalds
  2025-08-28 23:07     ` [PATCH v2 34/63] new helper: topmost_overmount() Al Viro
                       ` (29 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 02bc5294071a..085877bfaa5e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3450,8 +3450,8 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
 /**
  * can_move_mount_beneath - check that we can mount beneath the top mount
  * @mnt_from: mount we are trying to move
- * @to:   mount under which to mount
- * @mp:   mountpoint of @to
+ * @mnt_to:   mount under which to mount
+ * @mp:   mountpoint of @mnt_to
  *
  * - Make sure that nothing can be mounted beneath the caller's current
  *   root or the rootfs of the namespace.
@@ -3467,11 +3467,10 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Return: On success 0, and on error a negative error code is returned.
  */
 static int can_move_mount_beneath(struct mount *mnt_from,
-				  const struct path *to,
+				  struct mount *mnt_to,
 				  const struct mountpoint *mp)
 {
-	struct mount *mnt_to = real_mount(to->mnt),
-		     *parent_mnt_to = mnt_to->mnt_parent;
+	struct mount *parent_mnt_to = mnt_to->mnt_parent;
 
 	if (IS_MNT_LOCKED(mnt_to))
 		return -EINVAL;
@@ -3618,7 +3617,7 @@ static int do_move_mount(struct path *old_path,
 	}
 
 	if (beneath) {
-		err = can_move_mount_beneath(old, new_path, mp.mp);
+		err = can_move_mount_beneath(old, real_mount(new_path->mnt), mp.mp);
 		if (err)
 			return err;
 	}
-- 
2.47.2



* [PATCH v2 34/63] new helper: topmost_overmount()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (31 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 33/63] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 35/63] do_lock_mount(): don't modify path Al Viro
                       ` (28 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Returns the final (topmost) mount in the chain of overmounts
starting at the given mount.  Same locking rules as for any mount
tree traversal - either the spinlock side of mount_lock, or
rcu + sample the seqcount side of mount_lock before the call
and recheck afterwards.
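
Illustration (not part of the patch): a userspace sketch of the chain
walk, assuming a hypothetical struct mount_sketch that keeps only the
->overmount link.  Locking is left out entirely; in the kernel the
rules above still apply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the link to whatever is mounted on top. */
struct mount_sketch {
	struct mount_sketch *overmount;
};

/* Follow ->overmount until reaching a mount with nothing on top of it. */
static struct mount_sketch *topmost_overmount_sketch(struct mount_sketch *m)
{
	assert(m != NULL);
	while (m->overmount)
		m = m->overmount;
	return m;
}
```

A mount with no overmount is its own topmost mount, so callers do not
need a separate check for the common case.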

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h     | 7 +++++++
 fs/namespace.c | 9 +++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index ed8c83ba836a..04d0eadc4c10 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -235,4 +235,11 @@ static inline void mnt_notify_add(struct mount *m)
 }
 #endif
 
+static inline struct mount *topmost_overmount(struct mount *m)
+{
+	while (m->overmount)
+		m = m->overmount;
+	return m;
+}
+
 struct mnt_namespace *mnt_ns_from_dentry(struct dentry *dentry);
diff --git a/fs/namespace.c b/fs/namespace.c
index 085877bfaa5e..ebecb03972c5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2696,10 +2696,9 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 				 child->mnt_mountpoint);
 		commit_tree(child);
 		if (q) {
+			struct mount *r = topmost_overmount(child);
 			struct mountpoint *mp = root.mp;
-			struct mount *r = child;
-			while (unlikely(r->overmount))
-				r = r->overmount;
+
 			if (unlikely(shorter) && child != source_mnt)
 				mp = shorter;
 			mnt_change_mountpoint(r, mp, q);
@@ -6171,9 +6170,7 @@ bool current_chrooted(void)
 
 	guard(mount_locked_reader)();
 
-	root = current->nsproxy->mnt_ns->root;
-	while (unlikely(root->overmount))
-		root = root->overmount;
+	root = topmost_overmount(current->nsproxy->mnt_ns->root);
 
 	return fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 }
-- 
2.47.2



* [PATCH v2 35/63] do_lock_mount(): don't modify path.
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (32 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 34/63] new helper: topmost_overmount() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-02 10:55       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 36/63] constify check_mnt() Al Viro
                       ` (27 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Currently do_lock_mount() has the target path switched to whatever
might be overmounting it.  We _do_ want to have the parent
mount/mountpoint chosen on top of the overmounting pile; however,
the way it's done has unpleasant races - if umount propagation
removes the overmount while we'd been trying to set the environment
up, we might end up failing if our target path strays into that overmount
just before the overmount gets kicked out.

Users of do_lock_mount() do not need the target path changed - they
have all information in res->{parent,mp}; only one place (in
do_move_mount()) currently uses the resulting path->mnt, and that value
is trivial to reconstruct from the original value of path->mnt + chosen
parent mount.

Let's keep the target path unchanged; it avoids a bunch of subtle races
and it's not hard to do:
	do
		as mount_locked_reader
			find the prospective parent mount/mountpoint dentry
			grab references if it's not the original target
		lock the prospective mountpoint dentry
		take namespace_sem exclusive
		if prospective parent/mountpoint would be different now
			err = -EAGAIN
		else if location has been unmounted
			err = -ENOENT
		else if mountpoint dentry is not allowed to be mounted on
			err = -ENOENT
		else if beneath and the top of the pile was the absolute root
			err = -EINVAL
		else
			try to get struct mountpoint (by dentry), set
			err to 0 on success and -ENO{MEM,ENT} on failure
		if err != 0
			res->parent = ERR_PTR(err)
			drop locks
		else
			res->parent = prospective parent
		drop temporary references
	while err == -EAGAIN

A somewhat subtle part is that dropping temporary references is allowed.
Neither mounts nor dentries should be evicted by a thread that holds
namespace_sem.  On success we are dropping those references under
namespace_sem, so we need to be sure that these are not the last
references remaining.  However, on success we'd already verified (under
namespace_sem) that original target is still mounted and that mount
and dentry we are about to drop are still reachable from it via the
mount tree.  That guarantees that we are not about to drop the last
remaining references.
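
Illustration (not part of the patch): the loop above is a "sample,
take the locks, sample again, retry if the answer changed" pattern.
A single-threaded userspace sketch, assuming hypothetical *_sketch
names and simulating the concurrent change inside the lock-taking
step:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical state: what a lookup resolves to may change under us. */
struct state_sketch {
	int top;		/* what a lookup currently resolves to */
	int pending_changes;	/* simulated concurrent updates still to come */
};

/* Optimistic lookup, standing in for where_to_mount(). */
static int sample_sketch(const struct state_sketch *s)
{
	return s->top;
}

/* Stand-in for blocking lock acquisition; an update may slip in first. */
static void take_locks_sketch(struct state_sketch *s)
{
	if (s->pending_changes > 0) {
		s->pending_changes--;
		s->top++;
	}
}

static int lock_target_sketch(struct state_sketch *s, int *chosen,
			      int *attempts)
{
	int err;

	assert(s != NULL && chosen != NULL && attempts != NULL);
	do {
		int before = sample_sketch(s);	/* pick a candidate */
		int after;

		take_locks_sketch(s);		/* may block; state may move */
		after = sample_sketch(s);	/* recheck under the locks */
		(*attempts)++;
		if (after != before) {
			err = -EAGAIN;		/* would drop locks and retry */
		} else {
			err = 0;
			*chosen = after;	/* candidate is still valid */
		}
	} while (err == -EAGAIN);
	return err;
}
```

The sketch omits the other error cases (-ENOENT, -EINVAL) and the
reference counting; it only shows why the second lookup is compared
with the first.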

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 126 ++++++++++++++++++++++++++-----------------------
 1 file changed, 68 insertions(+), 58 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ebecb03972c5..b77d2df606a1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2727,6 +2727,27 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 	return err;
 }
 
+static inline struct mount *where_to_mount(const struct path *path,
+					   struct dentry **dentry,
+					   bool beneath)
+{
+	struct mount *m;
+
+	if (unlikely(beneath)) {
+		m = topmost_overmount(real_mount(path->mnt));
+		*dentry = m->mnt_mountpoint;
+		return m->mnt_parent;
+	} else {
+		m = __lookup_mnt(path->mnt, *dentry = path->dentry);
+		if (unlikely(m)) {
+			m = topmost_overmount(m);
+			*dentry = m->mnt.mnt_root;
+			return m;
+		}
+		return real_mount(path->mnt);
+	}
+}
+
 /**
  * do_lock_mount - acquire environment for mounting
  * @path:	target path
@@ -2758,84 +2779,69 @@ static int attach_recursive_mnt(struct mount *source_mnt,
  * case we also require the location to be at the root of a mount
  * that has a parent (i.e. is not a root of some namespace).
  */
-static void do_lock_mount(struct path *path, struct pinned_mountpoint *res, bool beneath)
+static void do_lock_mount(const struct path *path,
+			  struct pinned_mountpoint *res,
+			  bool beneath)
 {
-	struct vfsmount *mnt = path->mnt;
-	struct dentry *dentry;
-	struct path under = {};
-	int err = -ENOENT;
+	int err;
 
 	if (unlikely(beneath) && !path_mounted(path)) {
 		res->parent = ERR_PTR(-EINVAL);
 		return;
 	}
 
-	for (;;) {
-		struct mount *m = real_mount(mnt);
-
-		if (beneath) {
-			path_put(&under);
-			read_seqlock_excl(&mount_lock);
-			if (unlikely(!mnt_has_parent(m))) {
-				read_sequnlock_excl(&mount_lock);
-				res->parent = ERR_PTR(-EINVAL);
-				return;
+	do {
+		struct dentry *dentry, *d;
+		struct mount *m, *n;
+
+		scoped_guard(mount_locked_reader) {
+			m = where_to_mount(path, &dentry, beneath);
+			if (&m->mnt != path->mnt) {
+				mntget(&m->mnt);
+				dget(dentry);
 			}
-			under.mnt = mntget(&m->mnt_parent->mnt);
-			under.dentry = dget(m->mnt_mountpoint);
-			read_sequnlock_excl(&mount_lock);
-			dentry = under.dentry;
-		} else {
-			dentry = path->dentry;
 		}
 
 		inode_lock(dentry->d_inode);
 		namespace_lock();
 
-		if (unlikely(cant_mount(dentry) || !is_mounted(mnt)))
-			break;		// not to be mounted on
+		// check if the chain of mounts (if any) has changed.
+		scoped_guard(mount_locked_reader)
+			n = where_to_mount(path, &d, beneath);
 
-		if (beneath && unlikely(m->mnt_mountpoint != dentry ||
-				        &m->mnt_parent->mnt != under.mnt)) {
-			namespace_unlock();
-			inode_unlock(dentry->d_inode);
-			continue;	// got moved
-		}
+		if (unlikely(n != m || dentry != d))
+			err = -EAGAIN;		// something moved, retry
+		else if (unlikely(cant_mount(dentry) || !is_mounted(path->mnt)))
+			err = -ENOENT;		// not to be mounted on
+		else if (beneath && &m->mnt == path->mnt && !m->overmount)
+			err = -EINVAL;
+		else
+			err = get_mountpoint(dentry, res);
 
-		mnt = lookup_mnt(path);
-		if (unlikely(mnt)) {
+		if (unlikely(err)) {
+			res->parent = ERR_PTR(err);
 			namespace_unlock();
 			inode_unlock(dentry->d_inode);
-			path_put(path);
-			path->mnt = mnt;
-			path->dentry = dget(mnt->mnt_root);
-			continue;	// got overmounted
+		} else {
+			res->parent = m;
 		}
-		err = get_mountpoint(dentry, res);
-		if (err)
-			break;
-		if (beneath) {
-			/*
-			 * @under duplicates the references that will stay
-			 * at least until namespace_unlock(), so the path_put()
-			 * below is safe (and OK to do under namespace_lock -
-			 * we are not dropping the final references here).
-			 */
-			path_put(&under);
-			res->parent = real_mount(path->mnt)->mnt_parent;
-			return;
+		/*
+		 * Drop the temporary references.  This is subtle - on success
+		 * we are doing that under namespace_sem, which would normally
+		 * be forbidden.  However, in that case we are guaranteed that
+		 * refcounts won't reach zero, since we know that path->mnt
+		 * is mounted and thus all mounts reachable from it are pinned
+		 * and stable, along with their mountpoints and roots.
+		 */
+		if (&m->mnt != path->mnt) {
+			dput(dentry);
+			mntput(&m->mnt);
 		}
-		res->parent = real_mount(path->mnt);
-		return;
-	}
-	namespace_unlock();
-	inode_unlock(dentry->d_inode);
-	if (beneath)
-		path_put(&under);
-	res->parent = ERR_PTR(err);
+	} while (err == -EAGAIN);
 }
 
-static inline void lock_mount(struct path *path, struct pinned_mountpoint *m)
+static inline void lock_mount(const struct path *path,
+			      struct pinned_mountpoint *m)
 {
 	do_lock_mount(path, m, false);
 }
@@ -3616,7 +3622,11 @@ static int do_move_mount(struct path *old_path,
 	}
 
 	if (beneath) {
-		err = can_move_mount_beneath(old, real_mount(new_path->mnt), mp.mp);
+		struct mount *over = real_mount(new_path->mnt);
+
+		if (mp.parent != over->mnt_parent)
+			over = mp.parent->overmount;
+		err = can_move_mount_beneath(old, over, mp.mp);
 		if (err)
 			return err;
 	}
-- 
2.47.2



* [PATCH v2 36/63] constify check_mnt()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (33 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 35/63] do_lock_mount(): don't modify path Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 37/63] do_mount_setattr(): constify path argument Al Viro
                       ` (26 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b77d2df606a1..de894f96d9c2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1010,7 +1010,7 @@ static void unpin_mountpoint(struct pinned_mountpoint *m)
 	}
 }
 
-static inline int check_mnt(struct mount *mnt)
+static inline int check_mnt(const struct mount *mnt)
 {
 	return mnt->mnt_ns == current->nsproxy->mnt_ns;
 }
-- 
2.47.2



* [PATCH v2 37/63] do_mount_setattr(): constify path argument
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (34 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 36/63] constify check_mnt() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 38/63] do_set_group(): constify path arguments Al Viro
                       ` (25 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index de894f96d9c2..5766d6a3a279 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4865,7 +4865,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 	touch_mnt_namespace(mnt->mnt_ns);
 }
 
-static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
+static int do_mount_setattr(const struct path *path, struct mount_kattr *kattr)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	int err = 0;
-- 
2.47.2



* [PATCH v2 38/63] do_set_group(): constify path arguments
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (35 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 37/63] do_mount_setattr(): constify path argument Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 39/63] drop_collected_paths(): constify arguments Al Viro
                       ` (24 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5766d6a3a279..e4ca76091bd7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3359,7 +3359,7 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 	return 0;
 }
 
-static int do_set_group(struct path *from_path, struct path *to_path)
+static int do_set_group(const struct path *from_path, const struct path *to_path)
 {
 	struct mount *from = real_mount(from_path->mnt);
 	struct mount *to = real_mount(to_path->mnt);
-- 
2.47.2



* [PATCH v2 39/63] drop_collected_paths(): constify arguments
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (36 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 38/63] do_set_group(): constify path arguments Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 40/63] collect_paths(): constify the return value Al Viro
                       ` (23 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and use that to constify the pointers in callers

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c        |  4 ++--
 include/linux/mount.h |  2 +-
 kernel/audit_tree.c   | 12 ++++++------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index e4ca76091bd7..61dfa899bd57 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2334,9 +2334,9 @@ struct path *collect_paths(const struct path *path,
 	return res;
 }
 
-void drop_collected_paths(struct path *paths, struct path *prealloc)
+void drop_collected_paths(const struct path *paths, struct path *prealloc)
 {
-	for (struct path *p = paths; p->mnt; p++)
+	for (const struct path *p = paths; p->mnt; p++)
 		path_put(p);
 	if (paths != prealloc)
 		kfree(paths);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 5f9c053b0897..c09032463b36 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -105,7 +105,7 @@ extern int may_umount(struct vfsmount *);
 int do_mount(const char *, const char __user *,
 		     const char *, unsigned long, void *);
 extern struct path *collect_paths(const struct path *, struct path *, unsigned);
-extern void drop_collected_paths(struct path *, struct path *);
+extern void drop_collected_paths(const struct path *, struct path *);
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 extern int cifs_root_data(char **dev, char **opts);
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index b0eae2a3c895..32007edf0e55 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -678,7 +678,7 @@ void audit_trim_trees(void)
 		struct audit_tree *tree;
 		struct path path;
 		struct audit_node *node;
-		struct path *paths;
+		const struct path *paths;
 		struct path array[16];
 		int err;
 
@@ -701,7 +701,7 @@ void audit_trim_trees(void)
 			struct audit_chunk *chunk = find_chunk(node);
 			/* this could be NULL if the watch is dying else where... */
 			node->index |= 1U<<31;
-			for (struct path *p = paths; p->dentry; p++) {
+			for (const struct path *p = paths; p->dentry; p++) {
 				struct inode *inode = p->dentry->d_inode;
 				if (inode_to_key(inode) == chunk->key) {
 					node->index &= ~(1U<<31);
@@ -740,9 +740,9 @@ void audit_put_tree(struct audit_tree *tree)
 	put_tree(tree);
 }
 
-static int tag_mounts(struct path *paths, struct audit_tree *tree)
+static int tag_mounts(const struct path *paths, struct audit_tree *tree)
 {
-	for (struct path *p = paths; p->dentry; p++) {
+	for (const struct path *p = paths; p->dentry; p++) {
 		int err = tag_chunk(p->dentry->d_inode, tree);
 		if (err)
 			return err;
@@ -805,7 +805,7 @@ int audit_add_tree_rule(struct audit_krule *rule)
 	struct audit_tree *seed = rule->tree, *tree;
 	struct path path;
 	struct path array[16];
-	struct path *paths;
+	const struct path *paths;
 	int err;
 
 	rule->tree = NULL;
@@ -877,7 +877,7 @@ int audit_tag_tree(char *old, char *new)
 	int failed = 0;
 	struct path path1, path2;
 	struct path array[16];
-	struct path *paths;
+	const struct path *paths;
 	int err;
 
 	err = kern_path(new, 0, &path2);
-- 
2.47.2



* [PATCH v2 40/63] collect_paths(): constify the return value
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (37 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 39/63] drop_collected_paths(): constify arguments Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 41/63] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
                       ` (22 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

callers have no business modifying the paths they get

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c        | 4 ++--
 include/linux/mount.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 61dfa899bd57..43f46d9e84fe 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2300,7 +2300,7 @@ static inline bool extend_array(struct path **res, struct path **to_free,
 	return p;
 }
 
-struct path *collect_paths(const struct path *path,
+const struct path *collect_paths(const struct path *path,
 			      struct path *prealloc, unsigned count)
 {
 	struct mount *root = real_mount(path->mnt);
@@ -2334,7 +2334,7 @@ struct path *collect_paths(const struct path *path,
 	return res;
 }
 
-void drop_collected_paths(const struct path *paths, struct path *prealloc)
+void drop_collected_paths(const struct path *paths, const struct path *prealloc)
 {
 	for (const struct path *p = paths; p->mnt; p++)
 		path_put(p);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c09032463b36..18e4b97f8a98 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -104,8 +104,8 @@ extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 int do_mount(const char *, const char __user *,
 		     const char *, unsigned long, void *);
-extern struct path *collect_paths(const struct path *, struct path *, unsigned);
-extern void drop_collected_paths(const struct path *, struct path *);
+extern const struct path *collect_paths(const struct path *, struct path *, unsigned);
+extern void drop_collected_paths(const struct path *, const struct path *);
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 extern int cifs_root_data(char **dev, char **opts);
-- 
2.47.2



* [PATCH v2 41/63] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s)
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (38 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 40/63] collect_paths(): constify the return value Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 42/63] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
                       ` (21 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 43f46d9e84fe..70ae769ecf11 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3572,8 +3572,9 @@ static inline bool may_use_mount(struct mount *mnt)
 	return check_anonymous_mnt(mnt);
 }
 
-static int do_move_mount(struct path *old_path,
-			 struct path *new_path, enum mnt_tree_flags_t flags)
+static int do_move_mount(const struct path *old_path,
+			 const struct path *new_path,
+			 enum mnt_tree_flags_t flags)
 {
 	struct mount *old = real_mount(old_path->mnt);
 	int err;
@@ -3645,7 +3646,7 @@ static int do_move_mount(struct path *old_path,
 	return attach_recursive_mnt(old, &mp);
 }
 
-static int do_move_mount_old(struct path *path, const char *old_name)
+static int do_move_mount_old(const struct path *path, const char *old_name)
 {
 	struct path old_path;
 	int err;
@@ -4475,7 +4476,8 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	return ret;
 }
 
-static inline int vfs_move_mount(struct path *from_path, struct path *to_path,
+static inline int vfs_move_mount(const struct path *from_path,
+				 const struct path *to_path,
 				 enum mnt_tree_flags_t mflags)
 {
 	int ret;
-- 
2.47.2



* [PATCH v2 42/63] mnt_warn_timestamp_expiry(): constify struct path argument
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (39 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 41/63] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 43/63] do_new_mount{,_fc}(): " Al Viro
                       ` (20 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 70ae769ecf11..a7c840371a7f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3230,7 +3230,8 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
 	touch_mnt_namespace(mnt->mnt_ns);
 }
 
-static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt)
+static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
+				      struct vfsmount *mnt)
 {
 	struct super_block *sb = mnt->mnt_sb;
 
-- 
2.47.2



* [PATCH v2 43/63] do_new_mount{,_fc}(): constify struct path argument
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (40 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 42/63] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 44/63] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
                       ` (19 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a7c840371a7f..8ff54e0da446 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3704,7 +3704,7 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
  * Create a new mount using a superblock configuration and request it
  * be added to the namespace tree.
  */
-static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
+static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
 	struct super_block *sb;
@@ -3735,8 +3735,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  */
-static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
-			int mnt_flags, const char *name, void *data)
+static int do_new_mount(const struct path *path, const char *fstype,
+			int sb_flags, int mnt_flags,
+			const char *name, void *data)
 {
 	struct file_system_type *type;
 	struct fs_context *fc;
-- 
2.47.2



* [PATCH v2 44/63] do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (41 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 43/63] do_new_mount{,_fc}(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 45/63] path_mount(): " Al Viro
                       ` (18 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 8ff54e0da446..6ae42f3a9f10 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2914,7 +2914,7 @@ static int flags_to_propagation_type(int ms_flags)
 /*
  * recursively change the type of the mountpoint.
  */
-static int do_change_type(struct path *path, int ms_flags)
+static int do_change_type(const struct path *path, int ms_flags)
 {
 	struct mount *m;
 	struct mount *mnt = real_mount(path->mnt);
@@ -3034,8 +3034,8 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 /*
  * do loopback mount.
  */
-static int do_loopback(struct path *path, const char *old_name,
-				int recurse)
+static int do_loopback(const struct path *path, const char *old_name,
+		       int recurse)
 {
 	struct path old_path __free(path_put) = {};
 	struct mount *mnt = NULL;
@@ -3265,7 +3265,7 @@ static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
  * superblock it refers to.  This is triggered by specifying MS_REMOUNT|MS_BIND
  * to mount(2).
  */
-static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
+static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
 {
 	struct super_block *sb = path->mnt->mnt_sb;
 	struct mount *mnt = real_mount(path->mnt);
@@ -3302,7 +3302,7 @@ static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
  * If you've mounted a non-root directory somewhere and want to do remount
  * on it - tough luck.
  */
-static int do_remount(struct path *path, int ms_flags, int sb_flags,
+static int do_remount(const struct path *path, int ms_flags, int sb_flags,
 		      int mnt_flags, void *data)
 {
 	int err;
-- 
2.47.2



* [PATCH v2 45/63] path_mount(): constify struct path argument
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (42 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 44/63] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 46/63] may_copy_tree(), __do_loopback(): " Al Viro
                       ` (17 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

now it finally can be done.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/internal.h  | 2 +-
 fs/namespace.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 38e8aab27bbd..fe88563b4822 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -84,7 +84,7 @@ void mnt_put_write_access_file(struct file *file);
 extern void dissolve_on_fput(struct vfsmount *);
 extern bool may_mount(void);
 
-int path_mount(const char *dev_name, struct path *path,
+int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page);
 int path_umount(struct path *path, int flags);
 
diff --git a/fs/namespace.c b/fs/namespace.c
index 6ae42f3a9f10..34a71d5cdf88 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4018,7 +4018,7 @@ static char *copy_mount_string(const void __user *data)
  * Therefore, if this magic number is present, it carries no information
  * and must be discarded.
  */
-int path_mount(const char *dev_name, struct path *path,
+int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page)
 {
 	unsigned int mnt_flags = 0, sb_flags;
-- 
2.47.2



* [PATCH v2 46/63] may_copy_tree(), __do_loopback(): constify struct path argument
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (43 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 45/63] path_mount(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 47/63] path_umount(): " Al Viro
                       ` (16 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 34a71d5cdf88..b15632b70223 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2990,7 +2990,7 @@ static int do_change_type(const struct path *path, int ms_flags)
  *
  * Returns true if the mount tree can be copied, false otherwise.
  */
-static inline bool may_copy_tree(struct path *path)
+static inline bool may_copy_tree(const struct path *path)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	const struct dentry_operations *d_op;
@@ -3012,7 +3012,7 @@ static inline bool may_copy_tree(struct path *path)
 }
 
 
-static struct mount *__do_loopback(struct path *old_path, int recurse)
+static struct mount *__do_loopback(const struct path *old_path, int recurse)
 {
 	struct mount *old = real_mount(old_path->mnt);
 
-- 
2.47.2



* [PATCH v2 47/63] path_umount(): constify struct path argument
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (44 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 46/63] may_copy_tree(), __do_loopback(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 48/63] constify can_move_mount_beneath() arguments Al Viro
                       ` (15 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/internal.h  | 2 +-
 fs/namespace.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index fe88563b4822..549e6bd453b0 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -86,7 +86,7 @@ extern bool may_mount(void);
 
 int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page);
-int path_umount(struct path *path, int flags);
+int path_umount(const struct path *path, int flags);
 
 int show_path(struct seq_file *m, struct dentry *root);
 
diff --git a/fs/namespace.c b/fs/namespace.c
index b15632b70223..a14cb2cabc1a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2084,7 +2084,7 @@ static int can_umount(const struct path *path, int flags)
 }
 
 // caller is responsible for flags being sane
-int path_umount(struct path *path, int flags)
+int path_umount(const struct path *path, int flags)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	int ret;
-- 
2.47.2



* [PATCH v2 48/63] constify can_move_mount_beneath() arguments
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (45 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 47/63] path_umount(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 49/63] do_move_mount_old(): use __free(path_put) Al Viro
                       ` (14 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a14cb2cabc1a..daca5e3bec38 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3472,8 +3472,8 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Context: This function expects namespace_lock() to be held.
  * Return: On success 0, and on error a negative error code is returned.
  */
-static int can_move_mount_beneath(struct mount *mnt_from,
-				  struct mount *mnt_to,
+static int can_move_mount_beneath(const struct mount *mnt_from,
+				  const struct mount *mnt_to,
 				  const struct mountpoint *mp)
 {
 	struct mount *parent_mnt_to = mnt_to->mnt_parent;
-- 
2.47.2



* [PATCH v2 49/63] do_move_mount_old(): use __free(path_put)
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (46 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 48/63] constify can_move_mount_beneath() arguments Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 50/63] do_mount(): " Al Viro
                       ` (13 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index daca5e3bec38..a57598ec422a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3649,7 +3649,7 @@ static int do_move_mount(const struct path *old_path,
 
 static int do_move_mount_old(const struct path *path, const char *old_name)
 {
-	struct path old_path;
+	struct path old_path __free(path_put) = {};
 	int err;
 
 	if (!old_name || !*old_name)
@@ -3659,9 +3659,7 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
 	if (err)
 		return err;
 
-	err = do_move_mount(&old_path, path, 0);
-	path_put(&old_path);
-	return err;
+	return do_move_mount(&old_path, path, 0);
 }
 
 /*
-- 
2.47.2



* [PATCH v2 50/63] do_mount(): use __free(path_put)
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (47 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 49/63] do_move_mount_old(): use __free(path_put) Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 51/63] umount_tree(): take all victims out of propagation graph at once Al Viro
                       ` (12 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a57598ec422a..b290e2b3bcfb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4098,15 +4098,13 @@ int path_mount(const char *dev_name, const struct path *path,
 int do_mount(const char *dev_name, const char __user *dir_name,
 		const char *type_page, unsigned long flags, void *data_page)
 {
-	struct path path;
+	struct path path __free(path_put) = {};
 	int ret;
 
 	ret = user_path_at(AT_FDCWD, dir_name, LOOKUP_FOLLOW, &path);
 	if (ret)
 		return ret;
-	ret = path_mount(dev_name, &path, type_page, flags, data_page);
-	path_put(&path);
-	return ret;
+	return path_mount(dev_name, &path, type_page, flags, data_page);
 }
 
 static struct ucounts *inc_mnt_namespaces(struct user_namespace *ns)
-- 
2.47.2



* [PATCH v2 51/63] umount_tree(): take all victims out of propagation graph at once
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (48 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 50/63] do_mount(): " Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-01 11:50       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 52/63] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
                       ` (11 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

For each removed mount we need to calculate where its slaves will end up.
To avoid duplicating that work, do it for all mounts to be removed
at once, taking the mounts themselves out of the propagation graph as
we go, then do all transfers.  Duplicate work on finding destinations
is avoided, since if we run into a mount that has already had its
destination found, we don't need to trace the rest of the way.  That is
guaranteed to be O(removed mounts) for finding destinations and removing
from the propagation graph, and O(surviving mounts that have their master
removed) for transfers.
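
For reviewers who want to poke at the destination-finding idea outside the
kernel, here is a rough userspace sketch.  It is not the code from this
patch: struct node, trace(), resolve_all() and the doomed/marked fields are
hypothetical stand-ins (roughly for struct mount, trace_transfers(),
bulk_make_private(), will_be_unmounted() and the mark bit), and peer groups
are left out entirely, so only the master chains are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mount; peer groups are not modelled. */
struct node {
	struct node *master;	/* where propagation comes from, or NULL */
	bool doomed;		/* this node is in the set being removed */
	bool marked;		/* this node has already been traced */
};

/*
 * Walk up through doomed masters, marking every node we pass.
 * Stop at the first survivor (or NULL), or at a node that an earlier
 * walk has already resolved; in that case its master is the answer.
 */
static struct node *trace(struct node *m)
{
	for (;;) {
		struct node *next = m->master;

		m->marked = true;
		if (!next || !next->doomed)
			return next;
		if (next->marked)
			return next->master;
		m = next;
	}
}

/* Point every node on the chain just walked at the destination. */
static void set_destinations(struct node *m, struct node *dest)
{
	struct node *next;

	while ((next = m->master) != dest) {
		assert(next != NULL);	/* the chain must end at dest */
		m->master = dest;
		m = next;
	}
}

/* Resolve destinations for the whole set; each node is traced once. */
static void resolve_all(struct node **set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!set[i]->marked)
			set_destinations(set[i], trace(set[i]));
}
```

Since a walk stops as soon as it meets a marked node, no doomed node is
traced more than once, which is where the O(removed mounts) bound for this
phase comes from.  Transfers to the surviving destinations would be a
separate pass, not shown here.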

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c |  3 ++-
 fs/pnode.c     | 67 +++++++++++++++++++++++++++++++++++++++-----------
 fs/pnode.h     |  1 +
 3 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b290e2b3bcfb..de9a88f45dc1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1846,6 +1846,8 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 	if (how & UMOUNT_PROPAGATE)
 		propagate_umount(&tmp_list);
 
+	bulk_make_private(&tmp_list);
+
 	while (!list_empty(&tmp_list)) {
 		struct mnt_namespace *ns;
 		bool disconnect;
@@ -1870,7 +1872,6 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 				umount_mnt(p);
 			}
 		}
-		change_mnt_propagation(p, MS_PRIVATE);
 		if (disconnect)
 			hlist_add_head(&p->mnt_umount, &unmounted);
 
diff --git a/fs/pnode.c b/fs/pnode.c
index edaf9d9d0eaf..5d91c3e58d2a 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -71,19 +71,6 @@ static inline bool will_be_unmounted(struct mount *m)
 	return m->mnt.mnt_flags & MNT_UMOUNT;
 }
 
-static struct mount *propagation_source(struct mount *mnt)
-{
-	do {
-		struct mount *m;
-		for (m = next_peer(mnt); m != mnt; m = next_peer(m)) {
-			if (!will_be_unmounted(m))
-				return m;
-		}
-		mnt = mnt->mnt_master;
-	} while (mnt && will_be_unmounted(mnt));
-	return mnt;
-}
-
 static void transfer_propagation(struct mount *mnt, struct mount *to)
 {
 	struct hlist_node *p = NULL, *n;
@@ -112,11 +99,10 @@ void change_mnt_propagation(struct mount *mnt, int type)
 		return;
 	}
 	if (IS_MNT_SHARED(mnt)) {
-		if (type == MS_SLAVE || !hlist_empty(&mnt->mnt_slave_list))
-			m = propagation_source(mnt);
 		if (list_empty(&mnt->mnt_share)) {
 			mnt_release_group_id(mnt);
 		} else {
+			m = next_peer(mnt);
 			list_del_init(&mnt->mnt_share);
 			mnt->mnt_group_id = 0;
 		}
@@ -137,6 +123,57 @@ void change_mnt_propagation(struct mount *mnt, int type)
 	}
 }
 
+static struct mount *trace_transfers(struct mount *m)
+{
+	while (1) {
+		struct mount *next = next_peer(m);
+
+		if (next != m) {
+			list_del_init(&m->mnt_share);
+			m->mnt_group_id = 0;
+			m->mnt_master = next;
+		} else {
+			if (IS_MNT_SHARED(m))
+				mnt_release_group_id(m);
+			next = m->mnt_master;
+		}
+		hlist_del_init(&m->mnt_slave);
+		CLEAR_MNT_SHARED(m);
+		SET_MNT_MARK(m);
+
+		if (!next || !will_be_unmounted(next))
+			return next;
+		if (IS_MNT_MARKED(next))
+			return next->mnt_master;
+		m = next;
+	}
+}
+
+static void set_destinations(struct mount *m, struct mount *master)
+{
+	struct mount *next;
+
+	while ((next = m->mnt_master) != master) {
+		m->mnt_master = master;
+		m = next;
+	}
+}
+
+void bulk_make_private(struct list_head *set)
+{
+	struct mount *m;
+
+	list_for_each_entry(m, set, mnt_list)
+		if (!IS_MNT_MARKED(m))
+			set_destinations(m, trace_transfers(m));
+
+	list_for_each_entry(m, set, mnt_list) {
+		transfer_propagation(m, m->mnt_master);
+		m->mnt_master = NULL;
+		CLEAR_MNT_MARK(m);
+	}
+}
+
 static struct mount *__propagation_next(struct mount *m,
 					 struct mount *origin)
 {
diff --git a/fs/pnode.h b/fs/pnode.h
index 00ab153e3e9d..b029db225f33 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -42,6 +42,7 @@ static inline bool peers(const struct mount *m1, const struct mount *m2)
 }
 
 void change_mnt_propagation(struct mount *, int);
+void bulk_make_private(struct list_head *);
 int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
 		struct hlist_head *);
 void propagate_umount(struct list_head *);
-- 
2.47.2



* [PATCH v2 52/63] ecryptfs: get rid of pointless mount references in ecryptfs dentries
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (49 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 51/63] umount_tree(): take all victims out of propagation graph at once Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 53/63] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
                       ` (10 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

->lower_path.mnt has the same value for all dentries on a given ecryptfs
instance, and if somebody goes for a mountpoint-crossing variant where that
would not be true, we can deal with that when it happens (and _not_
by duplicating that reference into each dentry).

As it is, we are better off just sticking a reference into ecryptfs-private
part of superblock and keeping it pinned until ->kill_sb().

That way we can stick a reference to underlying dentry right into ->d_fsdata
of ecryptfs one, getting rid of indirection through struct ecryptfs_dentry_info,
along with the entire struct ecryptfs_dentry_info machinery.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ecryptfs/dentry.c          | 14 +-------------
 fs/ecryptfs/ecryptfs_kernel.h | 27 +++++++++++----------------
 fs/ecryptfs/file.c            | 15 +++++++--------
 fs/ecryptfs/inode.c           | 19 +++++--------------
 fs/ecryptfs/main.c            | 24 ++++++------------------
 5 files changed, 30 insertions(+), 69 deletions(-)

diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index 1dfd5b81d831..6648a924e31a 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -59,14 +59,6 @@ static int ecryptfs_d_revalidate(struct inode *dir, const struct qstr *name,
 	return rc;
 }
 
-struct kmem_cache *ecryptfs_dentry_info_cache;
-
-static void ecryptfs_dentry_free_rcu(struct rcu_head *head)
-{
-	kmem_cache_free(ecryptfs_dentry_info_cache,
-		container_of(head, struct ecryptfs_dentry_info, rcu));
-}
-
 /**
  * ecryptfs_d_release
  * @dentry: The ecryptfs dentry
@@ -75,11 +67,7 @@ static void ecryptfs_dentry_free_rcu(struct rcu_head *head)
  */
 static void ecryptfs_d_release(struct dentry *dentry)
 {
-	struct ecryptfs_dentry_info *p = dentry->d_fsdata;
-	if (p) {
-		path_put(&p->lower_path);
-		call_rcu(&p->rcu, ecryptfs_dentry_free_rcu);
-	}
+	dput(dentry->d_fsdata);
 }
 
 const struct dentry_operations ecryptfs_dops = {
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index 1f562e75d0e4..9e6ab0b41337 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -258,13 +258,6 @@ struct ecryptfs_inode_info {
 	struct ecryptfs_crypt_stat crypt_stat;
 };
 
-/* dentry private data. Each dentry must keep track of a lower
- * vfsmount too. */
-struct ecryptfs_dentry_info {
-	struct path lower_path;
-	struct rcu_head rcu;
-};
-
 /**
  * ecryptfs_global_auth_tok - A key used to encrypt all new files under the mountpoint
  * @flags: Status flags
@@ -348,6 +341,7 @@ struct ecryptfs_mount_crypt_stat {
 /* superblock private data. */
 struct ecryptfs_sb_info {
 	struct super_block *wsi_sb;
+	struct vfsmount *lower_mnt;
 	struct ecryptfs_mount_crypt_stat mount_crypt_stat;
 };
 
@@ -494,22 +488,25 @@ ecryptfs_set_superblock_lower(struct super_block *sb,
 }
 
 static inline void
-ecryptfs_set_dentry_private(struct dentry *dentry,
-			    struct ecryptfs_dentry_info *dentry_info)
+ecryptfs_set_dentry_lower(struct dentry *dentry,
+			  struct dentry *lower_dentry)
 {
-	dentry->d_fsdata = dentry_info;
+	dentry->d_fsdata = lower_dentry;
 }
 
 static inline struct dentry *
 ecryptfs_dentry_to_lower(struct dentry *dentry)
 {
-	return ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path.dentry;
+	return dentry->d_fsdata;
 }
 
-static inline const struct path *
-ecryptfs_dentry_to_lower_path(struct dentry *dentry)
+static inline struct path
+ecryptfs_lower_path(struct dentry *dentry)
 {
-	return &((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path;
+	return (struct path){
+		.mnt = ecryptfs_superblock_to_private(dentry->d_sb)->lower_mnt,
+		.dentry = ecryptfs_dentry_to_lower(dentry)
+	};
 }
 
 #define ecryptfs_printk(type, fmt, arg...) \
@@ -532,7 +529,6 @@ extern unsigned int ecryptfs_number_of_users;
 
 extern struct kmem_cache *ecryptfs_auth_tok_list_item_cache;
 extern struct kmem_cache *ecryptfs_file_info_cache;
-extern struct kmem_cache *ecryptfs_dentry_info_cache;
 extern struct kmem_cache *ecryptfs_inode_info_cache;
 extern struct kmem_cache *ecryptfs_sb_info_cache;
 extern struct kmem_cache *ecryptfs_header_cache;
@@ -557,7 +553,6 @@ int ecryptfs_encrypt_and_encode_filename(
 	size_t *encoded_name_size,
 	struct ecryptfs_mount_crypt_stat *mount_crypt_stat,
 	const char *name, size_t name_size);
-struct dentry *ecryptfs_lower_dentry(struct dentry *this_dentry);
 void ecryptfs_dump_hex(char *data, int bytes);
 int virt_to_scatterlist(const void *addr, int size, struct scatterlist *sg,
 			int sg_size);
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index 5f8f96da09fe..7929411837cf 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -33,13 +33,12 @@ static ssize_t ecryptfs_read_update_atime(struct kiocb *iocb,
 				struct iov_iter *to)
 {
 	ssize_t rc;
-	const struct path *path;
 	struct file *file = iocb->ki_filp;
 
 	rc = generic_file_read_iter(iocb, to);
 	if (rc >= 0) {
-		path = ecryptfs_dentry_to_lower_path(file->f_path.dentry);
-		touch_atime(path);
+		struct path path = ecryptfs_lower_path(file->f_path.dentry);
+		touch_atime(&path);
 	}
 	return rc;
 }
@@ -59,12 +58,11 @@ static ssize_t ecryptfs_splice_read_update_atime(struct file *in, loff_t *ppos,
 						 size_t len, unsigned int flags)
 {
 	ssize_t rc;
-	const struct path *path;
 
 	rc = filemap_splice_read(in, ppos, pipe, len, flags);
 	if (rc >= 0) {
-		path = ecryptfs_dentry_to_lower_path(in->f_path.dentry);
-		touch_atime(path);
+		struct path path = ecryptfs_lower_path(in->f_path.dentry);
+		touch_atime(&path);
 	}
 	return rc;
 }
@@ -283,6 +281,7 @@ static int ecryptfs_dir_open(struct inode *inode, struct file *file)
 	 * ecryptfs_lookup() */
 	struct ecryptfs_file_info *file_info;
 	struct file *lower_file;
+	struct path path;
 
 	/* Released in ecryptfs_release or end of function if failure */
 	file_info = kmem_cache_zalloc(ecryptfs_file_info_cache, GFP_KERNEL);
@@ -292,8 +291,8 @@ static int ecryptfs_dir_open(struct inode *inode, struct file *file)
 				"Error attempting to allocate memory\n");
 		return -ENOMEM;
 	}
-	lower_file = dentry_open(ecryptfs_dentry_to_lower_path(ecryptfs_dentry),
-				 file->f_flags, current_cred());
+	path = ecryptfs_lower_path(ecryptfs_dentry);
+	lower_file = dentry_open(&path, file->f_flags, current_cred());
 	if (IS_ERR(lower_file)) {
 		printk(KERN_ERR "%s: Error attempting to initialize "
 			"the lower file for the dentry with name "
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 72fbe1316ab8..d2b262dc485d 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -327,24 +327,15 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode)
 static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
 				     struct dentry *lower_dentry)
 {
-	const struct path *path = ecryptfs_dentry_to_lower_path(dentry->d_parent);
+	struct dentry *lower_parent = ecryptfs_dentry_to_lower(dentry->d_parent);
 	struct inode *inode, *lower_inode;
-	struct ecryptfs_dentry_info *dentry_info;
 	int rc = 0;
 
-	dentry_info = kmem_cache_alloc(ecryptfs_dentry_info_cache, GFP_KERNEL);
-	if (!dentry_info) {
-		dput(lower_dentry);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	fsstack_copy_attr_atime(d_inode(dentry->d_parent),
-				d_inode(path->dentry));
+				d_inode(lower_parent));
 	BUG_ON(!d_count(lower_dentry));
 
-	ecryptfs_set_dentry_private(dentry, dentry_info);
-	dentry_info->lower_path.mnt = mntget(path->mnt);
-	dentry_info->lower_path.dentry = lower_dentry;
+	ecryptfs_set_dentry_lower(dentry, lower_dentry);
 
 	/*
 	 * negative dentry can go positive under us here - its parent is not
@@ -1022,10 +1013,10 @@ static int ecryptfs_getattr(struct mnt_idmap *idmap,
 {
 	struct dentry *dentry = path->dentry;
 	struct kstat lower_stat;
+	struct path lower_path = ecryptfs_lower_path(dentry);
 	int rc;
 
-	rc = vfs_getattr_nosec(ecryptfs_dentry_to_lower_path(dentry),
-			       &lower_stat, request_mask, flags);
+	rc = vfs_getattr_nosec(&lower_path, &lower_stat, request_mask, flags);
 	if (!rc) {
 		fsstack_copy_attr_all(d_inode(dentry),
 				      ecryptfs_inode_to_lower(d_inode(dentry)));
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index eab1beb846d3..2afbcbbd9546 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -106,15 +106,14 @@ static int ecryptfs_init_lower_file(struct dentry *dentry,
 				    struct file **lower_file)
 {
 	const struct cred *cred = current_cred();
-	const struct path *path = ecryptfs_dentry_to_lower_path(dentry);
+	struct path path = ecryptfs_lower_path(dentry);
 	int rc;
 
-	rc = ecryptfs_privileged_open(lower_file, path->dentry, path->mnt,
-				      cred);
+	rc = ecryptfs_privileged_open(lower_file, path.dentry, path.mnt, cred);
 	if (rc) {
 		printk(KERN_ERR "Error opening lower file "
 		       "for lower_dentry [0x%p] and lower_mnt [0x%p]; "
-		       "rc = [%d]\n", path->dentry, path->mnt, rc);
+		       "rc = [%d]\n", path.dentry, path.mnt, rc);
 		(*lower_file) = NULL;
 	}
 	return rc;
@@ -437,7 +436,6 @@ static int ecryptfs_get_tree(struct fs_context *fc)
 	struct ecryptfs_fs_context *ctx = fc->fs_private;
 	struct ecryptfs_sb_info *sbi = fc->s_fs_info;
 	struct ecryptfs_mount_crypt_stat *mount_crypt_stat;
-	struct ecryptfs_dentry_info *root_info;
 	const char *err = "Getting sb failed";
 	struct inode *inode;
 	struct path path;
@@ -543,14 +541,8 @@ static int ecryptfs_get_tree(struct fs_context *fc)
 		goto out_free;
 	}
 
-	rc = -ENOMEM;
-	root_info = kmem_cache_zalloc(ecryptfs_dentry_info_cache, GFP_KERNEL);
-	if (!root_info)
-		goto out_free;
-
-	/* ->kill_sb() will take care of root_info */
-	ecryptfs_set_dentry_private(s->s_root, root_info);
-	root_info->lower_path = path;
+	ecryptfs_set_dentry_lower(s->s_root, path.dentry);
+	sbi->lower_mnt = path.mnt;
 
 	s->s_flags |= SB_ACTIVE;
 	fc->root = dget(s->s_root);
@@ -580,6 +572,7 @@ static void ecryptfs_kill_block_super(struct super_block *sb)
 	kill_anon_super(sb);
 	if (!sb_info)
 		return;
+	mntput(sb_info->lower_mnt);
 	ecryptfs_destroy_mount_crypt_stat(&sb_info->mount_crypt_stat);
 	kmem_cache_free(ecryptfs_sb_info_cache, sb_info);
 }
@@ -667,11 +660,6 @@ static struct ecryptfs_cache_info {
 		.name = "ecryptfs_file_cache",
 		.size = sizeof(struct ecryptfs_file_info),
 	},
-	{
-		.cache = &ecryptfs_dentry_info_cache,
-		.name = "ecryptfs_dentry_info_cache",
-		.size = sizeof(struct ecryptfs_dentry_info),
-	},
 	{
 		.cache = &ecryptfs_inode_info_cache,
 		.name = "ecryptfs_inode_cache",
-- 
2.47.2



* [PATCH v2 53/63] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (50 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 52/63] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-28 23:07     ` [PATCH v2 54/63] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
                       ` (9 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Comments regarding "shadow mounts" were stale - no such thing anymore.
Document the locking requirements for __lookup_mnt().

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index de9a88f45dc1..2e35f5eb4f81 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -825,24 +825,16 @@ static bool legitimize_mnt(struct vfsmount *bastard, unsigned seq)
 }
 
 /**
- * __lookup_mnt - find first child mount
+ * __lookup_mnt - mount hash lookup
  * @mnt:	parent mount
- * @dentry:	mountpoint
+ * @dentry:	dentry of mountpoint
  *
- * If @mnt has a child mount @c mounted @dentry find and return it.
+ * If @mnt has a child mount @c mounted on @dentry find and return it.
+ * Caller must either hold the spinlock component of @mount_lock or
+ * hold rcu_read_lock(), sample the seqcount component before the call
+ * and recheck it afterwards.
  *
- * Note that the child mount @c need not be unique. There are cases
- * where shadow mounts are created. For example, during mount
- * propagation when a source mount @mnt whose root got overmounted by a
- * mount @o after path lookup but before @namespace_sem could be
- * acquired gets copied and propagated. So @mnt gets copied including
- * @o. When @mnt is propagated to a destination mount @d that already
- * has another mount @n mounted at the same mountpoint then the source
- * mount @mnt will be tucked beneath @n, i.e., @n will be mounted on
- * @mnt and @mnt mounted on @d. Now both @n and @o are mounted at @mnt
- * on @dentry.
- *
- * Return: The first child of @mnt mounted @dentry or NULL.
+ * Return: The child of @mnt mounted on @dentry or %NULL.
  */
 struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 {
@@ -855,21 +847,12 @@ struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 	return NULL;
 }
 
-/*
- * lookup_mnt - Return the first child mount mounted at path
- *
- * "First" means first mounted chronologically.  If you create the
- * following mounts:
- *
- * mount /dev/sda1 /mnt
- * mount /dev/sda2 /mnt
- * mount /dev/sda3 /mnt
- *
- * Then lookup_mnt() on the base /mnt dentry in the root mount will
- * return successively the root dentry and vfsmount of /dev/sda1, then
- * /dev/sda2, then /dev/sda3, then NULL.
+/**
+ * lookup_mnt - Return the child mount mounted at given location
+ * @path:	location in the namespace
  *
- * lookup_mnt takes a reference to the found vfsmount.
+ * Acquires and returns a new reference to mount at given location
+ * or %NULL if nothing is mounted there.
  */
 struct vfsmount *lookup_mnt(const struct path *path)
 {
-- 
2.47.2



* [PATCH v2 54/63] open_detached_copy(): don't bother with mount_lock_hash()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (51 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 53/63] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-09-01 11:29       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 55/63] open_detached_copy(): separate creation of namespace into helper Al Viro
                       ` (8 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We are holding namespace_sem and a reference to the root of the tree;
iterating through that tree does not need mount_lock.  Neither
does the insertion into the rbtree of the new namespace or incrementing
the mount count of that namespace.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2e35f5eb4f81..425c33377770 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3086,14 +3086,12 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 		return ERR_CAST(mnt);
 	}
 
-	lock_mount_hash();
 	for (p = mnt; p; p = next_mnt(p, mnt)) {
 		mnt_add_to_ns(ns, p);
 		ns->nr_mounts++;
 	}
 	ns->root = mnt;
 	mntget(&mnt->mnt);
-	unlock_mount_hash();
 	namespace_unlock();
 
 	mntput(path->mnt);
-- 
2.47.2



* [PATCH v2 55/63] open_detached_copy(): separate creation of namespace into helper
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (52 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 54/63] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-29  9:54       ` Christian Brauner
  2025-08-28 23:07     ` [PATCH v2 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
                       ` (7 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and convert the helper to use guard(namespace_excl).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 425c33377770..c324800e770c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3053,18 +3053,17 @@ static int do_loopback(const struct path *path, const char *old_name,
 	return err;
 }
 
-static struct file *open_detached_copy(struct path *path, bool recursive)
+static struct mnt_namespace *get_detached_copy(const struct path *path, bool recursive)
 {
 	struct mnt_namespace *ns, *mnt_ns = current->nsproxy->mnt_ns, *src_mnt_ns;
 	struct user_namespace *user_ns = mnt_ns->user_ns;
 	struct mount *mnt, *p;
-	struct file *file;
 
 	ns = alloc_mnt_ns(user_ns, true);
 	if (IS_ERR(ns))
-		return ERR_CAST(ns);
+		return ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 
 	/*
 	 * Record the sequence number of the source mount namespace.
@@ -3081,8 +3080,7 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 
 	mnt = __do_loopback(path, recursive);
 	if (IS_ERR(mnt)) {
-		namespace_unlock();
-		free_mnt_ns(ns);
+		emptied_ns = ns;
 		return ERR_CAST(mnt);
 	}
 
@@ -3091,11 +3089,19 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 		ns->nr_mounts++;
 	}
 	ns->root = mnt;
-	mntget(&mnt->mnt);
-	namespace_unlock();
+	return ns;
+}
+
+static struct file *open_detached_copy(struct path *path, bool recursive)
+{
+	struct mnt_namespace *ns = get_detached_copy(path, recursive);
+	struct file *file;
+
+	if (IS_ERR(ns))
+		return ERR_CAST(ns);
 
 	mntput(path->mnt);
-	path->mnt = &mnt->mnt;
+	path->mnt = mntget(&ns->root->mnt);
 	file = dentry_open(path, O_PATH, current_cred());
 	if (IS_ERR(file))
 		dissolve_on_fput(path->mnt);
-- 
2.47.2



* [PATCH v2 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (53 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 55/63] open_detached_copy(): separate creation of namespace into helper Al Viro
@ 2025-08-28 23:07     ` Al Viro
  2025-08-29  9:57       ` Christian Brauner
  2025-08-28 23:08     ` [PATCH v2 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
                       ` (6 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Actual removal is done under the lock, but for checking whether we need
to bother, the lockless list_empty() is safe - either that namespace had
never been added to mnt_ns_tree, in which case the list will stay empty, or
whoever had allocated it has called mnt_ns_tree_add() and it has already
run to completion.  After that point list_empty() will become false and
will remain false, no matter what we do with the neighbors in mnt_ns_list.
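
The invariant can be seen in a small userspace model of a circular
doubly-linked list.  This is a sketch only: the helpers below merely mimic
the shape of the kernel's list primitives, and memory ordering and RCU
(which the real code has to care about) are not modelled at all.  The
point is that a node links to itself only while it has never been added
(or after it has been removed and reinitialized); deleting its neighbours
makes it point at some other element, never back at itself.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h->prev = h;
}

/* True for an empty list head and for a node that is on no list. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Unlink an element and leave it self-linked again. */
static void list_del_init(struct list_head *e)
{
	assert(!list_empty(e));
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}
```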

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c324800e770c..daa72292ea58 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -195,7 +195,7 @@ static void mnt_ns_release_rcu(struct rcu_head *rcu)
 static void mnt_ns_tree_remove(struct mnt_namespace *ns)
 {
 	/* remove from global mount namespace list */
-	if (!is_anon_ns(ns)) {
+	if (!list_empty(&ns->mnt_ns_list)) {
 		mnt_ns_tree_write_lock();
 		rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree);
 		list_bidir_del_rcu(&ns->mnt_ns_list);
-- 
2.47.2



* [PATCH v2 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (54 preceding siblings ...)
  2025-08-28 23:07     ` [PATCH v2 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
@ 2025-08-28 23:08     ` Al Viro
  2025-08-29  9:56       ` Christian Brauner
  2025-08-28 23:08     ` [PATCH v2 58/63] copy_mnt_ns(): use guards Al Viro
                       ` (5 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Now that free_mnt_ns() works prior to mnt_ns_tree_add(), there's no need for
an open-coded analogue of free_mnt_ns() there - yes, we do avoid one call_rcu()
use per failing call of clone() or unshare(), if they fail due to OOM in that
particular spot, but it's not really worth bothering.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index daa72292ea58..a418555586ef 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4190,10 +4190,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		copy_flags |= CL_SLAVE;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
+		emptied_ns = new_ns;
 		namespace_unlock();
-		ns_free_inum(&new_ns->ns);
-		dec_mnt_namespaces(new_ns->ucounts);
-		mnt_ns_release(new_ns);
 		return ERR_CAST(new);
 	}
 	if (user_ns != ns->user_ns) {
-- 
2.47.2



* [PATCH v2 58/63] copy_mnt_ns(): use guards
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (55 preceding siblings ...)
  2025-08-28 23:08     ` [PATCH v2 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
@ 2025-08-28 23:08     ` Al Viro
  2025-09-01 11:43       ` Christian Brauner
  2025-08-28 23:08     ` [PATCH v2 59/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
                       ` (4 subsequent siblings)
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

* mntput() of rootmnt and pwdmnt done via __free(mntput)
* mnt_ns_tree_add() can be done within namespace_excl scope.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a418555586ef..9e16231d4561 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4164,7 +4164,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		struct user_namespace *user_ns, struct fs_struct *new_fs)
 {
 	struct mnt_namespace *new_ns;
-	struct vfsmount *rootmnt = NULL, *pwdmnt = NULL;
+	struct vfsmount *rootmnt __free(mntput) = NULL;
+	struct vfsmount *pwdmnt __free(mntput) = NULL;
 	struct mount *p, *q;
 	struct mount *old;
 	struct mount *new;
@@ -4183,7 +4184,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	if (IS_ERR(new_ns))
 		return new_ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
 	if (user_ns != ns->user_ns)
@@ -4191,13 +4192,11 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
 		emptied_ns = new_ns;
-		namespace_unlock();
 		return ERR_CAST(new);
 	}
 	if (user_ns != ns->user_ns) {
-		lock_mount_hash();
+		guard(mount_writer)();
 		lock_mnt_tree(new);
-		unlock_mount_hash();
 	}
 	new_ns->root = new;
 
@@ -4229,14 +4228,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		while (p->mnt.mnt_root != q->mnt.mnt_root)
 			p = next_mnt(skip_mnt_tree(p), old);
 	}
-	namespace_unlock();
-
-	if (rootmnt)
-		mntput(rootmnt);
-	if (pwdmnt)
-		mntput(pwdmnt);
-
-	mnt_ns_tree_add(new_ns);
 	return new_ns;
 }
 
-- 
2.47.2



* [PATCH v2 59/63] setup_mnt(): primitive for connecting a mount to filesystem
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (56 preceding siblings ...)
  2025-08-28 23:08     ` [PATCH v2 58/63] copy_mnt_ns(): use guards Al Viro
@ 2025-08-28 23:08     ` Al Viro
  2025-08-28 23:08     ` [PATCH v2 60/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
                       ` (3 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Take the identical logic in vfs_create_mount() and clone_mnt() into
a new helper that takes an empty struct mount and attaches it to
the given dentry (sub)tree.

Should be called once in the lifetime of every mount, prior to making
it visible in any data structures.

After that point ->mnt_root and ->mnt_sb never change; ->mnt_root
is a counting reference to the dentry and ->mnt_sb is an active reference
to the superblock.

Mount remains associated with that dentry tree all the way until
the call of cleanup_mnt(), when the refcount eventually drops
to zero.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9e16231d4561..5af609ff43bc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1195,6 +1195,21 @@ static void commit_tree(struct mount *mnt)
 	touch_mnt_namespace(n);
 }
 
+static void setup_mnt(struct mount *m, struct dentry *root)
+{
+	struct super_block *s = root->d_sb;
+
+	atomic_inc(&s->s_active);
+	m->mnt.mnt_sb = s;
+	m->mnt.mnt_root = dget(root);
+	m->mnt_mountpoint = m->mnt.mnt_root;
+	m->mnt_parent = m;
+
+	lock_mount_hash();
+	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	unlock_mount_hash();
+}
+
 /**
  * vfs_create_mount - Create a mount for a configured superblock
  * @fc: The configuration context with the superblock attached
@@ -1218,15 +1233,8 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
 	if (fc->sb_flags & SB_KERNMOUNT)
 		mnt->mnt.mnt_flags = MNT_INTERNAL;
 
-	atomic_inc(&fc->root->d_sb->s_active);
-	mnt->mnt.mnt_sb		= fc->root->d_sb;
-	mnt->mnt.mnt_root	= dget(fc->root);
-	mnt->mnt_mountpoint	= mnt->mnt.mnt_root;
-	mnt->mnt_parent		= mnt;
+	setup_mnt(mnt, fc->root);
 
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
-	unlock_mount_hash();
 	return &mnt->mnt;
 }
 EXPORT_SYMBOL(vfs_create_mount);
@@ -1284,7 +1292,6 @@ EXPORT_SYMBOL_GPL(vfs_kern_mount);
 static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 					int flag)
 {
-	struct super_block *sb = old->mnt.mnt_sb;
 	struct mount *mnt;
 	int err;
 
@@ -1309,16 +1316,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	if (mnt->mnt_group_id)
 		set_mnt_shared(mnt);
 
-	atomic_inc(&sb->s_active);
 	mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
 
-	mnt->mnt.mnt_sb = sb;
-	mnt->mnt.mnt_root = dget(root);
-	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
-	mnt->mnt_parent = mnt;
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
-	unlock_mount_hash();
+	setup_mnt(mnt, root);
 
 	if (flag & CL_PRIVATE)	// we are done with it
 		return mnt;
-- 
2.47.2



* [PATCH v2 60/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (57 preceding siblings ...)
  2025-08-28 23:08     ` [PATCH v2 59/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
@ 2025-08-28 23:08     ` Al Viro
  2025-08-28 23:08     ` [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
                       ` (2 subsequent siblings)
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We have an unpleasant wart in accessibility rules for struct mount.  There
are per-superblock lists of mounts, used by sb_prepare_remount_readonly()
to check if any of those is currently claimed for write access and to
block further attempts to get write access on those until we are done.

As soon as it is attached to a filesystem, mount becomes reachable
via that list.  Only sb_prepare_remount_readonly() traverses it and
it only accesses a few members of struct mount.  Unfortunately,
->mnt_flags is one of those and it is modified - MNT_WRITE_HOLD set
and then cleared.  It is done under mount_lock, so from the locking
rules POV everything's fine.

However, it has easily overlooked implications - once mount has been
attached to a filesystem, it has to be treated as globally visible.
In particular, initializing ->mnt_flags *must* be done either prior
to that point or under mount_lock.  All other members are still
private at that point.

Life gets simpler if we move that bit (and that's *all* that can get
touched by access via this list) out of ->mnt_flags.  It's not even
hard to do - currently the list is implemented as list_head one,
anchored in super_block->s_mounts and linked via mount->mnt_instance.

As the first step, switch it to hlist-like open-coded structure -
address of the first mount in the set is stored in ->s_mounts
and ->mnt_instance replaced with ->mnt_next_for_sb and ->mnt_pprev_for_sb -
the former either NULL or pointing to the next mount in set, the
latter - address of either ->s_mounts or ->mnt_next_for_sb in the
previous element of the set.

In the next commit we'll steal the LSB of ->mnt_pprev_for_sb as
replacement for MNT_WRITE_HOLD.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h         |  3 ++-
 fs/namespace.c     | 38 +++++++++++++++++++++++++++++---------
 fs/super.c         |  3 +--
 include/linux/fs.h |  4 +++-
 4 files changed, 35 insertions(+), 13 deletions(-)
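
Illustration (not part of the patch): the hlist-like structure described
in the commit message can be sketched in userspace as below.  The struct
definitions are cut-down stand-ins that keep only the linkage fields; the
`id` member and count_instances() are hypothetical, and no locking is
modelled (in the kernel all of this runs under mount_lock).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; only the linkage fields are modelled. */
struct mount {
	struct mount *mnt_next_for_sb;   /* next mount in the set, or NULL */
	struct mount **mnt_pprev_for_sb; /* address of whatever points at us */
	int id;                          /* hypothetical payload for the demo */
};

struct super_block {
	struct mount *s_mounts;          /* first mount in the set, or NULL */
};

/* Insert at the head of the set, mirroring mnt_add_instance(). */
static void add_instance(struct mount *m, struct super_block *s)
{
	struct mount *first = s->s_mounts;

	if (first)
		first->mnt_pprev_for_sb = &m->mnt_next_for_sb;
	m->mnt_next_for_sb = first;
	m->mnt_pprev_for_sb = &s->s_mounts;
	s->s_mounts = m;
}

/* Unlink, mirroring mnt_del_instance(); no special case for the head. */
static void del_instance(struct mount *m)
{
	struct mount **p = m->mnt_pprev_for_sb;
	struct mount *next = m->mnt_next_for_sb;

	if (next)
		next->mnt_pprev_for_sb = p;
	*p = next;
}

/* Forward traversal never looks at the pprev field. */
static int count_instances(const struct super_block *s)
{
	int n = 0;

	for (const struct mount *m = s->s_mounts; m; m = m->mnt_next_for_sb)
		n++;
	return n;
}
```

Since the pprev field holds the address of whichever pointer currently
refers to the mount, removal is the same pair of stores whether the mount
is first in the set or not.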

diff --git a/fs/mount.h b/fs/mount.h
index 04d0eadc4c10..5c2ddcff810c 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -64,7 +64,8 @@ struct mount {
 #endif
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
-	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
+	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
+	struct mount **mnt_pprev_for_sb;/* except that LSB of pprev will be stolen */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
diff --git a/fs/namespace.c b/fs/namespace.c
index 5af609ff43bc..120854639dd2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -729,6 +729,27 @@ static inline void mnt_unhold_writers(struct mount *mnt)
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 }
 
+static inline void mnt_del_instance(struct mount *m)
+{
+	struct mount **p = m->mnt_pprev_for_sb;
+	struct mount *next = m->mnt_next_for_sb;
+
+	if (next)
+		next->mnt_pprev_for_sb = p;
+	*p = next;
+}
+
+static inline void mnt_add_instance(struct mount *m, struct super_block *s)
+{
+	struct mount *first = s->s_mounts;
+
+	if (first)
+		first->mnt_pprev_for_sb = &m->mnt_next_for_sb;
+	m->mnt_next_for_sb = first;
+	m->mnt_pprev_for_sb = &s->s_mounts;
+	s->s_mounts = m;
+}
+
 static int mnt_make_readonly(struct mount *mnt)
 {
 	int ret;
@@ -742,7 +763,6 @@ static int mnt_make_readonly(struct mount *mnt)
 
 int sb_prepare_remount_readonly(struct super_block *sb)
 {
-	struct mount *mnt;
 	int err = 0;
 
 	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
@@ -750,9 +770,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 		return -EBUSY;
 
 	lock_mount_hash();
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (!(mnt->mnt.mnt_flags & MNT_READONLY)) {
-			err = mnt_hold_writers(mnt);
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (!(m->mnt.mnt_flags & MNT_READONLY)) {
+			err = mnt_hold_writers(m);
 			if (err)
 				break;
 		}
@@ -762,9 +782,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 
 	if (!err)
 		sb_start_ro_state_change(sb);
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (mnt->mnt.mnt_flags & MNT_WRITE_HOLD)
-			mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
+			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 	}
 	unlock_mount_hash();
 
@@ -1206,7 +1226,7 @@ static void setup_mnt(struct mount *m, struct dentry *root)
 	m->mnt_parent = m;
 
 	lock_mount_hash();
-	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	mnt_add_instance(m, s);
 	unlock_mount_hash();
 }
 
@@ -1424,7 +1444,7 @@ static void mntput_no_expire(struct mount *mnt)
 	mnt->mnt.mnt_flags |= MNT_DOOMED;
 	rcu_read_unlock();
 
-	list_del(&mnt->mnt_instance);
+	mnt_del_instance(mnt);
 	if (unlikely(!list_empty(&mnt->mnt_expire)))
 		list_del(&mnt->mnt_expire);
 
diff --git a/fs/super.c b/fs/super.c
index 7f876f32343a..3b0f49e1b817 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -323,7 +323,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	if (!s)
 		return NULL;
 
-	INIT_LIST_HEAD(&s->s_mounts);
 	s->s_user_ns = get_user_ns(user_ns);
 	init_rwsem(&s->s_umount);
 	lockdep_set_class(&s->s_umount, &type->s_umount_key);
@@ -408,7 +407,7 @@ static void __put_super(struct super_block *s)
 		list_del_init(&s->s_list);
 		WARN_ON(s->s_dentry_lru.node);
 		WARN_ON(s->s_inode_lru.node);
-		WARN_ON(!list_empty(&s->s_mounts));
+		WARN_ON(s->s_mounts);
 		call_rcu(&s->rcu, destroy_super_rcu);
 	}
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..0e9c7f1460dc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1324,6 +1324,8 @@ struct sb_writers {
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
+struct mount;
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1358,7 +1360,7 @@ struct super_block {
 	__u16 s_encoding_flags;
 #endif
 	struct hlist_bl_head	s_roots;	/* alternate root dentries for NFS */
-	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
+	struct mount		*s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;	/* can go away once we use an accessor for @s_bdev_file */
 	struct file		*s_bdev_file;
 	struct backing_dev_info *s_bdi;
-- 
2.47.2



* [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (58 preceding siblings ...)
  2025-08-28 23:08     ` [PATCH v2 60/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
@ 2025-08-28 23:08     ` Al Viro
  2025-08-28 23:31       ` Linus Torvalds
  2025-08-28 23:08     ` [PATCH v2 62/63] simplify the callers of mnt_unhold_writers() Al Viro
  2025-08-28 23:08     ` [PATCH v2 63/63] WRITE_HOLD machinery: no need to bump mount_lock seqcount Al Viro
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... from ->mnt_flags to LSB of ->mnt_pprev_for_sb.

This is safe - we always set and clear it within the same mount_lock
scope, so we won't interfere with list operations - traversals are
always forward, so they don't even look at ->mnt_pprev_for_sb and
both insertions and removals are in mount_lock scopes of their own,
so that bit will be clear in *all* mount instances during those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h            |  3 ++-
 fs/namespace.c        | 50 +++++++++++++++++++++----------------------
 include/linux/fs.h    |  4 +---
 include/linux/mount.h |  3 +--
 4 files changed, 29 insertions(+), 31 deletions(-)
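
Illustration (not part of the patch): a minimal userspace sketch of
keeping a flag in the LSB of the back-pointer.  It assumes the pointed-to
`struct mount *` slots are at least 2-byte aligned, uses uintptr_t where
the patch uses unsigned long, and introduces hypothetical helpers
hold()/unhold()/held(); memory barriers and mount_lock are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_HOLD ((uintptr_t)1)

struct mount {
	struct mount *mnt_next_for_sb;
	uintptr_t mnt_pprev_for_sb;	/* a struct mount ** with its LSB stolen */
};

struct super_block {
	struct mount *s_mounts;
};

/* Hypothetical helpers for the stolen bit. */
static void hold(struct mount *m)
{
	m->mnt_pprev_for_sb |= WRITE_HOLD;
}

static void unhold(struct mount *m)
{
	m->mnt_pprev_for_sb &= ~WRITE_HOLD;
}

static int held(const struct mount *m)
{
	return (m->mnt_pprev_for_sb & WRITE_HOLD) != 0;
}

/* List operations assume the bit is clear in every instance. */
static void add_instance(struct mount *m, struct super_block *s)
{
	struct mount *first = s->s_mounts;

	if (first)
		first->mnt_pprev_for_sb = (uintptr_t)&m->mnt_next_for_sb;
	m->mnt_next_for_sb = first;
	m->mnt_pprev_for_sb = (uintptr_t)&s->s_mounts;
	s->s_mounts = m;
}

static void del_instance(struct mount *m)
{
	struct mount **p = (struct mount **)m->mnt_pprev_for_sb;
	struct mount *next = m->mnt_next_for_sb;

	if (next)
		next->mnt_pprev_for_sb = (uintptr_t)p;
	*p = next;
}
```

The sketch only stays correct if the bit is set and cleared without any
insertion or removal in between, which is the property the commit
message relies on.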

diff --git a/fs/mount.h b/fs/mount.h
index 5c2ddcff810c..c13bbd93d837 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -65,7 +65,8 @@ struct mount {
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
 	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
-	struct mount **mnt_pprev_for_sb;/* except that LSB of pprev will be stolen */
+	unsigned long mnt_pprev_for_sb;	/* except that LSB of pprev is stolen */
+#define WRITE_HOLD 1			/* ... for use by mnt_hold_writers() */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
diff --git a/fs/namespace.c b/fs/namespace.c
index 120854639dd2..f9c9c69a815b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -509,20 +509,20 @@ int mnt_get_write_access(struct vfsmount *m)
 	mnt_inc_writers(mnt);
 	/*
 	 * The store to mnt_inc_writers must be visible before we pass
-	 * MNT_WRITE_HOLD loop below, so that the slowpath can see our
-	 * incremented count after it has set MNT_WRITE_HOLD.
+	 * WRITE_HOLD loop below, so that the slowpath can see our
+	 * incremented count after it has set WRITE_HOLD.
 	 */
 	smp_mb();
 	might_lock(&mount_lock.lock);
-	while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
+	while (READ_ONCE(mnt->mnt_pprev_for_sb) & WRITE_HOLD) {
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 			cpu_relax();
 		} else {
 			/*
 			 * This prevents priority inversion, if the task
-			 * setting MNT_WRITE_HOLD got preempted on a remote
+			 * setting WRITE_HOLD got preempted on a remote
 			 * CPU, and it prevents life lock if the task setting
-			 * MNT_WRITE_HOLD has a lower priority and is bound to
+			 * WRITE_HOLD has a lower priority and is bound to
 			 * the same CPU as the task that is spinning here.
 			 */
 			preempt_enable();
@@ -533,7 +533,7 @@ int mnt_get_write_access(struct vfsmount *m)
 	}
 	/*
 	 * The barrier pairs with the barrier sb_start_ro_state_change() making
-	 * sure that if we see MNT_WRITE_HOLD cleared, we will also see
+	 * sure that if we see WRITE_HOLD cleared, we will also see
 	 * s_readonly_remount set (or even SB_RDONLY / MNT_READONLY flags) in
 	 * mnt_is_readonly() and bail in case we are racing with remount
 	 * read-only.
@@ -672,15 +672,15 @@ EXPORT_SYMBOL(mnt_drop_write_file);
  * @mnt.
  *
  * Context: This function expects lock_mount_hash() to be held serializing
- *          setting MNT_WRITE_HOLD.
+ *          setting WRITE_HOLD.
  * Return: On success 0 is returned.
  *	   On error, -EBUSY is returned.
  */
 static inline int mnt_hold_writers(struct mount *mnt)
 {
-	mnt->mnt.mnt_flags |= MNT_WRITE_HOLD;
+	mnt->mnt_pprev_for_sb |= WRITE_HOLD;
 	/*
-	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
+	 * After storing WRITE_HOLD, we'll read the counters. This store
 	 * should be visible before we do.
 	 */
 	smp_mb();
@@ -696,9 +696,9 @@ static inline int mnt_hold_writers(struct mount *mnt)
 	 * sum up each counter, if we read a counter before it is incremented,
 	 * but then read another CPU's count which it has been subsequently
 	 * decremented from -- we would see more decrements than we should.
-	 * MNT_WRITE_HOLD protects against this scenario, because
+	 * WRITE_HOLD protects against this scenario, because
 	 * mnt_want_write first increments count, then smp_mb, then spins on
-	 * MNT_WRITE_HOLD, so it can't be decremented by another CPU while
+	 * WRITE_HOLD, so it can't be decremented by another CPU while
 	 * we're counting up here.
 	 */
 	if (mnt_get_writers(mnt) > 0)
@@ -722,20 +722,20 @@ static inline int mnt_hold_writers(struct mount *mnt)
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
 	/*
-	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
+	 * MNT_READONLY must become visible before ~WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
 	 */
 	smp_wmb();
-	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	mnt->mnt_pprev_for_sb &= ~WRITE_HOLD;
 }
 
 static inline void mnt_del_instance(struct mount *m)
 {
-	struct mount **p = m->mnt_pprev_for_sb;
+	struct mount **p = (void *)m->mnt_pprev_for_sb;
 	struct mount *next = m->mnt_next_for_sb;
 
 	if (next)
-		next->mnt_pprev_for_sb = p;
+		next->mnt_pprev_for_sb = (unsigned long)p;
 	*p = next;
 }
 
@@ -744,9 +744,9 @@ static inline void mnt_add_instance(struct mount *m, struct super_block *s)
 	struct mount *first = s->s_mounts;
 
 	if (first)
-		first->mnt_pprev_for_sb = &m->mnt_next_for_sb;
+		first->mnt_pprev_for_sb = (unsigned long)&m->mnt_next_for_sb;
 	m->mnt_next_for_sb = first;
-	m->mnt_pprev_for_sb = &s->s_mounts;
+	m->mnt_pprev_for_sb = (unsigned long)&s->s_mounts;
 	s->s_mounts = m;
 }
 
@@ -765,7 +765,7 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 {
 	int err = 0;
 
-	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
+	/* Racy optimization.  Recheck the counter under WRITE_HOLD */
 	if (atomic_long_read(&sb->s_remove_count))
 		return -EBUSY;
 
@@ -783,8 +783,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 	if (!err)
 		sb_start_ro_state_change(sb);
 	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
-			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+		if (m->mnt_pprev_for_sb & WRITE_HOLD)
+			m->mnt_pprev_for_sb &= ~WRITE_HOLD;
 	}
 	unlock_mount_hash();
 
@@ -4805,18 +4805,18 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 		struct mount *p;
 
 		/*
-		 * If we had to call mnt_hold_writers() MNT_WRITE_HOLD will
-		 * be set in @mnt_flags. The loop unsets MNT_WRITE_HOLD for all
+		 * If we had to call mnt_hold_writers() WRITE_HOLD will
+		 * be set in @mnt_flags. The loop unsets WRITE_HOLD for all
 		 * mounts and needs to take care to include the first mount.
 		 */
 		for (p = mnt; p; p = next_mnt(p, mnt)) {
 			/* If we had to hold writers unblock them. */
-			if (p->mnt.mnt_flags & MNT_WRITE_HOLD)
+			if (p->mnt_pprev_for_sb & WRITE_HOLD)
 				mnt_unhold_writers(p);
 
 			/*
 			 * We're done once the first mount we changed got
-			 * MNT_WRITE_HOLD unset.
+			 * WRITE_HOLD unset.
 			 */
 			if (p == m)
 				break;
@@ -4851,7 +4851,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 		WRITE_ONCE(m->mnt.mnt_flags, flags);
 
 		/* If we had to hold writers unblock them. */
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
+		if (m->mnt_pprev_for_sb & WRITE_HOLD)
 			mnt_unhold_writers(m);
 
 		if (kattr->propagation)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0e9c7f1460dc..1d583f38fb81 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1324,8 +1324,6 @@ struct sb_writers {
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
-struct mount;
-
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1360,7 +1358,7 @@ struct super_block {
 	__u16 s_encoding_flags;
 #endif
 	struct hlist_bl_head	s_roots;	/* alternate root dentries for NFS */
-	struct mount		*s_mounts;	/* list of mounts; _not_ for fs use */
+	void			*s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;	/* can go away once we use an accessor for @s_bdev_file */
 	struct file		*s_bdev_file;
 	struct backing_dev_info *s_bdi;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 18e4b97f8a98..85e97b9340ff 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -33,7 +33,6 @@ enum mount_flags {
 	MNT_NOSYMFOLLOW	= 0x80,
 
 	MNT_SHRINKABLE	= 0x100,
-	MNT_WRITE_HOLD	= 0x200,
 
 	MNT_INTERNAL	= 0x4000,
 
@@ -52,7 +51,7 @@ enum mount_flags {
 				  | MNT_READONLY | MNT_NOSYMFOLLOW,
 	MNT_ATIME_MASK = MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME,
 
-	MNT_INTERNAL_FLAGS = MNT_WRITE_HOLD | MNT_INTERNAL | MNT_DOOMED |
+	MNT_INTERNAL_FLAGS = MNT_INTERNAL | MNT_DOOMED |
 			     MNT_SYNC_UMOUNT | MNT_LOCKED
 };
 
-- 
2.47.2



* [PATCH v2 62/63] simplify the callers of mnt_unhold_writers()
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (59 preceding siblings ...)
  2025-08-28 23:08     ` [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
@ 2025-08-28 23:08     ` Al Viro
  2025-08-28 23:08     ` [PATCH v2 63/63] WRITE_HOLD machinery: no need to bump mount_lock seqcount Al Viro
  61 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

The cleanup-on-failure logic in mount_setattr_prepare() is simplified
by having a mnt_hold_writers() failure followed by advancing m to the
next node in the tree before leaving the loop.

And since every call of mnt_unhold_writers() is preceded by the same check
that the flag has been set and the function is inlined, let's just shift
the check into the function itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 ++++++++++------------------------
 1 file changed, 10 insertions(+), 24 deletions(-)
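
Illustration (not part of the patch): the simplified failure path can be
sketched in userspace with a plain singly linked list standing in for
the mount tree.  struct node, its `busy` and `held` members and the
helper names are hypothetical; the sketch assumes, as the relaxed comment
on mnt_unhold_writers() suggests, that a failing hold leaves the flag set
on the node it failed on.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	int busy;	/* hypothetical: makes hold() fail on this node */
	int held;	/* stands in for the WRITE_HOLD bit */
};

/* Sets the flag first, then may fail; the flag stays set on failure. */
static int hold(struct node *n)
{
	n->held = 1;
	return n->busy ? -1 : 0;
}

/* The "was it held at all?" check lives inside the helper. */
static void unhold(struct node *n)
{
	if (!n->held)
		return;
	n->held = 0;
}

static int prepare(struct node *first)
{
	struct node *m;
	int err = 0;

	for (m = first; m; m = m->next) {
		err = hold(m);
		if (err) {
			/* step past the failing node before leaving */
			m = m->next;
			break;
		}
	}

	if (err) {
		/* undo every hold() we did, the failing one included */
		for (struct node *p = first; p != m; p = p->next)
			unhold(p);
	}
	return err;
}
```

Advancing m past the failing node turns the undo loop into a half-open
range [first, m), so it needs neither a special case for the first node
nor a separate flag test at the call site.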

diff --git a/fs/namespace.c b/fs/namespace.c
index f9c9c69a815b..6b439e5e5a27 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -714,13 +714,14 @@ static inline int mnt_hold_writers(struct mount *mnt)
  * Stop preventing write access to @mnt allowing callers to gain write access
  * to @mnt again.
  *
- * This function can only be called after a successful call to
- * mnt_hold_writers().
+ * This function can only be called after a call to mnt_hold_writers().
  *
  * Context: This function expects lock_mount_hash() to be held.
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
+	if (!(mnt->mnt_pprev_for_sb & WRITE_HOLD))
+		return;
 	/*
 	 * MNT_READONLY must become visible before ~WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
@@ -4793,8 +4794,10 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 
 		if (!mnt_allow_writers(kattr, m)) {
 			err = mnt_hold_writers(m);
-			if (err)
+			if (err) {
+				m = next_mnt(m, mnt);
 				break;
+			}
 		}
 
 		if (!(kattr->kflags & MOUNT_KATTR_RECURSE))
@@ -4802,25 +4805,9 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 	}
 
 	if (err) {
-		struct mount *p;
-
-		/*
-		 * If we had to call mnt_hold_writers() WRITE_HOLD will
-		 * be set in @mnt_flags. The loop unsets WRITE_HOLD for all
-		 * mounts and needs to take care to include the first mount.
-		 */
-		for (p = mnt; p; p = next_mnt(p, mnt)) {
-			/* If we had to hold writers unblock them. */
-			if (p->mnt_pprev_for_sb & WRITE_HOLD)
-				mnt_unhold_writers(p);
-
-			/*
-			 * We're done once the first mount we changed got
-			 * WRITE_HOLD unset.
-			 */
-			if (p == m)
-				break;
-		}
+		/* undo all mnt_hold_writers() we'd done */
+		for (struct mount *p = mnt; p != m; p = next_mnt(p, mnt))
+			mnt_unhold_writers(p);
 	}
 	return err;
 }
@@ -4851,8 +4838,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 		WRITE_ONCE(m->mnt.mnt_flags, flags);
 
 		/* If we had to hold writers unblock them. */
-		if (m->mnt_pprev_for_sb & WRITE_HOLD)
-			mnt_unhold_writers(m);
+		mnt_unhold_writers(m);
 
 		if (kattr->propagation)
 			change_mnt_propagation(m, kattr->propagation);
-- 
2.47.2



* [PATCH v2 63/63] WRITE_HOLD machinery: no need to bump mount_lock seqcount
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (60 preceding siblings ...)
  2025-08-28 23:08     ` [PATCH v2 62/63] simplify the callers of mnt_unhold_writers() Al Viro
@ 2025-08-28 23:08     ` Al Viro
  2025-09-01 11:28       ` Christian Brauner
  61 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... neither for insertion into the list of instances, nor for
mnt_{un,}hold_writers(), nor for mnt_get_write_access() deciding
to be nice to RT during a busy-wait loop - all of that only needs
the spinlock side of mount_lock.

IOW, it's mount_locked_reader, not mount_writer.

Clarify the comment re locking rules for mnt_unhold_writers() - it's
not just that mount_lock needs to be held when calling that, it must
have been held all along since the matching mnt_hold_writers().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6b439e5e5a27..545fef0682b1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -526,8 +526,8 @@ int mnt_get_write_access(struct vfsmount *m)
 			 * the same CPU as the task that is spinning here.
 			 */
 			preempt_enable();
-			lock_mount_hash();
-			unlock_mount_hash();
+			read_seqlock_excl(&mount_lock);
+			read_sequnlock_excl(&mount_lock);
 			preempt_disable();
 		}
 	}
@@ -671,7 +671,7 @@ EXPORT_SYMBOL(mnt_drop_write_file);
  * a call to mnt_unhold_writers() in order to stop preventing write access to
  * @mnt.
  *
- * Context: This function expects lock_mount_hash() to be held serializing
+ * Context: This function expects to be in mount_locked_reader scope serializing
  *          setting WRITE_HOLD.
  * Return: On success 0 is returned.
  *	   On error, -EBUSY is returned.
@@ -716,7 +716,8 @@ static inline int mnt_hold_writers(struct mount *mnt)
  *
  * This function can only be called after a call to mnt_hold_writers().
  *
- * Context: This function expects lock_mount_hash() to be held.
+ * Context: This function expects to be in the same mount_locked_reader scope
+ * as the matching mnt_hold_writers().
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
@@ -770,7 +771,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 	if (atomic_long_read(&sb->s_remove_count))
 		return -EBUSY;
 
-	lock_mount_hash();
+	guard(mount_locked_reader)();
+
 	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
 		if (!(m->mnt.mnt_flags & MNT_READONLY)) {
 			err = mnt_hold_writers(m);
@@ -787,7 +789,6 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 		if (m->mnt_pprev_for_sb & WRITE_HOLD)
 			m->mnt_pprev_for_sb &= ~WRITE_HOLD;
 	}
-	unlock_mount_hash();
 
 	return err;
 }
@@ -1226,9 +1227,8 @@ static void setup_mnt(struct mount *m, struct dentry *root)
 	m->mnt_mountpoint = m->mnt.mnt_root;
 	m->mnt_parent = m;
 
-	lock_mount_hash();
+	guard(mount_locked_reader)();
 	mnt_add_instance(m, s);
-	unlock_mount_hash();
 }
 
 /**
-- 
2.47.2



* Re: [PATCH v2 33/63] don't bother passing new_path->dentry to can_move_mount_beneath()
  2025-08-28 23:07     ` [PATCH v2 33/63] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
@ 2025-08-28 23:20       ` Linus Torvalds
  2025-08-28 23:39         ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-28 23:20 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack

On Thu, 28 Aug 2025 at 16:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         if (beneath) {
> -               err = can_move_mount_beneath(old, new_path, mp.mp);
> +               err = can_move_mount_beneath(old, real_mount(new_path->mnt), mp.mp);
>                 if (err)
>                         return err;
>         }

Going through the patches, this is one that I think made things
uglier... Most of them make me go "nice simplification".

(I'll have a separate comment on 61/63)

I certainly agree with the intent of the patch, but that
can_move_mount_beneath() call line is now rather hard to read. It
looked simpler before.

Maybe you could just split it into two lines, and write it as

        if (beneath) {
                struct mount *new_mnt = real_mount(new_path->mnt);
                err = can_move_mount_beneath(old, new_mnt, mp.mp);
                if (err)
                        return err;
        }

which makes slightly less happen in that one line (and it fits in 80
columns too - not a requirement, but still "good taste")

Long lines are better than randomly splitting lines unreadably into
multiple lines, but short lines that are logically split are still
preferred, I would say..

            Linus


* Re: [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-08-28 23:08     ` [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
@ 2025-08-28 23:31       ` Linus Torvalds
  2025-08-29  0:11         ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-28 23:31 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack

On Thu, 28 Aug 2025 at 16:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> ... from ->mnt_flags to LSB of ->mnt_pprev_for_sb.

Ugh. This one I'm not happy with.

The random new casts:

>  static inline void mnt_del_instance(struct mount *m)
>  {
> -       struct mount **p = m->mnt_pprev_for_sb;
> +       struct mount **p = (void *)m->mnt_pprev_for_sb;
>         struct mount *next = m->mnt_next_for_sb;
>
>         if (next)
> -               next->mnt_pprev_for_sb = p;
> +               next->mnt_pprev_for_sb = (unsigned long)p;
>         *p = next;
>  }

are just nasty. And it's there in multiple places (ie
mnt_add_instance() has more of them).

Making things even *worse*, the other case you changed (s_mounts) it's
a "void *", which means that it does *not* have casts in other places,
and you still do things like

        for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {

so that 's_mounts' thing is just silently cast from an untyped 'void *'
to the 'struct mount *' that it used to be.

So no - this is *not* acceptable.

Same largely goes for that

> -       struct mount **mnt_pprev_for_sb;/* except that LSB of pprev will be stolen */
> +       unsigned long mnt_pprev_for_sb; /* except that LSB of pprev is stolen */

change, but at least there it's now an 'unsigned long', so it will
*always* complain if a cast is missing in either direction. That's
better, but still horrendously ugly.

If you want to use an opaque type, then please make it be truly
opaque. Not 'unsigned long'. And certainly not 'void *'. Make it be
something that is still type-safe - you can make up a pointer to
struct name that is never actually declared, so that it's basically a
unique type (or two separate types for mnt_pprev_for_sb and

I'm not even clear on why you did this change, but if you want to have
specific types for some reason, make them *really* specific. Don't
make them 'void *', and 'unsigned long'.

            Linus
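
Illustration (not from the thread): one way to read the "pointer to a
struct name that is never actually declared" suggestion is the userspace
sketch below.  The names mnt_pprev_cookie, mnt_pprev_t and the pprev_*()
helpers are all hypothetical, and the sketch assumes that converting
between pointers and uintptr_t round-trips the address, as it does with
the compilers the kernel supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mount;

/* Never defined anywhere: exists only to give the handle a unique type. */
struct mnt_pprev_cookie;
typedef struct mnt_pprev_cookie *mnt_pprev_t;

#define WRITE_HOLD ((uintptr_t)1)

/* All conversions are confined to these helpers. */
static inline mnt_pprev_t pprev_make(struct mount **p)
{
	return (mnt_pprev_t)(uintptr_t)p;
}

static inline struct mount **pprev_ptr(mnt_pprev_t v)
{
	return (struct mount **)((uintptr_t)v & ~WRITE_HOLD);
}

static inline int pprev_held(mnt_pprev_t v)
{
	return ((uintptr_t)v & WRITE_HOLD) != 0;
}

static inline mnt_pprev_t pprev_set_hold(mnt_pprev_t v)
{
	return (mnt_pprev_t)((uintptr_t)v | WRITE_HOLD);
}

static inline mnt_pprev_t pprev_clear_hold(mnt_pprev_t v)
{
	return (mnt_pprev_t)((uintptr_t)v & ~WRITE_HOLD);
}
```

With such a type, assigning a `struct mount **` or an integer to the
field without going through a helper is diagnosed by the compiler as an
incompatible conversion, which neither 'void *' nor a bare integer type
fully provides.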


* Re: [PATCH v2 33/63] don't bother passing new_path->dentry to can_move_mount_beneath()
  2025-08-28 23:20       ` Linus Torvalds
@ 2025-08-28 23:39         ` Al Viro
  0 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-28 23:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

On Thu, Aug 28, 2025 at 04:20:56PM -0700, Linus Torvalds wrote:

> (I'll have a separate comment on 61/63)
> 
> I certainly agree with the intent of the patch, but that
> can_move_mount_beneath() call line is now rather hard to read. It
> looked simpler before.
> 
> Maybe you could just split it into two lines, and write it as
> 
>         if (beneath) {
>                 struct mount *new_mnt = real_mount(new_path->mnt);
>                 err = can_move_mount_beneath(old, new_mnt, mp.mp);
>                 if (err)
>                         return err;
>         }
> 
> which makes slightly less happen in that one line (and it fits in 80
> columns too - not a requirement, but still "good taste")
> 
> Long lines are better than randomly splitting lines unreadably into
> multiple lines, but short lines that are logically split are still
> preferred, I would say..

FWIW, if you look at #35, you'll see this:

-		err = can_move_mount_beneath(old, real_mount(new_path->mnt), mp.mp);
+		struct mount *over = real_mount(new_path->mnt);
+
+		if (mp.parent != over->mnt_parent)
+			over = mp.parent->overmount;
+		err = can_move_mount_beneath(old, over, mp.mp);

So... might as well introduce the variable in this one.
Then this chunk becomes
@@ -3618,7 +3617,9 @@ static int do_move_mount(struct path *old_path,
 	}
 
 	if (beneath) {
-		err = can_move_mount_beneath(old, new_path, mp.mp);
+		struct mount *over = real_mount(new_path->mnt);
+
+		err = can_move_mount_beneath(old, over, mp.mp);
 		if (err)
 			return err;
 	}
and the corresponding one in #35 
@@ -3618,6 +3624,8 @@ static int do_move_mount(struct path *old_path,
 	if (beneath) {
 		struct mount *over = real_mount(new_path->mnt);
 
+		if (mp.parent != over->mnt_parent)
+			over = mp.parent->overmount;
 		err = can_move_mount_beneath(old, over, mp.mp);
 		if (err)
 			return err;

OK, done - both certainly look better that way.


* Re: [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-08-28 23:31       ` Linus Torvalds
@ 2025-08-29  0:11         ` Al Viro
  2025-08-29  0:35           ` Linus Torvalds
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-29  0:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

On Thu, Aug 28, 2025 at 04:31:56PM -0700, Linus Torvalds wrote:

> Same largely goes for that
> 
> > -       struct mount **mnt_pprev_for_sb;/* except that LSB of pprev will be stolen */
> > +       unsigned long mnt_pprev_for_sb; /* except that LSB of pprev is stolen */
> 
> change, but at least there it's now an 'unsigned long', so it will
> *always* complain if a cast is missing in either direction. That's
> better, but still horrendously ugly.
> 
> If you want to use an opaque type, then please make it be truly
> opaque. Not 'unsigned long'. And certainly not 'void *'. Make it be
> something that is still type-safe - you can make up a pointer to
> struct name that is never actually declared, so that it's basically a
> unique type (or two separate types for mnt_pprev_for_sb and
> 
> I'm not even clear on why you did this change, but if you want to have
> specific types for some reason, make them *really* specific. Don't
> make them 'void *', and 'unsigned long'.

What I want to avoid is compiler seeing something like
	(unsigned long)READ_ONCE(m->mnt_pprev_for_sb) & 1
and going "that thing is a pointer to struct mount *, either the address
is even or it's an undefined behaviour and I can do whatever I want
anyway; optimize it to 0".

unsigned long is a brute-force way to avoid that - it avoids UB (OK, avoids
it as long as no struct mount instance has an odd address), so compiler can't
start playing silly buggers.

If you have a prettier approach, I'd like to hear it - I obviously do not
enjoy the way this one looks.


* Re: [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-08-29  0:11         ` Al Viro
@ 2025-08-29  0:35           ` Linus Torvalds
  2025-08-29  6:03             ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-08-29  0:35 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, brauner, jack

On Thu, 28 Aug 2025 at 17:11, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> What I want to avoid is compiler seeing something like
>         (unsigned long)READ_ONCE(m->mnt_pprev_for_sb) & 1
> and going "that thing is a pointer to struct mount *, either the address
> is even or it's an undefined behaviour and I can do whatever I want
> anyway; optimize it to 0".

Have you actually seen that? Because if some compiler does this, we
have tons of other places that will hit this, and we'll need to try to
figure out some generic solution, or - more likely - just disable said
compiler "optimization".

And if you really want to deal with this theoretical issue, please
just use a union for it, having both the proper pointer type and the
'unsigned long', and using the appropriate field instead of any type
casts.

                  Linus
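
For illustration, a minimal userspace sketch of the union approach
suggested above.  It assumes a cut-down struct mount holding only the two
link fields; the union name, its members (ptr, bits) and the helper names
are made up here and do not come from any posted patch.  uintptr_t stands
in for the kernel's 'unsigned long'.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mount;

/* Hypothetical layout: one word, viewed either as a pointer or as bits. */
union pprev_word {
	struct mount **ptr;	/* the proper pointer type, for list code */
	uintptr_t bits;		/* the same storage, for flag manipulation */
};

/* Cut-down stand-in for the kernel structure (assumption). */
struct mount {
	struct mount *next;
	union pprev_word pprev;
};

#define WRITE_HOLD ((uintptr_t)1)

/* No casts anywhere: each access picks the union member it needs. */
static inline bool test_hold(const struct mount *m)
{
	return m->pprev.bits & WRITE_HOLD;
}

static inline void set_hold(struct mount *m)
{
	m->pprev.bits |= WRITE_HOLD;
}

static inline void clear_hold(struct mount *m)
{
	m->pprev.bits &= ~WRITE_HOLD;
}
```

List manipulation would go through ->pprev.ptr and the flag helpers
through ->pprev.bits, so a mismatch shows up as a type error rather than
as a missing cast.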


* Re: [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-08-29  0:35           ` Linus Torvalds
@ 2025-08-29  6:03             ` Al Viro
  2025-08-29  6:04               ` [59/63] simplify the callers of mnt_unhold_writers() Al Viro
                                 ` (3 more replies)
  0 siblings, 4 replies; 320+ messages in thread
From: Al Viro @ 2025-08-29  6:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

On Thu, Aug 28, 2025 at 05:35:26PM -0700, Linus Torvalds wrote:
> On Thu, 28 Aug 2025 at 17:11, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > What I want to avoid is compiler seeing something like
> >         (unsigned long)READ_ONCE(m->mnt_pprev_for_sb) & 1
> > and going "that thing is a pointer to struct mount *, either the address
> > is even or it's an undefined behaviour and I can do whatever I want
> > anyway; optimize it to 0".
> 
> Have you actually seen that? Because if some compiler does this, we
> have tons of other places that will hit this, and we'll need to try to
> figure out some generic solution, or - more likely - just disable said
> compiler "optimization".

list_bl.h being an obvious victim...  OK, convinced.  No, I hadn't seen
that, and I agree that we'll get some very visible breakage if that ever
happens.

Anyway, I think I've come up with a trick that would be proof against
that kind of idiocy:
	struct mount *__aligned(1) *mnt_pprev_for_sb;

IOW, tell compiler that this member contains a pointer to a possibly
unaligned object containing a pointer to struct mount.

Since the member is declared as pointer to unaligned object, compiler
is not allowed to make any assumptions about the LSB of its value.

For any type T, we are fine with
	T __aligned(1) *p;
	...
	T *q = p;
and as long as the value of p is actually aligned, no nasal daemons should
fly.

Since we never dereference them directly ('add' doesn't dereference them
at all, 'del' copies to local struct mount ** and dereferences that), all
generated memory accesses will be aligned ones.

Since the only values we'll ever assign to that member will be addresses
of normally aligned objects, we should be fine.

Sure, __attribute__((__aligned__(...))) is not standard, but AFAICS we
should not step into any UB in a compiler implementing it...

Replacements for the 59..62 in followups (I've reordered them - easier
that way).
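
A minimal userspace sketch of the declaration described above, assuming
GCC or Clang (the attribute is an extension) and a cut-down struct mount.
The helper names pprev_is_tagged() and pprev_slot() are hypothetical; the
attribute is spelled out because the kernel's __aligned() macro is not
available outside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_HOLD ((uintptr_t)1)

/* Cut-down stand-in for the kernel structure (assumption). */
struct mount {
	struct mount *next;
	/*
	 * Pointer to a possibly unaligned "struct mount *" slot, so the
	 * compiler is told nothing about the low bit of the stored value.
	 */
	struct mount *__attribute__((__aligned__(1))) *pprev;
};

/* Look at the stolen bit without ever dereferencing the member. */
static inline bool pprev_is_tagged(const struct mount *m)
{
	return (uintptr_t)m->pprev & WRITE_HOLD;
}

/*
 * Strip the tag and hand back a normally typed pointer; only this
 * aligned copy is ever dereferenced.
 */
static inline struct mount **pprev_slot(const struct mount *m)
{
	return (struct mount **)((uintptr_t)m->pprev & ~WRITE_HOLD);
}
```

The values stored are always addresses of normally aligned objects, with
or without the low bit set, which is the condition the message above
relies on.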


* [59/63] simplify the callers of mnt_unhold_writers()
  2025-08-29  6:03             ` Al Viro
@ 2025-08-29  6:04               ` Al Viro
  2025-09-01 11:20                 ` Christian Brauner
  2025-08-29  6:05               ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-29  6:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

The cleanup-on-failure logic in mount_setattr_prepare() is simplified
by having the mnt_hold_writers() failure followed by advancing m to the
next node in the tree before leaving the loop.

And since all calls are preceded by the same check that flag has been set
and the function is inlined, let's just shift the check into it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 ++++++++++------------------------
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9e16231d4561..d8df1046e2f9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -714,13 +714,14 @@ static inline int mnt_hold_writers(struct mount *mnt)
  * Stop preventing write access to @mnt allowing callers to gain write access
  * to @mnt again.
  *
- * This function can only be called after a successful call to
- * mnt_hold_writers().
+ * This function can only be called after a call to mnt_hold_writers().
  *
  * Context: This function expects lock_mount_hash() to be held.
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
+	if (!(mnt->mnt.mnt_flags & MNT_WRITE_HOLD))
+		return;
 	/*
 	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
@@ -4773,8 +4774,10 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 
 		if (!mnt_allow_writers(kattr, m)) {
 			err = mnt_hold_writers(m);
-			if (err)
+			if (err) {
+				m = next_mnt(m, mnt);
 				break;
+			}
 		}
 
 		if (!(kattr->kflags & MOUNT_KATTR_RECURSE))
@@ -4782,25 +4785,9 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 	}
 
 	if (err) {
-		struct mount *p;
-
-		/*
-		 * If we had to call mnt_hold_writers() MNT_WRITE_HOLD will
-		 * be set in @mnt_flags. The loop unsets MNT_WRITE_HOLD for all
-		 * mounts and needs to take care to include the first mount.
-		 */
-		for (p = mnt; p; p = next_mnt(p, mnt)) {
-			/* If we had to hold writers unblock them. */
-			if (p->mnt.mnt_flags & MNT_WRITE_HOLD)
-				mnt_unhold_writers(p);
-
-			/*
-			 * We're done once the first mount we changed got
-			 * MNT_WRITE_HOLD unset.
-			 */
-			if (p == m)
-				break;
-		}
+		/* undo all mnt_hold_writers() we'd done */
+		for (struct mount *p = mnt; p != m; p = next_mnt(p, mnt))
+			mnt_unhold_writers(p);
 	}
 	return err;
 }
@@ -4831,8 +4818,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 		WRITE_ONCE(m->mnt.mnt_flags, flags);
 
 		/* If we had to hold writers unblock them. */
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
-			mnt_unhold_writers(m);
+		mnt_unhold_writers(m);
 
 		if (kattr->propagation)
 			change_mnt_propagation(m, kattr->propagation);
-- 
2.47.2
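
The shape of the simplified cleanup can be sketched in userspace as
follows.  This is a model only: struct node, its fields and the function
names are hypothetical, and a singly linked chain stands in for the
next_mnt() walk.  As in mnt_hold_writers(), the flag is set before the
writer check, so a failed hold leaves the flag set on the failing node.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mount in traversal order. */
struct node {
	struct node *next;	/* plays the role of next_mnt() */
	int writers;		/* active writers; nonzero makes hold fail */
	bool hold;		/* plays the role of MNT_WRITE_HOLD */
};

/* Sets the flag first, then checks; on failure the flag stays set. */
static int hold_writers(struct node *n)
{
	n->hold = true;
	return n->writers > 0 ? -EBUSY : 0;
}

/* The flag check lives inside, so callers need not test it. */
static void unhold_writers(struct node *n)
{
	if (!n->hold)
		return;
	n->hold = false;
}

static int prepare(struct node *first)
{
	struct node *m;
	int err = 0;

	for (m = first; m; m = m->next) {
		err = hold_writers(m);
		if (err) {
			m = m->next;	/* step past the failing node */
			break;
		}
	}
	if (err) {
		/* undo everything in [first, m), failing node included */
		for (struct node *p = first; p != m; p = p->next)
			unhold_writers(p);
	}
	return err;
}
```

Advancing m before the break turns the undo range into a half-open
interval, which is what lets the cleanup loop lose its special case for
the last node touched.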



* [60/63] setup_mnt(): primitive for connecting a mount to filesystem
  2025-08-29  6:03             ` Al Viro
  2025-08-29  6:04               ` [59/63] simplify the callers of mnt_unhold_writers() Al Viro
@ 2025-08-29  6:05               ` Al Viro
  2025-08-29  9:59                 ` Christian Brauner
  2025-09-01 11:17                 ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Christian Brauner
  2025-08-29  6:06               ` [61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
  2025-08-29  6:07               ` [62/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
  3 siblings, 2 replies; 320+ messages in thread
From: Al Viro @ 2025-08-29  6:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

Take the identical logic in vfs_create_mount() and clone_mnt() into
a new helper that takes an empty struct mount and attaches it to
the given dentry (sub)tree.

Should be called once in the lifetime of every mount, prior to making
it visible in any data structures.

After that point ->mnt_root and ->mnt_sb never change; ->mnt_root
is a counting reference to dentry and ->mnt_sb - an active reference
to superblock.

Mount remains associated with that dentry tree all the way until
the call of cleanup_mnt(), when the refcount eventually drops
to zero.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d8df1046e2f9..c769fc4051e0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1196,6 +1196,21 @@ static void commit_tree(struct mount *mnt)
 	touch_mnt_namespace(n);
 }
 
+static void setup_mnt(struct mount *m, struct dentry *root)
+{
+	struct super_block *s = root->d_sb;
+
+	atomic_inc(&s->s_active);
+	m->mnt.mnt_sb = s;
+	m->mnt.mnt_root = dget(root);
+	m->mnt_mountpoint = m->mnt.mnt_root;
+	m->mnt_parent = m;
+
+	lock_mount_hash();
+	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	unlock_mount_hash();
+}
+
 /**
  * vfs_create_mount - Create a mount for a configured superblock
  * @fc: The configuration context with the superblock attached
@@ -1219,15 +1234,8 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
 	if (fc->sb_flags & SB_KERNMOUNT)
 		mnt->mnt.mnt_flags = MNT_INTERNAL;
 
-	atomic_inc(&fc->root->d_sb->s_active);
-	mnt->mnt.mnt_sb		= fc->root->d_sb;
-	mnt->mnt.mnt_root	= dget(fc->root);
-	mnt->mnt_mountpoint	= mnt->mnt.mnt_root;
-	mnt->mnt_parent		= mnt;
+	setup_mnt(mnt, fc->root);
 
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
-	unlock_mount_hash();
 	return &mnt->mnt;
 }
 EXPORT_SYMBOL(vfs_create_mount);
@@ -1285,7 +1293,6 @@ EXPORT_SYMBOL_GPL(vfs_kern_mount);
 static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 					int flag)
 {
-	struct super_block *sb = old->mnt.mnt_sb;
 	struct mount *mnt;
 	int err;
 
@@ -1310,16 +1317,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	if (mnt->mnt_group_id)
 		set_mnt_shared(mnt);
 
-	atomic_inc(&sb->s_active);
 	mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
 
-	mnt->mnt.mnt_sb = sb;
-	mnt->mnt.mnt_root = dget(root);
-	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
-	mnt->mnt_parent = mnt;
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
-	unlock_mount_hash();
+	setup_mnt(mnt, root);
 
 	if (flag & CL_PRIVATE)	// we are done with it
 		return mnt;
-- 
2.47.2



* [61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
  2025-08-29  6:03             ` Al Viro
  2025-08-29  6:04               ` [59/63] simplify the callers of mnt_unhold_writers() Al Viro
  2025-08-29  6:05               ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
@ 2025-08-29  6:06               ` Al Viro
  2025-09-01 11:27                 ` Christian Brauner
  2025-08-29  6:07               ` [62/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
  3 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-29  6:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

We have an unpleasant wart in accessibility rules for struct mount.  There
are per-superblock lists of mounts, used by sb_prepare_remount_readonly()
to check if any of those is currently claimed for write access and to
block further attempts to get write access on those until we are done.

As soon as it is attached to a filesystem, mount becomes reachable
via that list.  Only sb_prepare_remount_readonly() traverses it and
it only accesses a few members of struct mount.  Unfortunately,
->mnt_flags is one of those and it is modified - MNT_WRITE_HOLD set
and then cleared.  It is done under mount_lock, so from the locking
rules POV everything's fine.

However, it has easily overlooked implications - once mount has been
attached to a filesystem, it has to be treated as globally visible.
In particular, initializing ->mnt_flags *must* be done either prior
to that point or under mount_lock.  All other members are still
private at that point.

Life gets simpler if we move that bit (and that's *all* that can get
touched by access via this list) out of ->mnt_flags.  It's not even
hard to do - currently the list is implemented as list_head one,
anchored in super_block->s_mounts and linked via mount->mnt_instance.

As the first step, switch it to hlist-like open-coded structure -
address of the first mount in the set is stored in ->s_mounts
and ->mnt_instance replaced with ->mnt_next_for_sb and ->mnt_pprev_for_sb -
the former either NULL or pointing to the next mount in set, the
latter - address of either ->s_mounts or ->mnt_next_for_sb in the
previous element of the set.

In the next commit we'll steal the LSB of ->mnt_pprev_for_sb as
replacement for MNT_WRITE_HOLD.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h         |  4 +++-
 fs/namespace.c     | 38 +++++++++++++++++++++++++++++---------
 fs/super.c         |  3 +--
 include/linux/fs.h |  4 +++-
 4 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 04d0eadc4c10..b208f69f69d7 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -64,7 +64,9 @@ struct mount {
 #endif
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
-	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
+	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
+	struct mount * __aligned(1) *mnt_pprev_for_sb;
+					/* except that LSB of pprev will be stolen */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
diff --git a/fs/namespace.c b/fs/namespace.c
index c769fc4051e0..06be5b65b559 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -730,6 +730,27 @@ static inline void mnt_unhold_writers(struct mount *mnt)
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 }
 
+static inline void mnt_del_instance(struct mount *m)
+{
+	struct mount **p = m->mnt_pprev_for_sb;
+	struct mount *next = m->mnt_next_for_sb;
+
+	if (next)
+		next->mnt_pprev_for_sb = p;
+	*p = next;
+}
+
+static inline void mnt_add_instance(struct mount *m, struct super_block *s)
+{
+	struct mount *first = s->s_mounts;
+
+	if (first)
+		first->mnt_pprev_for_sb = &m->mnt_next_for_sb;
+	m->mnt_next_for_sb = first;
+	m->mnt_pprev_for_sb = &s->s_mounts;
+	s->s_mounts = m;
+}
+
 static int mnt_make_readonly(struct mount *mnt)
 {
 	int ret;
@@ -743,7 +764,6 @@ static int mnt_make_readonly(struct mount *mnt)
 
 int sb_prepare_remount_readonly(struct super_block *sb)
 {
-	struct mount *mnt;
 	int err = 0;
 
 	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
@@ -751,9 +771,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 		return -EBUSY;
 
 	lock_mount_hash();
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (!(mnt->mnt.mnt_flags & MNT_READONLY)) {
-			err = mnt_hold_writers(mnt);
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (!(m->mnt.mnt_flags & MNT_READONLY)) {
+			err = mnt_hold_writers(m);
 			if (err)
 				break;
 		}
@@ -763,9 +783,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 
 	if (!err)
 		sb_start_ro_state_change(sb);
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (mnt->mnt.mnt_flags & MNT_WRITE_HOLD)
-			mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
+			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 	}
 	unlock_mount_hash();
 
@@ -1207,7 +1227,7 @@ static void setup_mnt(struct mount *m, struct dentry *root)
 	m->mnt_parent = m;
 
 	lock_mount_hash();
-	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	mnt_add_instance(m, s);
 	unlock_mount_hash();
 }
 
@@ -1425,7 +1445,7 @@ static void mntput_no_expire(struct mount *mnt)
 	mnt->mnt.mnt_flags |= MNT_DOOMED;
 	rcu_read_unlock();
 
-	list_del(&mnt->mnt_instance);
+	mnt_del_instance(mnt);
 	if (unlikely(!list_empty(&mnt->mnt_expire)))
 		list_del(&mnt->mnt_expire);
 
diff --git a/fs/super.c b/fs/super.c
index 7f876f32343a..3b0f49e1b817 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -323,7 +323,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	if (!s)
 		return NULL;
 
-	INIT_LIST_HEAD(&s->s_mounts);
 	s->s_user_ns = get_user_ns(user_ns);
 	init_rwsem(&s->s_umount);
 	lockdep_set_class(&s->s_umount, &type->s_umount_key);
@@ -408,7 +407,7 @@ static void __put_super(struct super_block *s)
 		list_del_init(&s->s_list);
 		WARN_ON(s->s_dentry_lru.node);
 		WARN_ON(s->s_inode_lru.node);
-		WARN_ON(!list_empty(&s->s_mounts));
+		WARN_ON(s->s_mounts);
 		call_rcu(&s->rcu, destroy_super_rcu);
 	}
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..0e9c7f1460dc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1324,6 +1324,8 @@ struct sb_writers {
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
+struct mount;
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1358,7 +1360,7 @@ struct super_block {
 	__u16 s_encoding_flags;
 #endif
 	struct hlist_bl_head	s_roots;	/* alternate root dentries for NFS */
-	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
+	struct mount		*s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;	/* can go away once we use an accessor for @s_bdev_file */
 	struct file		*s_bdev_file;
 	struct backing_dev_info *s_bdi;
-- 
2.47.2
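
The open-coded hlist-like structure from the patch above can be tried
out in userspace with the sketch below.  The two structures are cut down
to the link fields (an assumption), the member and function names are
shortened, and the __aligned(1) annotation is left out since no bit is
stolen yet at this step.

```c
#include <stddef.h>

struct mount;

/* Cut-down stand-ins for the kernel structures (assumption). */
struct super_block {
	struct mount *s_mounts;		/* first mount in the set, or NULL */
};

struct mount {
	struct mount *next;		/* next mount in the set, or NULL */
	struct mount **pprev;		/* &sb->s_mounts or &prev->next */
};

/* Insert at the head; the previous first element gets a new pprev. */
static void add_instance(struct mount *m, struct super_block *s)
{
	struct mount *first = s->s_mounts;

	if (first)
		first->pprev = &m->next;
	m->next = first;
	m->pprev = &s->s_mounts;
	s->s_mounts = m;
}

/* Unlink without needing to know whether m is first in the set. */
static void del_instance(struct mount *m)
{
	struct mount **p = m->pprev;
	struct mount *next = m->next;

	if (next)
		next->pprev = p;
	*p = next;
}
```

Note that insertion is at the head of the set, while the list_add_tail()
call it replaces appended at the tail; an empty set is simply a NULL
->s_mounts, which is why the INIT_LIST_HEAD() call goes away.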



* [62/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-08-29  6:03             ` Al Viro
                                 ` (2 preceding siblings ...)
  2025-08-29  6:06               ` [61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
@ 2025-08-29  6:07               ` Al Viro
  2025-09-01 11:26                 ` Christian Brauner
  3 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-29  6:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, brauner, jack

... from ->mnt_flags to LSB of ->mnt_pprev_for_sb.

This is safe - we always set and clear it within the same mount_lock
scope, so we won't interfere with list operations - traversals are
always forward, so they don't even look at ->mnt_pprev_for_sb and
both insertions and removals are in mount_lock scopes of their own,
so that bit will be clear in *all* mount instances during those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h            | 25 ++++++++++++++++++++++++-
 fs/namespace.c        | 34 +++++++++++++++++-----------------
 include/linux/mount.h |  3 +--
 3 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index b208f69f69d7..40cf16544317 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -66,7 +66,8 @@ struct mount {
 	struct list_head mnt_child;	/* and going through their mnt_child */
 	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
 	struct mount * __aligned(1) *mnt_pprev_for_sb;
-					/* except that LSB of pprev will be stolen */
+					/* except that LSB of pprev is stolen */
+#define WRITE_HOLD 1			/* ... for use by mnt_hold_writers() */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
@@ -244,4 +245,26 @@ static inline struct mount *topmost_overmount(struct mount *m)
 	return m;
 }
 
+static inline bool __test_write_hold(struct mount * __aligned(1) *val)
+{
+	return (unsigned long)val & WRITE_HOLD;
+}
+
+static inline bool test_write_hold(const struct mount *m)
+{
+	return __test_write_hold(m->mnt_pprev_for_sb);
+}
+
+static inline void set_write_hold(struct mount *m)
+{
+	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
+				       | WRITE_HOLD);
+}
+
+static inline void clear_write_hold(struct mount *m)
+{
+	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
+				       & ~WRITE_HOLD);
+}
+
 struct mnt_namespace *mnt_ns_from_dentry(struct dentry *dentry);
diff --git a/fs/namespace.c b/fs/namespace.c
index 06be5b65b559..8e6b6523d3e8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -509,20 +509,20 @@ int mnt_get_write_access(struct vfsmount *m)
 	mnt_inc_writers(mnt);
 	/*
 	 * The store to mnt_inc_writers must be visible before we pass
-	 * MNT_WRITE_HOLD loop below, so that the slowpath can see our
-	 * incremented count after it has set MNT_WRITE_HOLD.
+	 * WRITE_HOLD loop below, so that the slowpath can see our
+	 * incremented count after it has set WRITE_HOLD.
 	 */
 	smp_mb();
 	might_lock(&mount_lock.lock);
-	while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
+	while (__test_write_hold(READ_ONCE(mnt->mnt_pprev_for_sb))) {
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 			cpu_relax();
 		} else {
 			/*
 			 * This prevents priority inversion, if the task
-			 * setting MNT_WRITE_HOLD got preempted on a remote
+			 * setting WRITE_HOLD got preempted on a remote
 			 * CPU, and it prevents life lock if the task setting
-			 * MNT_WRITE_HOLD has a lower priority and is bound to
+			 * WRITE_HOLD has a lower priority and is bound to
 			 * the same CPU as the task that is spinning here.
 			 */
 			preempt_enable();
@@ -533,7 +533,7 @@ int mnt_get_write_access(struct vfsmount *m)
 	}
 	/*
 	 * The barrier pairs with the barrier sb_start_ro_state_change() making
-	 * sure that if we see MNT_WRITE_HOLD cleared, we will also see
+	 * sure that if we see WRITE_HOLD cleared, we will also see
 	 * s_readonly_remount set (or even SB_RDONLY / MNT_READONLY flags) in
 	 * mnt_is_readonly() and bail in case we are racing with remount
 	 * read-only.
@@ -672,15 +672,15 @@ EXPORT_SYMBOL(mnt_drop_write_file);
  * @mnt.
  *
  * Context: This function expects lock_mount_hash() to be held serializing
- *          setting MNT_WRITE_HOLD.
+ *          setting WRITE_HOLD.
  * Return: On success 0 is returned.
  *	   On error, -EBUSY is returned.
  */
 static inline int mnt_hold_writers(struct mount *mnt)
 {
-	mnt->mnt.mnt_flags |= MNT_WRITE_HOLD;
+	set_write_hold(mnt);
 	/*
-	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
+	 * After storing WRITE_HOLD, we'll read the counters. This store
 	 * should be visible before we do.
 	 */
 	smp_mb();
@@ -696,9 +696,9 @@ static inline int mnt_hold_writers(struct mount *mnt)
 	 * sum up each counter, if we read a counter before it is incremented,
 	 * but then read another CPU's count which it has been subsequently
 	 * decremented from -- we would see more decrements than we should.
-	 * MNT_WRITE_HOLD protects against this scenario, because
+	 * WRITE_HOLD protects against this scenario, because
 	 * mnt_want_write first increments count, then smp_mb, then spins on
-	 * MNT_WRITE_HOLD, so it can't be decremented by another CPU while
+	 * WRITE_HOLD, so it can't be decremented by another CPU while
 	 * we're counting up here.
 	 */
 	if (mnt_get_writers(mnt) > 0)
@@ -720,14 +720,14 @@ static inline int mnt_hold_writers(struct mount *mnt)
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
-	if (!(mnt->mnt.mnt_flags & MNT_WRITE_HOLD))
+	if (!test_write_hold(mnt))
 		return;
 	/*
-	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
+	 * MNT_READONLY must become visible before ~WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
 	 */
 	smp_wmb();
-	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	clear_write_hold(mnt);
 }
 
 static inline void mnt_del_instance(struct mount *m)
@@ -766,7 +766,7 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 {
 	int err = 0;
 
-	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
+	/* Racy optimization.  Recheck the counter under WRITE_HOLD */
 	if (atomic_long_read(&sb->s_remove_count))
 		return -EBUSY;
 
@@ -784,8 +784,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 	if (!err)
 		sb_start_ro_state_change(sb);
 	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
-			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+		if (test_write_hold(m))
+			clear_write_hold(m);
 	}
 	unlock_mount_hash();
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 18e4b97f8a98..85e97b9340ff 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -33,7 +33,6 @@ enum mount_flags {
 	MNT_NOSYMFOLLOW	= 0x80,
 
 	MNT_SHRINKABLE	= 0x100,
-	MNT_WRITE_HOLD	= 0x200,
 
 	MNT_INTERNAL	= 0x4000,
 
@@ -52,7 +51,7 @@ enum mount_flags {
 				  | MNT_READONLY | MNT_NOSYMFOLLOW,
 	MNT_ATIME_MASK = MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME,
 
-	MNT_INTERNAL_FLAGS = MNT_WRITE_HOLD | MNT_INTERNAL | MNT_DOOMED |
+	MNT_INTERNAL_FLAGS = MNT_INTERNAL | MNT_DOOMED |
 			     MNT_SYNC_UMOUNT | MNT_LOCKED
 };
 
-- 
2.47.2
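
The safety argument in the commit message above can be modelled in
userspace.  The sketch assumes GCC or Clang, cut-down structures and a
single-threaded caller; hold_and_release_all() is a hypothetical stand-in
for one mount_lock scope, and the assert() calls encode the claim that
list insertion and removal only ever see the bit clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_HOLD ((uintptr_t)1)

/* Cut-down stand-ins for the kernel structures (assumption). */
struct mount {
	struct mount *next;
	struct mount *__attribute__((__aligned__(1))) *pprev;	/* LSB stolen */
};

struct super_block {
	struct mount *s_mounts;
};

static bool test_write_hold(const struct mount *m)
{
	return (uintptr_t)m->pprev & WRITE_HOLD;
}

static void set_write_hold(struct mount *m)
{
	m->pprev = (void *)((uintptr_t)m->pprev | WRITE_HOLD);
}

static void clear_write_hold(struct mount *m)
{
	m->pprev = (void *)((uintptr_t)m->pprev & ~WRITE_HOLD);
}

static void add_instance(struct mount *m, struct super_block *s)
{
	struct mount *first = s->s_mounts;

	if (first) {
		assert(!test_write_hold(first));	/* bit is clear here */
		first->pprev = &m->next;
	}
	m->next = first;
	m->pprev = &s->s_mounts;
	s->s_mounts = m;
}

static void del_instance(struct mount *m)
{
	struct mount **p;

	assert(!test_write_hold(m));			/* and here */
	p = m->pprev;
	if (m->next)
		m->next->pprev = p;
	*p = m->next;
}

/*
 * One "lock scope": the bit is set and cleared again before returning.
 * Both walks only follow ->next, so the tagged ->pprev is never used.
 */
static int hold_and_release_all(struct super_block *s)
{
	int held = 0;

	for (struct mount *m = s->s_mounts; m; m = m->next) {
		set_write_hold(m);
		held++;
	}
	for (struct mount *m = s->s_mounts; m; m = m->next)
		clear_write_hold(m);
	return held;
}
```

If a hold were ever left set across scopes, the assertions in the list
helpers would trip, which is the failure the commit message rules out.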



* Re: [PATCH v2 04/63] __detach_mounts(): use guards
  2025-08-28 23:07     ` [PATCH v2 04/63] __detach_mounts(): use guards Al Viro
@ 2025-08-29  9:48       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:48 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:07AM +0100, Al Viro wrote:
> Clean fit for guards use; guards can't be weaker due to umount_tree() calls.
> ---

Did you drop my earlier RvB by accident? In any case:

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 02/63] introduced guards for mount_lock
  2025-08-28 23:07     ` [PATCH v2 02/63] introduced guards for mount_lock Al Viro
@ 2025-08-29  9:49       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:05AM +0100, Al Viro wrote:
> mount_writer: write_seqlock; that's an equivalent of {un,}lock_mount_hash()
> mount_locked_reader: read_seqlock_excl; these tend to be open-coded.
> 
> No bulk conversions, please - if nothing else, quite a few places use
> the mount_writer form when mount_locked_reader is sufficient.  It needs
> to be dealt with carefully.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>
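
For readers unfamiliar with the mechanism, here is a userspace sketch of
a scope-based guard built on the compiler's cleanup attribute (GCC or
Clang), which is what the kernel's cleanup.h helpers rely on.  Everything
named fake_* is hypothetical; the real mount_writer and
mount_locked_reader guards wrap the seqlock operations listed in the
quoted text and are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a lock; counts how often it was taken. */
struct fake_lock {
	bool held;
	int acquired;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	l->held = true;
	l->acquired++;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static void fake_lock_release(struct fake_lock **lp)
{
	(*lp)->held = false;
}

/* Acquire now, release automatically when the variable leaves scope. */
#define fake_guard(l) \
	struct fake_lock *guard_ __attribute__((cleanup(fake_lock_release))) = \
		(fake_lock_acquire(l), (l))

static int lookup(struct fake_lock *l, int key)
{
	fake_guard(l);

	if (key < 0)
		return -1;	/* lock dropped on this path ... */
	return key * 2;		/* ... and on this one, with no explicit unlock */
}
```

The lock is held until the end of the enclosing scope, not until the
last statement that needs it, which is why the series asks for
conversions to be done one place at a time rather than in bulk.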


* Re: [PATCH v2 13/63] has_locked_children(): use guards
  2025-08-28 23:07     ` [PATCH v2 13/63] has_locked_children(): use guards Al Viro
@ 2025-08-29  9:49       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:16AM +0100, Al Viro wrote:
> ... and document the locking requirements of __has_locked_children()
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 14/63] mnt_set_expiry(): use guards
  2025-08-28 23:07     ` [PATCH v2 14/63] mnt_set_expiry(): " Al Viro
@ 2025-08-29  9:49       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:17AM +0100, Al Viro wrote:
> The reason why it needs only mount_locked_reader is that there are no
> lockless accesses to expiry lists.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 18/63] switch do_new_mount_fc() to fc_mount()
  2025-08-28 23:07     ` [PATCH v2 18/63] switch do_new_mount_fc() to fc_mount() Al Viro
@ 2025-08-29  9:53       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:53 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:21AM +0100, Al Viro wrote:
> Prior to the call of do_new_mount_fc() the caller has just done successful
> vfs_get_tree().  Then do_new_mount_fc() does several checks on resulting
> superblock, and either does fc_drop_locked() and returns an error or
> proceeds to unlock the superblock and call vfs_create_mount().
> 
> The thing is, there's no reason to delay that unlock + vfs_create_mount() -
> the tests do not rely upon the state of ->s_umount and
> 	fc_drop_locked()
> 	put_fs_context()
> is equivalent to
> 	unlock ->s_umount
> 	put_fs_context()
> 
> Doing vfs_create_mount() before the checks allows us to move vfs_get_tree()
> from caller to do_new_mount_fc() and collapse it with vfs_create_mount()
> into an fc_mount() call.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

>  fs/namespace.c | 29 ++++++++++++-----------------
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 0474b3a93dbf..9b575c9eee0b 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3705,25 +3705,20 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
>  static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
>  			   unsigned int mnt_flags)
>  {
> -	struct vfsmount *mnt;
>  	struct pinned_mountpoint mp = {};
> -	struct super_block *sb = fc->root->d_sb;
> +	struct super_block *sb;
> +	struct vfsmount *mnt = fc_mount(fc);
>  	int error;
>  
> +	if (IS_ERR(mnt))
> +		return PTR_ERR(mnt);

Fwiw, I find this pattern rather ugly: the variable is assigned from a
function call at declaration time, in the middle of other declarations,
and the error check only follows further below. I'd much rather just do:
  +	struct vfsmount *mnt;
   	int error;
   
  +	mnt = fc_mount(fc);
  +	if (IS_ERR(mnt))
  +		return PTR_ERR(mnt);

But anyway, I acknowledge the difference in taste here is really not
that important.


* Re: [PATCH v2 55/63] open_detached_copy(): separate creation of namespace into helper
  2025-08-28 23:07     ` [PATCH v2 55/63] open_detached_copy(): separate creation of namespace into helper Al Viro
@ 2025-08-29  9:54       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:54 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:58AM +0100, Al Viro wrote:
> ... and convert the helper to use of a guard(namespace_excl)
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
  2025-08-28 23:08     ` [PATCH v2 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
@ 2025-08-29  9:56       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:56 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:08:00AM +0100, Al Viro wrote:
> Now that free_mnt_ns() works prior to mnt_ns_tree_add(), there's no need for
> an open-coded analogue of free_mnt_ns() there - yes, we do avoid one call_rcu()
> use per failing call of clone() or unshare(), if they fail due to OOM in that
> particular spot, but it's not really worth bothering.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list
  2025-08-28 23:07     ` [PATCH v2 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
@ 2025-08-29  9:57       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:57 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:59AM +0100, Al Viro wrote:
> Actual removal is done under the lock, but for checking if need to bother
> the lockless list_empty() is safe - either that namespace never had never

nit: two "never"s

> been added to mnt_ns_tree, in which case the list will stay empty, or
> whoever had allocated it has called mnt_ns_tree_add() and it has already
> run to completion.  After that point list_empty() will become false and
> will remain false, no matter what we do with the neighbors in mnt_ns_list.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

>  fs/namespace.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index c324800e770c..daa72292ea58 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -195,7 +195,7 @@ static void mnt_ns_release_rcu(struct rcu_head *rcu)
>  static void mnt_ns_tree_remove(struct mnt_namespace *ns)
>  {
>  	/* remove from global mount namespace list */
> -	if (!is_anon_ns(ns)) {
> +	if (!list_empty(&ns->mnt_ns_list)) {
>  		mnt_ns_tree_write_lock();
>  		rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree);
>  		list_bidir_del_rcu(&ns->mnt_ns_list);
> -- 
> 2.47.2
> 
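The reasoning in the commit message (a never-added node stays self-linked, so
list_empty() on it is stable) can be illustrated with a small userspace
sketch.  All names here (list_node, ns_demo, ns_demo_tree_remove) are
hypothetical stand-ins, not the kernel's list API, and the locking is only
indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly-linked list, modelled loosely on list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n->prev = n;		/* self-linked means "empty" */
}

static bool list_is_empty(const struct list_node *n)
{
	return n->next == n;
}

static void list_add_after(struct list_node *new, struct list_node *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical namespace-like object with a link in a global list. */
struct ns_demo {
	struct list_node link;
	bool was_unlinked;
};

static void ns_demo_init(struct ns_demo *ns)
{
	list_init(&ns->link);
	ns->was_unlinked = false;
}

static void ns_demo_tree_remove(struct ns_demo *ns)
{
	assert(ns);
	/* Cheap check first: a node that was never added is still empty. */
	if (!list_is_empty(&ns->link)) {
		/* the real code would take the tree lock around this */
		list_unlink(&ns->link);
		ns->was_unlinked = true;
	}
}
```

In this sketch calling ns_demo_tree_remove() on an object that was only
initialized is a no-op, which is the property the patch relies on.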

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [60/63] setup_mnt(): primitive for connecting a mount to filesystem
  2025-08-29  6:05               ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
@ 2025-08-29  9:59                 ` Christian Brauner
  2025-08-29 16:37                   ` Al Viro
  2025-09-01 11:17                 ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Christian Brauner
  1 sibling, 1 reply; 320+ messages in thread
From: Christian Brauner @ 2025-08-29  9:59 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, jack

On Fri, Aug 29, 2025 at 07:05:22AM +0100, Al Viro wrote:
> Take the identical logics in vfs_create_mount() and clone_mnt() into
> a new helper that takes an empty struct mount and attaches it to
> given dentry (sub)tree.
> 
> Should be called once in the lifetime of every mount, prior to making
> it visible in any data structures.
> 
> After that point ->mnt_root and ->mnt_sb never change; ->mnt_root
> is a counting reference to dentry and ->mnt_sb - an active reference
> to superblock.
> 
> Mount remains associated with that dentry tree all the way until
> the call of cleanup_mnt(), when the refcount eventually drops
> to zero.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Is this supposed to be the v3? I'm confused what I need to be looking
at since it's a reply to v2 and some earlier review comments...

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-28  1:29                   ` Linus Torvalds
@ 2025-08-29 12:30                     ` Theodore Ts'o
  2025-08-29 18:25                       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 320+ messages in thread
From: Theodore Ts'o @ 2025-08-29 12:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Konstantin Ryabitsev, Christian Brauner, Al Viro, linux-fsdevel,
	Jan Kara

On Wed, Aug 27, 2025 at 06:29:50PM -0700, Linus Torvalds wrote:
> On Wed, 27 Aug 2025 at 17:41, Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
> >
> > I'm not sure what you mean. The Link: trailer is added when the maintainer
> > pulls in the series into their tree.
> 
> That's my point. Adding it to the commit at that point is entirely
> useless, because
> 
>  (a) that email doesn't have the *reason* for the patch (or rather, if
> it does, then the link to the email is pointless, since the *real*
> reason was mentioned already)

From a maintainer's perspective, the reason why I keep the link in is
because I'm dumb-ass lazy.  My workflow involves looking at patchwork,
cutting-and-pasting the Message-Id, and then passing it to b4.
Looking through a 20 patch series to figure out which one rates a
Link: trailer, and which one doesn't is a pain in the *ss, and in the
off-chance that there *is* a meaningful and deep discussion, it would
be nice to be able to capture it.  But it might be in patch #4; or
patch #12; or patches #14 and #15.  Also, there might be an extended
conversation thread in the patch series description (patch #0) and it
would be useful to be able to get a link to it.

So here's a set of feature requests for b4.

(a) It would be cool(tm), if there was a way for b4 to automatically
    detect whether or not there was a reply to a patch at the time that
    "b4 am" is run; if there is, to include the patch series.  If there
    isn't an e-mail reply, skip the Link: trailer.

(b) In the case of a patch series, it would be useful to include some
    kind of trailer indicating that a group of patches are logically
    grouped together (maybe a patch-series: that has the message id to
    the series header, or the first patch if there is no patch #0)
    --- because one of the other ways that I figure out that a series
    of commits are part of a patch series is by looking at the Link:
    field since if the messages are generated using "git send-email"
    it's usually obvious from the message id.  This has also come up
    from some of the folks who want this for their web-based review
    systems.  I don't care about that, but if it solves multiple use
    cases at once, that's great.

(c) Include a link tag to the patch series description e-mail message
    (if present) in the first commit of the patch series so it's
    possible to read the patch #0 description of the patch series,
    since otherwise this can be hard to find in the git history.

(d) For bonus points, if there is a way to determine a link to the
    previous versions of the patch series, it would be useful to
    include link: tags to previous versions of the patch if and only if
    there were e-mail comments to say, the v5, v12, and v27 versions
    of the patch.

(e) If there is some way we can easily capture lore.kernel.org URL for
    the vN-1 version of the patch series in the vN commit description
    header, in "b4 prep" that would be *excellent*.  I don't think it
    can do this today, but if it can, can we make sure it's defaulted
    to on, and then we should **really** market the heck out of b4
    prep?

The bottom line is I'd love to make Linus less cranky; but I'd also love
it if I didn't have to do the extra work by hand.  :-)   Because if I do
have to do it by hand, I will probably screw up, and my preference has
been to err on the side of having the link, so it's there when I'm
having to do code archeology --- even if most of the time it's not
strictly speaking necessary.

Cheers,

      	       	       	      	    	 - Ted

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [60/63] setup_mnt(): primitive for connecting a mount to filesystem
  2025-08-29  9:59                 ` Christian Brauner
@ 2025-08-29 16:37                   ` Al Viro
  2025-08-30  4:36                     ` Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-29 16:37 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Linus Torvalds, linux-fsdevel, jack

On Fri, Aug 29, 2025 at 11:59:55AM +0200, Christian Brauner wrote:
> On Fri, Aug 29, 2025 at 07:05:22AM +0100, Al Viro wrote:
> > Take the identical logics in vfs_create_mount() and clone_mnt() into
> > a new helper that takes an empty struct mount and attaches it to
> > given dentry (sub)tree.
> > 
> > Should be called once in the lifetime of every mount, prior to making
> > it visible in any data structures.
> > 
> > After that point ->mnt_root and ->mnt_sb never change; ->mnt_root
> > is a counting reference to dentry and ->mnt_sb - an active reference
> > to superblock.
> > 
> > Mount remains associated with that dentry tree all the way until
> > the call of cleanup_mnt(), when the refcount eventually drops
> > to zero.
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> 
> Is this supposed to be the v3? I'm confused what I need to be looking
> at since it's a reply to v2 and some earlier review comments...

It would be in v3, but I didn't feel like sending another 63-patch
mailbomb for the sake of these 4 changed commits (well, and a cosmetic
change in #33, with matching modification in #35, ending with both
being cleaner - with the same resulting tree after #35).

These 4 do replace #59..#62 in v3.

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCHED][RFC][CFT] mount-related stuff
  2025-08-29 12:30                     ` Theodore Ts'o
@ 2025-08-29 18:25                       ` Konstantin Ryabitsev
  0 siblings, 0 replies; 320+ messages in thread
From: Konstantin Ryabitsev @ 2025-08-29 18:25 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linus Torvalds, Christian Brauner, Al Viro, linux-fsdevel,
	Jan Kara

On Fri, Aug 29, 2025 at 08:30:33AM -0400, Theodore Ts'o wrote:
> So here's a set of feature requests for b4.
> 
> (a) It would be cool(tm), if there was a way for b4 to automatically
>     detect whether or not there was a reply to a patch at the time that
>     "b4 am" is run; if there is, to include the patch series.  If there
>     isn't an e-mail reply, skip the Link: trailer.

I'm afraid this would mostly breed confusion.

> (b) In the case of a patch series, it would be useful to include some
>     kind of trailer indicating that a group of patches are logically
>     grouped together (maybe a patch-series: that has the message id to
>     the series header, or the first patch if there is no patch #0)
>     --- because one of the other ways that I figure out that a series
>     of commits are part of a patch series is by looking at the Link:
>     field since if the messages are generated using "git send-email"
>     it's usually obvious from the message id.  This has also come up
>     from some of the folks who want this for their web-based review
>     systems.  I don't care about that, but if it solves multiple use
>     cases at once, that's great.

This is already in place with the change-id trailer (and the corresponding
X-Change-ID email header).

However, only b4 puts those in. Series prepared and sent with git-send-email
don't have any identifier like that.

> (c) Include a link tag to the patch series description e-mail message
>     (if present) in the first commit of the patch series so it's
>     possible to read the patch #0 description of the patch series,
>     since otherwise this can be hard to find in the git history.

We're talking about the lore.kernel.org web interface?

> (d) For bonus points, if there is a way to determine a link to the
>     previous versions of the patch series, it would be useful to
>     include link: tags to previous versions of the patch if and only if
>     there were e-mail comments to say, the v5, v12, and v27 versions
>     of the patch.

Again, are we talking in the context of the lore.kernel.org web interface? The
initial discussion about Link: tags was about them being present in git
commits.

> (e) If there is some way we can easily capture lore.kernel.org URL for
>     the vN-1 version of the patch series in the vN commit description
>     header, in "b4 prep" that would be *excellent*.  I don't think it
>     can do this today, but if it can, can we make sure it's defaulted
>     to on, and then we should **really** market the heck out of b4
>     prep?

You can do this for any b4-prep sent series by just searching for the
change-id string. E.g.:

https://lore.kernel.org/lkml/?q=20241018-pmu_event_info-986e21ce6bd3

`b4 prep` is used quite extensively these days, but it's far from being
predominant.

> The bottom line is I'd love to make Linus less cranky; but I'd also love
> it if I didn't have to do the extra work by hand.  :-)   Because if I do
> have to do it by hand, I will probably screw up, and my preference has
> been to err on the side of having the link, so it's there when I'm
> having to do code archeology --- even if most of the time it's not
> strictly speaking necessary.

This doesn't ultimately solve the problem that we're butting heads about --
that it's impossible to reliably match a commit to its provenance. Using Link:
trailers indicating where the patch came from is the only reliable mechanism
we have thus far, because it establishes this relationship unequivocally.
However, these links annoy Linus, who would like this to be automated in some
other way behind the scenes. I'd love to be able to do so, but short of
running some kind of "provenance transparency log" of curated commit ->
message-id mappings, I don't see how it's possible.

-K

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [60/63] setup_mnt(): primitive for connecting a mount to filesystem
  2025-08-29 16:37                   ` Al Viro
@ 2025-08-30  4:36                     ` Al Viro
  2025-08-30  7:33                       ` [RFC] does # really need to be escaped in devnames? Al Viro
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-30  4:36 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Linus Torvalds, linux-fsdevel, jack

On Fri, Aug 29, 2025 at 05:37:17PM +0100, Al Viro wrote:

> It would be in v3, but I didn't feel like sending another 63-patch
> mailbomb for the sake of these 4 changed commits (well, and a cosmetic
> change in #33, with matching modification in #35, ending with both
> being cleaner - with the same resulting tree after #35).
> 
> These 4 do replace #59..#62 in v3.

Speaking of v3 - does anybody have objections to the following?
	1) allow ->show_path() to return -EOPNOTSUPP, interpreted as
"fall back to default seq_path(...)"?  E.g. kernfs_sop_show_path()
could return that if there's no ->scops->show_path().
	2) pass the sodding escape set as explicit argument, made
an argument of fs/namespace.c:show_path() as well.
	3) similar for ->show_devname().
	4) ... and to hell with those string_unescape_inplace() calls
in there.
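
Item 1 above (a hook returning -EOPNOTSUPP to mean "use the default") could
look roughly like the following userspace sketch.  The hook type and the
demo_* functions are hypothetical; this is not the actual ->show_path()
signature, only the shape of the fallback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an optional per-filesystem hook. */
typedef int (*demo_show_path_fn)(char *buf, size_t len);

/* Default behaviour, used when the hook is absent or declines. */
static int demo_default_show_path(char *buf, size_t len)
{
	snprintf(buf, len, "/default");
	return 0;
}

/* A filesystem that has a hook but nothing special to say. */
static int demo_fs_declines(char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -EOPNOTSUPP;
}

/* A filesystem that really overrides the path. */
static int demo_fs_overrides(char *buf, size_t len)
{
	snprintf(buf, len, "/custom");
	return 0;
}

static int demo_show_path(demo_show_path_fn hook, char *buf, size_t len)
{
	assert(buf && len > 0);
	if (hook) {
		int err = hook(buf, len);

		/* Any result other than -EOPNOTSUPP is final. */
		if (err != -EOPNOTSUPP)
			return err;
	}
	return demo_default_show_path(buf, len);
}
```

The point of the convention is that a wrapper layer no longer has to
reimplement the default itself when the layer below it has no override.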

^ permalink raw reply	[flat|nested] 320+ messages in thread

* [RFC] does # really need to be escaped in devnames?
  2025-08-30  4:36                     ` Al Viro
@ 2025-08-30  7:33                       ` Al Viro
  2025-08-30 19:40                         ` Linus Torvalds
  0 siblings, 1 reply; 320+ messages in thread
From: Al Viro @ 2025-08-30  7:33 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, linux-fsdevel, jack, Siddhesh Poyarekar, Ian Kent,
	David Howells, Christian Brauner

	On one hand, we have commit ed5fce76b5ea "vfs: escape hash as
well" which added # to the escape set for devname in /proc/*/mount*;
on another there's nfs_show_devname() doing
                seq_escape(m, devname, " \t\n\\");
and similar for btrfs.  And then there is afs_show_devname() that outright
includes # in that thing on regular basis:
	char pref = '%';
	...
        switch (volume->type) {
	case AFSVL_RWVOL:
		break;
	case AFSVL_ROVOL:
		pref = '#';
		if (volume->type_force)
			suf = ".readonly";
		break;
	case AFSVL_BACKVOL:
		pref = '#';
		suf = ".backup";
		break;
	}

	seq_printf(m, "%c%s:%s%s", pref, cell->name, volume->name, suf);

For NFS and btrfs ones I might be convinced to add # to escape set; for
AFS, though, I strongly suspect that userland would be very unhappy,
and that's userland predating whatever code that "aims to parse fstab as
well as /proc/mounts with the same logic" ed5fce76b5ea is referring to.

So...  Siddhesh, could you clarify the claim about breaking getmntent(3)?
Does it or does it not happen on every system that has readonly AFS
volumes mounted?

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [RFC] does # really need to be escaped in devnames?
  2025-08-30  7:33                       ` [RFC] does # really need to be escaped in devnames? Al Viro
@ 2025-08-30 19:40                         ` Linus Torvalds
  2025-08-30 20:42                           ` Al Viro
  2025-09-02 15:03                           ` Siddhesh Poyarekar
  0 siblings, 2 replies; 320+ messages in thread
From: Linus Torvalds @ 2025-08-30 19:40 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, jack, Siddhesh Poyarekar, Ian Kent, David Howells,
	Christian Brauner

On Sat, 30 Aug 2025 at 00:33, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> So...  Siddhesh, could you clarify the claim about breaking getmntent(3)?
> Does it or does it not happen on every system that has readonly AFS
> volumes mounted?

Hmm. Looking at various source trees using Debian code search, at
least dietlibc doesn't treat '#' specially at all.

And glibc seems to treat only a line that *starts* with a '#'
(possibly preceded by space/tab combinations) as an empty line.

klibc checks for '#' at the beginning of the file (without any
potential space skipping before)

Busybox seems to do the same "skip whitespace, then skip lines
starting with '#'" that glibc does.

So I think the '#'-escaping logic is wrong.  We should only escape '#'
marks at the beginning of a line (since we already escape spaces and
tabs, the "preceded by whitespace" doesn't matter).

And that means that we shouldn't do it in 'mangle()' at all - because
it's irrelevant for any field but the first.

And the first field in /proc/mounts is that 'r->mnt_devname' (or
show_devname), and again, that should only trigger on the first
character, not every character.
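
A minimal userspace sketch of that rule, assuming the escape set from the
seq_escape() call quoted earlier in the thread (space, tab, newline,
backslash) plus '#' only when it is the very first character.  The function
name demo_mangle_devname() is hypothetical; this is not the kernel's
mangle() or seq_escape().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Characters escaped wherever they appear in the name. */
static const char demo_always_escape[] = " \t\n\\";

/*
 * Copy name into out, replacing escaped characters with \ooo octal
 * sequences.  '#' is escaped only at position 0.  Output is truncated
 * at a character boundary if out is too small.  Returns bytes written,
 * not counting the terminating NUL.
 */
static size_t demo_mangle_devname(const char *name, char *out, size_t outlen)
{
	size_t n = 0;

	assert(name && out && outlen > 0);
	for (size_t i = 0; name[i] != '\0'; i++) {
		unsigned char c = (unsigned char)name[i];
		int esc = strchr(demo_always_escape, c) != NULL ||
			  (i == 0 && c == '#');
		size_t need = esc ? 4 : 1;

		if (n + need >= outlen)		/* keep room for the NUL */
			break;
		if (esc)
			n += (size_t)snprintf(out + n, outlen - n,
					      "\\%03o", (unsigned int)c);
		else
			out[n++] = (char)c;
	}
	out[n] = '\0';
	return n;
}
```

With this rule an AFS-style name such as "%cell:vol" passes through
untouched, and a '#' in the middle of a name is left alone.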

Now, could there be other libraries that get this even worse wrong? Of
course. But

             Linus

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [RFC] does # really need to be escaped in devnames?
  2025-08-30 19:40                         ` Linus Torvalds
@ 2025-08-30 20:42                           ` Al Viro
  2025-09-02 15:03                           ` Siddhesh Poyarekar
  1 sibling, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-08-30 20:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-fsdevel, jack, Siddhesh Poyarekar, Ian Kent, David Howells,
	Christian Brauner

On Sat, Aug 30, 2025 at 12:40:32PM -0700, Linus Torvalds wrote:
> On Sat, 30 Aug 2025 at 00:33, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > So...  Siddhesh, could you clarify the claim about breaking getmntent(3)?
> > Does it or does it not happen on every system that has readonly AFS
> > volumes mounted?
> 
> Hmm. Looking at various source trees using Debian code search, at
> least dietlibc doesn't treat '#' specially at all.
> 
> And glibc seems to treat only a line that *starts* with a '#'
> (possibly preceded by space/tab combinations) as an empty line.
> 
> klibc checks for '#' at the beginning of the file (without any
> potential space skipping before)
> 
> Busybox seems to do the same "skip whitespace, then skip lines
> starting with '#'" that glibc does.
> 
> So I think the '#'-escaping logic is wrong.  We should only escape '#'
> marks at the beginning of a line (since we already escape spaces and
> tabs, the "preceded by whitespace" doesn't matter).
> 
> And that means that we shouldn't do it in 'mangle()' at all - because
> it's irrelevant for any field but the first.
> 
> And the first field in /proc/mounts is that 'r->mnt_devname' (or
> show_devname), and again, that should only trigger on the first
> character, not every character.

*nod*

Amusingly enough, glibc addmntent(3) does *not* consider # for an
octal escape.

BTW, another amusing bogosity:

	seq_escape(m, "blah", "X") => "blah"
	seq_escape(m, "blah", "b") => "\142lah"
	seq_escape(m, "blah", "") => "\142\154\141\150"

IOW, about 10 years ago an empty string switched meaning from "escape nothing"
to "escape everything"...

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [60/63] setup_mnt(): primitive for connecting a mount to filesystem
  2025-08-29  6:05               ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
  2025-08-29  9:59                 ` Christian Brauner
@ 2025-09-01 11:17                 ` Christian Brauner
  1 sibling, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:17 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, jack

On Fri, Aug 29, 2025 at 07:05:22AM +0100, Al Viro wrote:
> Take the identical logics in vfs_create_mount() and clone_mnt() into
> a new helper that takes an empty struct mount and attaches it to
> given dentry (sub)tree.
> 
> Should be called once in the lifetime of every mount, prior to making
> it visible in any data structures.
> 
> After that point ->mnt_root and ->mnt_sb never change; ->mnt_root
> is a counting reference to dentry and ->mnt_sb - an active reference
> to superblock.
> 
> Mount remains associated with that dentry tree all the way until
> the call of cleanup_mnt(), when the refcount eventually drops
> to zero.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [59/63] simplify the callers of mnt_unhold_writers()
  2025-08-29  6:04               ` [59/63] simplify the callers of mnt_unhold_writers() Al Viro
@ 2025-09-01 11:20                 ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:20 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, jack

On Fri, Aug 29, 2025 at 07:04:36AM +0100, Al Viro wrote:
> The logics in cleanup on failure in mount_setattr_prepare() is simplified
> by having the mnt_hold_writers() failure followed by advancing m to the
> next node in the tree before leaving the loop.
> 
> And since all calls are preceded by the same check that flag has been set
> and the function is inlined, let's just shift the check into it.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [62/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-08-29  6:07               ` [62/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
@ 2025-09-01 11:26                 ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:26 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, jack

On Fri, Aug 29, 2025 at 07:07:05AM +0100, Al Viro wrote:
> ... from ->mnt_flags to LSB of ->mnt_pprev_for_sb.
> 
> This is safe - we always set and clear it within the same mount_lock
> scope, so we won't interfere with list operations - traversals are
> always forward, so they don't even look at ->mnt_pprev_for_sb and
> both insertions and removals are in mount_lock scopes of their own,
> so that bit will be clear in *all* mount instances during those.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/mount.h            | 25 ++++++++++++++++++++++++-
>  fs/namespace.c        | 34 +++++++++++++++++-----------------
>  include/linux/mount.h |  3 +--
>  3 files changed, 42 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index b208f69f69d7..40cf16544317 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -66,7 +66,8 @@ struct mount {
>  	struct list_head mnt_child;	/* and going through their mnt_child */
>  	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
>  	struct mount * __aligned(1) *mnt_pprev_for_sb;
> -					/* except that LSB of pprev will be stolen */
> +					/* except that LSB of pprev is stolen */
> +#define WRITE_HOLD 1			/* ... for use by mnt_hold_writers() */
>  	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
>  	struct list_head mnt_list;
>  	struct list_head mnt_expire;	/* link in fs-specific expiry list */
> @@ -244,4 +245,26 @@ static inline struct mount *topmost_overmount(struct mount *m)
>  	return m;
>  }
>  
> +static inline bool __test_write_hold(struct mount * __aligned(1) *val)
> +{
> +	return (unsigned long)val & WRITE_HOLD;
> +}
> +
> +static inline bool test_write_hold(const struct mount *m)
> +{
> +	return __test_write_hold(m->mnt_pprev_for_sb);
> +}
> +
> +static inline void set_write_hold(struct mount *m)
> +{
> +	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
> +				       | WRITE_HOLD);
> +}
> +
> +static inline void clear_write_hold(struct mount *m)
> +{
> +	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
> +				       & ~WRITE_HOLD);
> +}

I have to say that I find this really unpleasant but...
I've seen issues with the current MNT_WRITE_HOLD handling before when it
interacted with MNT_ONRB (I killed that a while ago).
Reviewed-by: Christian Brauner <brauner@kernel.org>
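
For readers unfamiliar with the trick, here is a userspace sketch of
stealing the low bit of a back-pointer, under the assumption that the
pointed-to slot is at least 2-byte aligned.  The struct and function names
(demo_node, demo_*) are hypothetical and only mirror the shape of the
helpers in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_WRITE_HOLD ((uintptr_t)1)

struct demo_node {
	struct demo_node *next;
	struct demo_node **pprev;	/* LSB doubles as a flag bit */
};

/* Point the node at the slot that refers to it; the flag starts clear. */
static void demo_link(struct demo_node *n, struct demo_node **slot)
{
	/* The trick only works if real addresses have a zero LSB. */
	assert(((uintptr_t)slot & DEMO_WRITE_HOLD) == 0);
	n->pprev = slot;
}

static bool demo_test_write_hold(const struct demo_node *n)
{
	return ((uintptr_t)n->pprev & DEMO_WRITE_HOLD) != 0;
}

static void demo_set_write_hold(struct demo_node *n)
{
	n->pprev = (struct demo_node **)((uintptr_t)n->pprev | DEMO_WRITE_HOLD);
}

static void demo_clear_write_hold(struct demo_node *n)
{
	n->pprev = (struct demo_node **)((uintptr_t)n->pprev & ~DEMO_WRITE_HOLD);
}

/* Recover the real back-pointer regardless of the flag state. */
static struct demo_node **demo_real_pprev(const struct demo_node *n)
{
	return (struct demo_node **)((uintptr_t)n->pprev & ~DEMO_WRITE_HOLD);
}
```

Any code that dereferences the field while the flag may be set has to mask
it first, which is why the commit message stresses that list insertions and
removals never run while the bit is set.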

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
  2025-08-29  6:06               ` [61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
@ 2025-09-01 11:27                 ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:27 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, jack

On Fri, Aug 29, 2025 at 07:06:15AM +0100, Al Viro wrote:
> We have an unpleasant wart in accessibility rules for struct mount.  There
> are per-superblock lists of mounts, used by sb_prepare_remount_readonly()
> to check if any of those is currently claimed for write access and to
> block further attempts to get write access on those until we are done.
> 
> As soon as it is attached to a filesystem, mount becomes reachable
> via that list.  Only sb_prepare_remount_readonly() traverses it and
> it only accesses a few members of struct mount.  Unfortunately,
> ->mnt_flags is one of those and it is modified - MNT_WRITE_HOLD set
> and then cleared.  It is done under mount_lock, so from the locking
> rules POV everything's fine.
> 
> However, it has easily overlooked implications - once mount has been
> attached to a filesystem, it has to be treated as globally visible.
> In particular, initializing ->mnt_flags *must* be done either prior
> to that point or under mount_lock.  All other members are still
> private at that point.
> 
> Life gets simpler if we move that bit (and that's *all* that can get
> touched by access via this list) out of ->mnt_flags.  It's not even
> hard to do - currently the list is implemented as list_head one,
> anchored in super_block->s_mounts and linked via mount->mnt_instance.
> 
> As the first step, switch it to hlist-like open-coded structure -
> address of the first mount in the set is stored in ->s_mounts
> and ->mnt_instance replaced with ->mnt_next_for_sb and ->mnt_pprev_for_sb -
> the former either NULL or pointing to the next mount in set, the
> latter - address of either ->s_mounts or ->mnt_next_for_sb in the
> previous element of the set.
> 
> In the next commit we'll steal the LSB of ->mnt_pprev_for_sb as
> replacement for MNT_WRITE_HOLD.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>
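
The "hlist-like open-coded structure" described in the commit message can
be sketched in userspace as follows: the head is a plain pointer to the
first element, and each element stores the address of whatever pointer
refers to it.  Names (demo_item, demo_item_add, demo_item_del) are
hypothetical; the real fields are ->s_mounts, ->mnt_next_for_sb and
->mnt_pprev_for_sb.

```c
#include <assert.h>
#include <stddef.h>

struct demo_item {
	int id;
	struct demo_item *next;		/* NULL or next element in the set */
	struct demo_item **pprev;	/* address of head or of prev->next */
};

/* Insert at the front of the set anchored at *head. */
static void demo_item_add(struct demo_item *it, struct demo_item **head)
{
	assert(it && head);
	it->next = *head;
	it->pprev = head;
	if (*head)
		(*head)->pprev = &it->next;
	*head = it;
}

/* Unlink without needing to know where the head lives. */
static void demo_item_del(struct demo_item *it)
{
	assert(it && it->pprev);
	*it->pprev = it->next;
	if (it->next)
		it->next->pprev = it->pprev;
	it->next = NULL;
	it->pprev = NULL;
}
```

Forward traversal only ever follows next, so the pprev field is touched
solely by insertion and removal.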

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH v2 63/63] WRITE_HOLD machinery: no need for to bump mount_lock seqcount
  2025-08-28 23:08     ` [PATCH v2 63/63] WRITE_HOLD machinery: no need for to bump mount_lock seqcount Al Viro
@ 2025-09-01 11:28       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:28 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:08:06AM +0100, Al Viro wrote:
> ... neither for insertion into the list of instances, nor for
> mnt_{un,}hold_writers(), nor for mnt_get_write_access() deciding
> to be nice to RT during a busy-wait loop - all of that only needs
> the spinlock side of mount_lock.
> 
> IOW, it's mount_locked_reader, not mount_writer.
> 
> Clarify the comment re locking rules for mnt_unhold_writers() - it's
> not just that mount_lock needs to be held when calling that, it must
> have been held all along since the matching mnt_hold_writers().
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH v2 54/63] open_detached_copy(): don't bother with mount_lock_hash()
  2025-08-28 23:07     ` [PATCH v2 54/63] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
@ 2025-09-01 11:29       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:29 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:57AM +0100, Al Viro wrote:
> we are holding namespace_sem and a reference to root of tree;
> iterating through that tree does not need mount_lock.  Neither
> does the insertion into the rbtree of new namespace or incrementing
> the mount count of that namespace.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH v2 26/63] do_new_mount_rc(): use __free() to deal with dropping mnt on failure
  2025-08-28 23:07     ` [PATCH v2 26/63] do_new_mount_rc(): use __free() to deal with dropping mnt on failure Al Viro
@ 2025-09-01 11:34       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:34 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:29AM +0100, Al Viro wrote:
> do_add_mount() consumes vfsmount on success; just follow it with
> conditional retain_and_null_ptr() on success and we can switch
> to __free() for mnt and be done with that - unlock_mount() is
> in the very end.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>
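
The pattern in the commit message (automatic release on every exit path,
disarmed once a callee has consumed the object) can be sketched in
userspace with the GCC/Clang cleanup attribute.  This is an illustration
under assumed names (demo_obj, demo_consume, demo_create_and_register), not
the kernel's __free() or retain_and_null_ptr() implementation.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_objects;		/* allocation counter for the demo */
static struct demo_obj *demo_registered;	/* owned by the "consumer" */

struct demo_obj {
	int value;
};

static struct demo_obj *demo_obj_new(void)
{
	struct demo_obj *o = calloc(1, sizeof(*o));

	if (o)
		demo_live_objects++;
	return o;
}

static void demo_obj_put(struct demo_obj *o)
{
	if (o) {
		demo_live_objects--;
		free(o);
	}
}

/* Cleanup handler: receives the address of the annotated variable. */
static void demo_obj_auto_put(struct demo_obj **p)
{
	demo_obj_put(*p);
}

/* Takes ownership of o on success (returns 0), leaves it alone on failure. */
static int demo_consume(struct demo_obj *o, int fail)
{
	assert(o);
	if (fail)
		return -1;
	demo_registered = o;
	return 0;
}

static int demo_create_and_register(int fail)
{
	struct demo_obj *o __attribute__((cleanup(demo_obj_auto_put))) =
		demo_obj_new();
	int err;

	if (!o)
		return -1;
	err = demo_consume(o, fail);
	if (!err)
		o = NULL;	/* consumed: disarm the automatic put */
	return err;
}
```

Forgetting the "disarm" step on the success path would free an object that
somebody else now owns, which is the mistake the conditional retain in the
patch is there to prevent.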

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH v2 28/63] change calling conventions for lock_mount() et.al.
  2025-08-28 23:07     ` [PATCH v2 28/63] change calling conventions for lock_mount() et.al Al Viro
@ 2025-09-01 11:37       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:37 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:31AM +0100, Al Viro wrote:
> 1) pinned_mountpoint gets a new member - struct mount *parent.
> Set only if we locked the sucker; ERR_PTR() - on failed attempt.
> 
> 2) do_lock_mount() et.al. return void and set ->parent to
> 	* on success with !beneath - mount corresponding to path->mnt
> 	* on success with beneath - the parent of mount corresponding
> to path->mnt
> 	* in case of error - ERR_PTR(-E...).
> IOW, we get the mount we will be actually mounting upon or ERR_PTR().
> 
> 3) we can't use CLASS, since the pinned_mountpoint is placed on
> hlist during initialization, so we define local macros:
> 	LOCK_MOUNT(mp, path)
> 	LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath)
> 	LOCK_MOUNT_EXACT(mp, path)
> All of them declare and initialize struct pinned_mountpoint mp,
> with unlock_mount done via __cleanup().
> 
> Users converted.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

This is nice! Thanks!
Reviewed-by: Christian Brauner <brauner@kernel.org>
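
Point 3 of the commit message says the object must be initialized in place
because its address gets linked into a list, so a constructor that returns
it by value will not do.  A userspace sketch of a macro that declares the
variable with a cleanup handler and then initializes it by address is
below.  The macro DEMO_PIN and all demo_* names are hypothetical; the
definitions of the real LOCK_MOUNT*() macros are not shown in this part of
the thread.

```c
#include <assert.h>
#include <stddef.h>

struct demo_pin;

struct demo_registry {
	struct demo_pin *first;		/* most recently registered pin */
};

struct demo_pin {
	struct demo_registry *reg;
	struct demo_pin *next;
};

/* In-place init: the registry remembers the address of *p itself. */
static void demo_pin_init(struct demo_pin *p, struct demo_registry *r)
{
	p->reg = r;
	p->next = r->first;
	r->first = p;
}

/* Cleanup handler, run automatically when the variable leaves scope. */
static void demo_pin_release(struct demo_pin *p)
{
	if (!p->reg)
		return;
	/* Scopes unwind in reverse order, so p must be the newest pin. */
	assert(p->reg->first == p);
	p->reg->first = p->next;
}

/* Declare, arm the cleanup, then initialize by address. */
#define DEMO_PIN(name, r) \
	struct demo_pin name __attribute__((cleanup(demo_pin_release))) = { 0 }; \
	demo_pin_init(&name, (r))

static int demo_with_pin(struct demo_registry *r, int fail)
{
	DEMO_PIN(p, r);

	if (fail)
		return -1;		/* p is unregistered on this path too */
	return r->first == &p ? 0 : 1;
}
```

Every return path out of demo_with_pin() leaves the registry as it found
it, without an explicit unlock call at each exit.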

^ permalink raw reply	[flat|nested] 320+ messages in thread

* Re: [PATCH v2 29/63] do_move_mount(): use the parent mount returned by do_lock_mount()
  2025-08-28 23:07     ` [PATCH v2 29/63] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
@ 2025-09-01 11:38       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:38 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:32AM +0100, Al Viro wrote:
> After successful do_lock_mount() call, mp.parent is set to either
> real_mount(path->mnt) (for !beneath case) or to ->mnt_parent of that
> (for beneath).  p is set to real_mount(path->mnt) and after
> several uses it's made equal to mp.parent.  All uses prior to that
> care only about p->mnt_ns and since p->mnt_ns == parent->mnt_ns,
> we might as well use mp.parent all along.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 30/63] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path
  2025-08-28 23:07     ` [PATCH v2 30/63] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
@ 2025-09-01 11:40       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:33AM +0100, Al Viro wrote:
> Both callers pass it a mountpoint reference picked from pinned_mountpoint
> and path it corresponds to.
> 
> First of all, path->dentry is equal to mp.mp->m_dentry.  Furthermore, path->mnt
> is &mp.parent->mnt, making struct path contents redundant.
> 
> Pass it the address of that pinned_mountpoint instead; what's more, if we
> teach it to treat ERR_PTR(error) in ->parent as "bail out with that error"
> we can simplify the callers even more - do_add_mount() will do the right
> thing even when called after lock_mount() failure.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 31/63] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint
  2025-08-28 23:07     ` [PATCH v2 31/63] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
@ 2025-09-01 11:41       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:34AM +0100, Al Viro wrote:
> parent and mountpoint always come from the same struct pinned_mountpoint
> now.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 58/63] copy_mnt_ns(): use guards
  2025-08-28 23:08     ` [PATCH v2 58/63] copy_mnt_ns(): use guards Al Viro
@ 2025-09-01 11:43       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:08:01AM +0100, Al Viro wrote:
> * mntput() of rootmnt and pwdmnt done via __free(mntput)
> * mnt_ns_tree_add() can be done within namespace_excl scope.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namespace.c | 17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index a418555586ef..9e16231d4561 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -4164,7 +4164,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
>  		struct user_namespace *user_ns, struct fs_struct *new_fs)
>  {
>  	struct mnt_namespace *new_ns;
> -	struct vfsmount *rootmnt = NULL, *pwdmnt = NULL;
> +	struct vfsmount *rootmnt __free(mntput) = NULL;
> +	struct vfsmount *pwdmnt __free(mntput) = NULL;
>  	struct mount *p, *q;
>  	struct mount *old;
>  	struct mount *new;
> @@ -4183,7 +4184,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
>  	if (IS_ERR(new_ns))
>  		return new_ns;
>  
> -	namespace_lock();
> +	guard(namespace_excl)();
>  	/* First pass: copy the tree topology */
>  	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
>  	if (user_ns != ns->user_ns)
> @@ -4191,13 +4192,11 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
>  	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
>  	if (IS_ERR(new)) {
>  		emptied_ns = new_ns;
> -		namespace_unlock();
>  		return ERR_CAST(new);
>  	}
>  	if (user_ns != ns->user_ns) {
> -		lock_mount_hash();
> +		guard(mount_writer)();
>  		lock_mnt_tree(new);
> -		unlock_mount_hash();
>  	}
>  	new_ns->root = new;
>  
> @@ -4229,14 +4228,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
>  		while (p->mnt.mnt_root != q->mnt.mnt_root)
>  			p = next_mnt(skip_mnt_tree(p), old);
>  	}
> -	namespace_unlock();
> -
> -	if (rootmnt)
> -		mntput(rootmnt);
> -	if (pwdmnt)
> -		mntput(pwdmnt);
> -
> -	mnt_ns_tree_add(new_ns);

The commit message states that "mnt_ns_tree_add() can be done within
namespace_excl scope" suggesting that all this does is to widen the
scope of the lock. But this change also removes the call to
mnt_ns_tree_add() completely? Intentional?


* Re: [PATCH v2 51/63] umount_tree(): take all victims out of propagation graph at once
  2025-08-28 23:07     ` [PATCH v2 51/63] umount_tree(): take all victims out of propagation graph at once Al Viro
@ 2025-09-01 11:50       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-01 11:50 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:54AM +0100, Al Viro wrote:
> For each removed mount we need to calculate where the slaves will end up.
> To avoid duplicating that work, do it for all mounts to be removed
> at once, taking the mounts themselves out of propagation graph as
> we go, then do all transfers; the duplicate work on finding destinations
> is avoided since if we run into a mount that already had destination found,
> we don't need to trace the rest of the way.  That's guaranteed
> O(removed mounts) for finding destinations and removing from propagation
> graph and O(surviving mounts that have master removed) for transfers.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>


* Re: [PATCH v2 35/63] do_lock_mount(): don't modify path.
  2025-08-28 23:07     ` [PATCH v2 35/63] do_lock_mount(): don't modify path Al Viro
@ 2025-09-02 10:55       ` Christian Brauner
  0 siblings, 0 replies; 320+ messages in thread
From: Christian Brauner @ 2025-09-02 10:55 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, jack, torvalds

On Fri, Aug 29, 2025 at 12:07:38AM +0100, Al Viro wrote:
> Currently do_lock_mount() has the target path switched to whatever
> might be overmounting it.  We _do_ want to have the parent
> mount/mountpoint chosen on top of the overmounting pile; however,
> the way it's done has unpleasant races - if umount propagation
> removes the overmount while we'd been trying to set the environment
> up, we might end up failing if our target path strays into that overmount
> just before the overmount gets kicked out.
> 
> Users of do_lock_mount() do not need the target path changed - they
> have all information in res->{parent,mp}; only one place (in
> do_move_mount()) currently uses the resulting path->mnt, and that value
> is trivial to reconstruct by the original value of path->mnt + chosen
> parent mount.
> 
> Let's keep the target path unchanged; it avoids a bunch of subtle races
> and it's not hard to do:
> 	do
> 		as mount_locked_reader
> 			find the prospective parent mount/mountpoint dentry
> 			grab references if it's not the original target
> 		lock the prospective mountpoint dentry
> 		take namespace_sem exclusive
> 		if prospective parent/mountpoint would be different now
> 			err = -EAGAIN
> 		else if location has been unmounted
> 			err = -ENOENT
> 		else if mountpoint dentry is not allowed to be mounted on
> 			err = -ENOENT
> 		else if beneath and the top of the pile was the absolute root
> 			err = -EINVAL
> 		else
> 			try to get struct mountpoint (by dentry), set
> 			err to 0 on success and -ENO{MEM,ENT} on failure
> 		if err != 0
> 			res->parent = ERR_PTR(err)
> 			drop locks
> 		else
> 			res->parent = prospective parent
> 		drop temporary references
> 	while err == -EAGAIN
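
The core of the loop above (peek without the big lock, take the lock,
recompute, retry on mismatch) can be modelled in user space.  This is a
single-threaded toy under stated assumptions: topology_version,
injected_races, where_to_mount() as written here, take_locks(),
drop_locks() and lock_target() are hypothetical names, and the "race" is
simulated by bumping a counter just before the lock is taken.

```c
#include <assert.h>
#include <errno.h>

int topology_version;	/* changes whenever the simulated mount tree changes */
int lock_held;		/* stand-in for inode lock + namespace_sem */
int injected_races;	/* number of changes to sneak in before locking */

/* Toy version: the "place to mount on" is just the current version. */
static int where_to_mount(void)
{
	return topology_version;
}

static void take_locks(void)
{
	assert(!lock_held);
	/* Simulate another thread changing the tree before we get the lock. */
	if (injected_races > 0) {
		injected_races--;
		topology_version++;
	}
	lock_held = 1;
}

static void drop_locks(void)
{
	lock_held = 0;
}

/* On success returns 0 with the lock still held and *chosen set. */
int lock_target(int *chosen, int *attempts)
{
	int err;

	do {
		int m = where_to_mount();	/* prospective answer, unlocked */

		(*attempts)++;
		take_locks();
		if (where_to_mount() != m)
			err = -EAGAIN;		/* something moved, retry */
		else
			err = 0;

		if (err)
			drop_locks();
		else
			*chosen = m;
	} while (err == -EAGAIN);

	return err;
}
```

The caller's own target is never rewritten; only the prospective answer
is recomputed on each pass, which is the point of the change.
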
> 
> A somewhat subtle part is that dropping temporary references is allowed.
> Neither mounts nor dentries should be evicted by a thread that holds
> namespace_sem.  On success we are dropping those references under
> namespace_sem, so we need to be sure that these are not the last
> references remaining.  However, on success we'd already verified (under
> namespace_sem) that original target is still mounted and that mount
> and dentry we are about to drop are still reachable from it via the
> mount tree.  That guarantees that we are not about to drop the last
> remaining references.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namespace.c | 126 ++++++++++++++++++++++++++-----------------------
>  1 file changed, 68 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index ebecb03972c5..b77d2df606a1 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2727,6 +2727,27 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>  	return err;
>  }
>  
> +static inline struct mount *where_to_mount(const struct path *path,
> +					   struct dentry **dentry,
> +					   bool beneath)
> +{
> +	struct mount *m;
> +
> +	if (unlikely(beneath)) {
> +		m = topmost_overmount(real_mount(path->mnt));
> +		*dentry = m->mnt_mountpoint;
> +		return m->mnt_parent;

No need for that else. This can just be:

if (unlikely(beneath)) {
	m = topmost_overmount(real_mount(path->mnt));
	*dentry = m->mnt_mountpoint;
	return m->mnt_parent;
}

m = __lookup_mnt(path->mnt, *dentry = path->dentry);
if (unlikely(m)) {
	m = topmost_overmount(m);
	*dentry = m->mnt.mnt_root;
	return m;
}
return real_mount(path->mnt);

> +	} else {
> +		m = __lookup_mnt(path->mnt, *dentry = path->dentry);

The assignment to *dentry during argument passing looks really weird.
I would prefer if we didn't do that.

> +		if (unlikely(m)) {
> +			m = topmost_overmount(m);
> +			*dentry = m->mnt.mnt_root;
> +			return m;
> +		}
> +		return real_mount(path->mnt);
> +	}
> +}
> +
>  /**
>   * do_lock_mount - acquire environment for mounting
>   * @path:	target path
> @@ -2758,84 +2779,69 @@ static int attach_recursive_mnt(struct mount *source_mnt,
>   * case we also require the location to be at the root of a mount
>   * that has a parent (i.e. is not a root of some namespace).
>   */
> -static void do_lock_mount(struct path *path, struct pinned_mountpoint *res, bool beneath)
> +static void do_lock_mount(const struct path *path,
> +			  struct pinned_mountpoint *res,
> +			  bool beneath)
>  {
> -	struct vfsmount *mnt = path->mnt;
> -	struct dentry *dentry;
> -	struct path under = {};
> -	int err = -ENOENT;
> +	int err;
>  
>  	if (unlikely(beneath) && !path_mounted(path)) {
>  		res->parent = ERR_PTR(-EINVAL);
>  		return;
>  	}
>  
> -	for (;;) {
> -		struct mount *m = real_mount(mnt);
> -
> -		if (beneath) {
> -			path_put(&under);
> -			read_seqlock_excl(&mount_lock);
> -			if (unlikely(!mnt_has_parent(m))) {
> -				read_sequnlock_excl(&mount_lock);
> -				res->parent = ERR_PTR(-EINVAL);
> -				return;
> +	do {
> +		struct dentry *dentry, *d;
> +		struct mount *m, *n;
> +
> +		scoped_guard(mount_locked_reader) {
> +			m = where_to_mount(path, &dentry, beneath);
> +			if (&m->mnt != path->mnt) {
> +				mntget(&m->mnt);
> +				dget(dentry);
>  			}
> -			under.mnt = mntget(&m->mnt_parent->mnt);
> -			under.dentry = dget(m->mnt_mountpoint);
> -			read_sequnlock_excl(&mount_lock);
> -			dentry = under.dentry;
> -		} else {
> -			dentry = path->dentry;
>  		}
>  
>  		inode_lock(dentry->d_inode);
>  		namespace_lock();
>  
> -		if (unlikely(cant_mount(dentry) || !is_mounted(mnt)))
> -			break;		// not to be mounted on
> +		// check if the chain of mounts (if any) has changed.
> +		scoped_guard(mount_locked_reader)
> +			n = where_to_mount(path, &d, beneath);
>  
> -		if (beneath && unlikely(m->mnt_mountpoint != dentry ||
> -				        &m->mnt_parent->mnt != under.mnt)) {
> -			namespace_unlock();
> -			inode_unlock(dentry->d_inode);
> -			continue;	// got moved
> -		}
> +		if (unlikely(n != m || dentry != d))
> +			err = -EAGAIN;		// something moved, retry
> +		else if (unlikely(cant_mount(dentry) || !is_mounted(path->mnt)))
> +			err = -ENOENT;		// not to be mounted on
> +		else if (beneath && &m->mnt == path->mnt && !m->overmount)
> +			err = -EINVAL;
> +		else
> +			err = get_mountpoint(dentry, res);
>  
> -		mnt = lookup_mnt(path);
> -		if (unlikely(mnt)) {
> +		if (unlikely(err)) {
> +			res->parent = ERR_PTR(err);
>  			namespace_unlock();
>  			inode_unlock(dentry->d_inode);
> -			path_put(path);
> -			path->mnt = mnt;
> -			path->dentry = dget(mnt->mnt_root);
> -			continue;	// got overmounted
> +		} else {
> +			res->parent = m;
>  		}
> -		err = get_mountpoint(dentry, res);
> -		if (err)
> -			break;
> -		if (beneath) {
> -			/*
> -			 * @under duplicates the references that will stay
> -			 * at least until namespace_unlock(), so the path_put()
> -			 * below is safe (and OK to do under namespace_lock -
> -			 * we are not dropping the final references here).
> -			 */
> -			path_put(&under);
> -			res->parent = real_mount(path->mnt)->mnt_parent;
> -			return;
> +		/*
> +		 * Drop the temporary references.  This is subtle - on success
> +		 * we are doing that under namespace_sem, which would normally
> +		 * be forbidden.  However, in that case we are guaranteed that
> +		 * refcounts won't reach zero, since we know that path->mnt
> +		 * is mounted and thus all mounts reachable from it are pinned

"is mounted and we hold the namespace semaphore and thus all mounts
reachable [...]"

With these things fixed:

Reviewed-by: Christian Brauner <brauner@kernel.org>

Unless I forgot something this means I should've gone through the whole
series.


* Re: [RFC] does # really need to be escaped in devnames?
  2025-08-30 19:40                         ` Linus Torvalds
  2025-08-30 20:42                           ` Al Viro
@ 2025-09-02 15:03                           ` Siddhesh Poyarekar
  2025-09-02 16:30                             ` Linus Torvalds
  2025-09-02 17:48                             ` David Howells
  1 sibling, 2 replies; 320+ messages in thread
From: Siddhesh Poyarekar @ 2025-09-02 15:03 UTC (permalink / raw)
  To: Linus Torvalds, Al Viro
  Cc: linux-fsdevel, jack, Ian Kent, David Howells, Christian Brauner

On 2025-08-30 15:40, Linus Torvalds wrote:
> On Sat, 30 Aug 2025 at 00:33, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> So...  Siddhesh, could you clarify the claim about breaking getmntent(3)?
>> Does it or does it not happen on every system that has readonly AFS
>> volumes mounted?
> 
> Hmm. Looking at various source trees using Debian code search, at
> least dietlibc doesn't treat '#' specially at all.
> 
> And glibc seems to treat only a line that *starts* with a '#'
> (possibly preceded by space/tab combinations) as an empty line.
> 
> klibc checks for '#' at the beginning of the file (without any
> potential space skipping before)
> 
> Busybox seems to do the same "skip whitespace, then skip lines
> starting with '#'" that glibc does.
> 
> So I think the '#'-escaping logic is wrong.  We should only escape '#'
> marks at the beginning of a line (since we already escape spaces and
> tabs, the "preceded by whitespace" doesn't matter).


This was actually the original issue I had tried to address, escaping 
'#' in the beginning of the devname because it ends up in the beginning 
of the line, thus masking out the entire line in mounts.  I don't 
remember at what point I concluded that escaping '#' always was the 
answer (maybe to protect against any future instances where userspace 
ends up ignoring the rest of the line following the '#'), but it appears 
to be wrong.

Sid


* Re: [RFC] does # really need to be escaped in devnames?
  2025-09-02 15:03                           ` Siddhesh Poyarekar
@ 2025-09-02 16:30                             ` Linus Torvalds
  2025-09-02 16:39                               ` Siddhesh Poyarekar
  2025-09-02 17:48                             ` David Howells
  1 sibling, 1 reply; 320+ messages in thread
From: Linus Torvalds @ 2025-09-02 16:30 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Al Viro, linux-fsdevel, jack, Ian Kent, David Howells,
	Christian Brauner

On Tue, 2 Sept 2025 at 08:03, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> This was actually the original issue I had tried to address, escaping
> '#' in the beginning of the devname because it ends up in the beginning
> of the line, thus masking out the entire line in mounts.  I don't
> remember at what point I concluded that escaping '#' always was the
> answer (maybe to protect against any future instances where userspace
> ends up ignoring the rest of the line following the '#'), but it appears
> to be wrong.

I wonder if instead of escaping hash-marks we could just disallow them
as the first character in devname.

How did this issue with hash-marks get found? Is there some real use -
in which case we obviously can't disallow them - or was this from some
fuzzing test that happened to hit it?

            Linus


* Re: [RFC] does # really need to be escaped in devnames?
  2025-09-02 16:30                             ` Linus Torvalds
@ 2025-09-02 16:39                               ` Siddhesh Poyarekar
  0 siblings, 0 replies; 320+ messages in thread
From: Siddhesh Poyarekar @ 2025-09-02 16:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, linux-fsdevel, jack, Ian Kent, David Howells,
	Christian Brauner

On 2025-09-02 12:30, Linus Torvalds wrote:
> On Tue, 2 Sept 2025 at 08:03, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>>
>> This was actually the original issue I had tried to address, escaping
>> '#' in the beginning of the devname because it ends up in the beginning
>> of the line, thus masking out the entire line in mounts.  I don't
>> remember at what point I concluded that escaping '#' always was the
>> answer (maybe to protect against any future instances where userspace
>> ends up ignoring the rest of the line following the '#'), but it appears
>> to be wrong.
> 
> I wonder if instead of escaping hash-marks we could just disallow them
> as the first character in devname.
> 
> How did this issue with hash-marks get found? Is there some real use -
> in which case we obviously can't disallow them - or was this from some
> fuzzing test that happened to hit it?

The original issue was that devname being blank broke parsing of mounts, 
which was fixed with Ian's patch[1].  While debugging that issue I 
stumbled onto the fact that if the devname started with #, it would make 
the mount invisible to getmntent in glibc, since it ignores lines 
starting with #.

Sid

[1] https://lkml.org/lkml/2022/6/17/27


* Re: [RFC] does # really need to be escaped in devnames?
  2025-09-02 15:03                           ` Siddhesh Poyarekar
  2025-09-02 16:30                             ` Linus Torvalds
@ 2025-09-02 17:48                             ` David Howells
  2025-09-02 20:04                               ` Linus Torvalds
  1 sibling, 1 reply; 320+ messages in thread
From: David Howells @ 2025-09-02 17:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Siddhesh Poyarekar, Al Viro, linux-fsdevel, jack,
	Ian Kent, Christian Brauner, Jeffrey Altman, linux-afs

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, 2 Sept 2025 at 08:03, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
> >
> > This was actually the original issue I had tried to address, escaping
> > '#' in the beginning of the devname because it ends up in the beginning
> > of the line, thus masking out the entire line in mounts.  I don't
> > remember at what point I concluded that escaping '#' always was the
> > answer (maybe to protect against any future instances where userspace
> > ends up ignoring the rest of the line following the '#'), but it appears
> > to be wrong.
> 
> I wonder if instead of escaping hash-marks we could just disallow them
> as the first character in devname.

The problem with that is that it appears that people are making use of this.

Mounting /afs with "-o dynroot" isn't a problem as that shouldn't be given a
device name - and that's the main way people access AFS.  With OpenAFS I don't
think you can do this at all since it has a single superblock that it crams
everything under.  For AuriStor, I think you can mount individual volumes, but
I'm not sure how it works.  For Linux's AFS, I made every volume have its own
superblock.

The standard format of AFS volume names is [%#][<cell>:]<volume-name-or-id>
but I could make it an option to stick something on the front and use that
internally and display that in /proc/mounts, e.g.:

	mount afs:#openafs.org:afs.root /mnt

which would at least mean that sh and bash wouldn't need the "#" escaping.

The problem is that the # and the % have specific documented meanings, so if I
was to get rid of the '#' entirely, I would need some other marker.  Maybe it
would be sufficient to just go on the presence or not of a '%'.

Maybe I could go with something like:

	openafs.org:root.cell:ro
	openafs.org:root.cell:rw
	openafs.org:root.cell:bak

rather than use #/%.

I don't think there should be a problem with still accepting lines beginning
with '#' in mount() if I display them with an appropriate prefix.  That would
at least permit backward compatibility.

David


* Re: [RFC] does # really need to be escaped in devnames?
  2025-09-02 17:48                             ` David Howells
@ 2025-09-02 20:04                               ` Linus Torvalds
  0 siblings, 0 replies; 320+ messages in thread
From: Linus Torvalds @ 2025-09-02 20:04 UTC (permalink / raw)
  To: David Howells
  Cc: Siddhesh Poyarekar, Al Viro, linux-fsdevel, jack, Ian Kent,
	Christian Brauner, Jeffrey Altman, linux-afs

[-- Attachment #1: Type: text/plain, Size: 1275 bytes --]

On Tue, 2 Sept 2025 at 10:48, David Howells <dhowells@redhat.com> wrote:
>
> The problem with that is that it appears that people are making use of this.

Ok. So disallowing it isn't in the cards, but let's try to minimize the impact.

> The standard format of AFS volume names is [%#][<cell>:]<volume-name-or-id>
> but I could make it an option to stick something on the front and use that
> internally and display that in /proc/mounts, e.g.:
>
>         mount afs:#openafs.org:afs.root /mnt

Yeah, let's aim for trying to avoid the '#' at the beginning when at all
possible, by trying to make at least the default formats not start
with a hash.

And then make the escaping logic only escape the hashmark if it's the
first character.
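
A user-space sketch of that rule, for illustration only: escape space,
tab, newline and backslash wherever they occur, and '#' only when it is
the first character.  escape_devname() is a hypothetical helper, not
seq_escape(); the three-digit octal form is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Octal-escape " \t\n\\" anywhere in s, and '#' only at position 0.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * the output buffer is too small.
 */
int escape_devname(const char *s, char *out, size_t outsz)
{
	size_t n = 0;

	for (size_t i = 0; s[i]; i++) {
		unsigned char c = (unsigned char)s[i];
		int special = c == ' ' || c == '\t' || c == '\n' ||
			      c == '\\' || (c == '#' && i == 0);

		if (special) {
			/* need room for 4 characters plus the final NUL */
			if (n + 4 >= outsz)
				return -1;
			n += (size_t)snprintf(out + n, outsz - n,
					      "\\%03o", (unsigned)c);
		} else {
			/* need room for 1 character plus the final NUL */
			if (n + 1 >= outsz)
				return -1;
			out[n++] = (char)c;
		}
	}
	if (n >= outsz)
		return -1;
	out[n] = '\0';
	return (int)n;
}
```

A name such as "#openafs.org:root.cell" comes out with its first
character escaped, while "afs-#openafs.org:root.cell" passes through
unchanged because the hash is no longer at the start of the line.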

> I don't think there should be a problem with still accepting lines beginning
> with '#' in mount() if I display them with an appropriate prefix.  That would
> at least permit backward compatibility.

Well, right now we obviously escape it everywhere, but how about we
make it the rule that 'show_devname()' at least doesn't use it as the
first character, and then if somebody uses '#' for the mount name from
user space, we would just do the octal-escape then.

Something ENTIRELY UNTESTED like this, in other words?

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1344 bytes --]

 fs/afs/super.c      | 2 +-
 fs/proc_namespace.c | 9 +++++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/afs/super.c b/fs/afs/super.c
index da407f2d6f0d..31f9cc30ae23 100644
--- a/fs/afs/super.c
+++ b/fs/afs/super.c
@@ -180,7 +180,7 @@ static int afs_show_devname(struct seq_file *m, struct dentry *root)
 		break;
 	}
 
-	seq_printf(m, "%c%s:%s%s", pref, cell->name, volume->name, suf);
+	seq_printf(m, "afs-%c%s:%s%s", pref, cell->name, volume->name, suf);
 	return 0;
 }
 
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 5c555db68aa2..ca5773bfb98e 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -86,7 +86,7 @@ static void show_vfsmnt_opts(struct seq_file *m, struct vfsmount *mnt)
 
 static inline void mangle(struct seq_file *m, const char *s)
 {
-	seq_escape(m, s, " \t\n\\#");
+	seq_escape(m, s, " \t\n\\");
 }
 
 static void show_type(struct seq_file *m, struct super_block *sb)
@@ -111,7 +111,12 @@ static int show_vfsmnt(struct seq_file *m, struct vfsmount *mnt)
 		if (err)
 			goto out;
 	} else {
-		mangle(m, r->mnt_devname);
+		const char *mnt_devname = r->mnt_devname;
+		if (*mnt_devname == '#') {
+			seq_printf(m, "\\%o", '#');
+			mnt_devname++;
+		}
+		mangle(m, mnt_devname);
 	}
 	seq_putc(m, ' ');
 	/* mountpoints outside of chroot jail will give SEQ_SKIP on this */


* [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess
  2025-09-03  4:54   ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
@ 2025-09-03  4:54     ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 02/65] introduced guards for mount_lock Al Viro
                         ` (74 more replies)
  2025-09-03  5:08     ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
  2025-09-03 14:47     ` Linus Torvalds
  2 siblings, 75 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

If anything, namespace_lock should be DEFINE_LOCK_GUARD_0, not DEFINE_GUARD.
That way we
	* do not need to feed it a bogus argument
	* do not get gcc trying to compare the address of a file-scope
static variable with -4097 - and, if we are unlucky, trying to keep
it in a register, with spills and all such.

The same problems apply to grabbing namespace_sem shared.

Rename it to namespace_excl, add namespace_shared, convert the existing users:

    guard(namespace_lock, &namespace_sem) => guard(namespace_excl)()
    guard(rwsem_read, &namespace_sem) => guard(namespace_shared)()
    scoped_guard(namespace_lock, &namespace_sem) => scoped_guard(namespace_excl)
    scoped_guard(rwsem_read, &namespace_sem) => scoped_guard(namespace_shared)
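
The idea of a guard that takes no argument can be sketched in user space
as follows.  This is only an illustration and not the cleanup.h
implementation: lock_held, toy_lock(), toy_unlock(), struct toy_guard,
TOY_GUARD() and lookup_doubled() are hypothetical, and the code assumes
GCC or Clang for __attribute__((cleanup)).

```c
#include <assert.h>

int lock_held;	/* stand-in for namespace_sem held exclusive */

static void toy_lock(void)
{
	assert(!lock_held);
	lock_held = 1;
}

static void toy_unlock(void)
{
	assert(lock_held);
	lock_held = 0;
}

/* The guard object carries no lock pointer; it only marks a scope. */
struct toy_guard { char unused; };

static inline struct toy_guard toy_guard_enter(void)
{
	toy_lock();
	return (struct toy_guard){ 0 };
}

static inline void toy_guard_exit(struct toy_guard *g)
{
	(void)g;	/* nothing to look at: the lock is implied */
	toy_unlock();
}

/* Takes the lock now, drops it when the enclosing scope is left. */
#define TOY_GUARD() \
	struct toy_guard toy_guard_ \
		__attribute__((cleanup(toy_guard_exit), unused)) = \
		toy_guard_enter()

int lookup_doubled(int key)
{
	TOY_GUARD();

	if (key < 0)
		return -1;	/* early return: lock dropped automatically */
	return key * 2;
}
```

Since the guard never stores or tests a pointer, there is no bogus
argument to pass and nothing for the compiler to compare against an
error value.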

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ae6d1312b184..fcea65587ff9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -82,6 +82,12 @@ static LIST_HEAD(ex_mountpoints); /* protected by namespace_sem */
 static struct mnt_namespace *emptied_ns; /* protected by namespace_sem */
 static DEFINE_SEQLOCK(mnt_ns_tree_lock);
 
+static inline void namespace_lock(void);
+static void namespace_unlock(void);
+DEFINE_LOCK_GUARD_0(namespace_excl, namespace_lock(), namespace_unlock())
+DEFINE_LOCK_GUARD_0(namespace_shared, down_read(&namespace_sem),
+				      up_read(&namespace_sem))
+
 #ifdef CONFIG_FSNOTIFY
 LIST_HEAD(notify_list); /* protected by namespace_sem */
 #endif
@@ -1776,8 +1782,6 @@ static inline void namespace_lock(void)
 	down_write(&namespace_sem);
 }
 
-DEFINE_GUARD(namespace_lock, struct rw_semaphore *, namespace_lock(), namespace_unlock())
-
 enum umount_tree_flags {
 	UMOUNT_SYNC = 1,
 	UMOUNT_PROPAGATE = 2,
@@ -2306,7 +2310,7 @@ struct path *collect_paths(const struct path *path,
 	struct path *res = prealloc, *to_free = NULL;
 	unsigned n = 0;
 
-	guard(rwsem_read)(&namespace_sem);
+	guard(namespace_shared)();
 
 	if (!check_mnt(root))
 		return ERR_PTR(-EINVAL);
@@ -2361,7 +2365,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 			return;
 	}
 
-	scoped_guard(namespace_lock, &namespace_sem) {
+	scoped_guard(namespace_excl) {
 		if (!anon_ns_root(m))
 			return;
 
@@ -2435,7 +2439,7 @@ struct vfsmount *clone_private_mount(const struct path *path)
 	struct mount *old_mnt = real_mount(path->mnt);
 	struct mount *new_mnt;
 
-	guard(rwsem_read)(&namespace_sem);
+	guard(namespace_shared)();
 
 	if (IS_MNT_UNBINDABLE(old_mnt))
 		return ERR_PTR(-EINVAL);
@@ -5957,7 +5961,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
 	if (ret)
 		return ret;
 
-	scoped_guard(rwsem_read, &namespace_sem)
+	scoped_guard(namespace_shared)
 		ret = do_statmount(ks, kreq.mnt_id, kreq.mnt_ns_id, ns);
 
 	if (!ret)
@@ -6079,7 +6083,7 @@ SYSCALL_DEFINE4(listmount, const struct mnt_id_req __user *, req,
 	 * We only need to guard against mount topology changes as
 	 * listmount() doesn't care about any mount properties.
 	 */
-	scoped_guard(rwsem_read, &namespace_sem)
+	scoped_guard(namespace_shared)
 		ret = do_listmount(ns, kreq.mnt_id, last_mnt_id, kmnt_ids,
 				   nr_mnt_ids, (flags & LISTMOUNT_REVERSE));
 	if (ret <= 0)
-- 
2.47.2



* [PATCH v3 02/65] introduced guards for mount_lock
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 03/65] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
                         ` (73 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

mount_writer: write_seqlock; that's an equivalent of {un,}lock_mount_hash()
mount_locked_reader: read_seqlock_excl; these tend to be open-coded.

No bulk conversions, please - if nothing else, quite a few places use
the mount_writer form when mount_locked_reader is sufficient.  It needs
to be dealt with carefully.
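
Why the distinction matters can be shown with a toy, single-threaded
model.  It assumes the usual seqlock semantics: the writer form bumps the
sequence counter, so lockless readers have to retry, while the
exclusive-reader form only takes the lock.  struct toy_seqlock and the
toy_*() helpers are hypothetical and are not the kernel seqlock.

```c
#include <assert.h>

struct toy_seqlock {
	unsigned seq;	/* odd while a writer is inside */
	int locked;	/* stand-in for the spinlock */
};

/* Writer form: excludes everybody and invalidates lockless readers. */
void toy_write_lock(struct toy_seqlock *sl)
{
	assert(!sl->locked);
	sl->locked = 1;
	sl->seq++;
}

void toy_write_unlock(struct toy_seqlock *sl)
{
	sl->seq++;
	sl->locked = 0;
}

/* Exclusive-reader form: takes the lock, leaves the sequence alone. */
void toy_read_lock_excl(struct toy_seqlock *sl)
{
	assert(!sl->locked);
	sl->locked = 1;
}

void toy_read_unlock_excl(struct toy_seqlock *sl)
{
	sl->locked = 0;
}

/* Lockless reader: sample the sequence, check it afterwards. */
unsigned toy_read_begin(const struct toy_seqlock *sl)
{
	return sl->seq;
}

int toy_read_retry(const struct toy_seqlock *sl, unsigned start)
{
	return (start & 1) || sl->seq != start;
}
```

In this model a lockless reader that overlaps an exclusive reader sees an
unchanged sequence and carries on, whereas one that overlaps a writer has
to start over.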

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/mount.h b/fs/mount.h
index 97737051a8b9..ed8c83ba836a 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -154,6 +154,11 @@ static inline void get_mnt_ns(struct mnt_namespace *ns)
 
 extern seqlock_t mount_lock;
 
+DEFINE_LOCK_GUARD_0(mount_writer, write_seqlock(&mount_lock),
+		    write_sequnlock(&mount_lock))
+DEFINE_LOCK_GUARD_0(mount_locked_reader, read_seqlock_excl(&mount_lock),
+		    read_sequnlock_excl(&mount_lock))
+
 struct proc_mounts {
 	struct mnt_namespace *ns;
 	struct path root;
-- 
2.47.2



* [PATCH v3 03/65] fs/namespace.c: allow to drop vfsmount references via __free(mntput)
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-09-03  4:54       ` [PATCH v3 02/65] introduced guards for mount_lock Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 04/65] __detach_mounts(): use guards Al Viro
                         ` (72 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Note that just as with path_put(), it should never be done in scope of
namespace_sem, be it shared or exclusive.
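
One way to stay within that rule relies on cleanup handlers running in
reverse order of declaration: a reference declared before the lock guard
is dropped after the lock has been released.  A user-space sketch,
assuming GCC or Clang; struct ref, ref_put(), FREE_REF, TOY_GUARD() and
swap_root() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct ref { int count; };	/* hypothetical refcounted object */

int lock_held;			/* stand-in for namespace_sem */
int puts_under_lock;		/* how many puts ran with the lock held */

static void ref_put(struct ref *r)
{
	if (!r)
		return;
	if (lock_held)
		puts_under_lock++;	/* this is what must not happen */
	r->count--;
}

static void ref_put_cleanup(struct ref **p)
{
	ref_put(*p);
}

static int take_lock(void)
{
	assert(!lock_held);
	lock_held = 1;
	return 1;
}

static void unlock_cleanup(int *unused)
{
	(void)unused;
	lock_held = 0;
}

#define FREE_REF	__attribute__((cleanup(ref_put_cleanup)))
#define TOY_GUARD() \
	int guard_ __attribute__((cleanup(unlock_cleanup), unused)) = \
		take_lock()

int swap_root(struct ref *old_root)
{
	struct ref *victim FREE_REF = NULL;	/* declared first: put last */

	TOY_GUARD();				/* declared second: undone first */

	victim = old_root;	/* picked under the lock, dropped outside it */
	return 0;
}
```

Swapping the two declarations would make the put run while the lock is
still held, which is the mistake the note above warns about.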

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index fcea65587ff9..767ab751ee2a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -88,6 +88,8 @@ DEFINE_LOCK_GUARD_0(namespace_excl, namespace_lock(), namespace_unlock())
 DEFINE_LOCK_GUARD_0(namespace_shared, down_read(&namespace_sem),
 				      up_read(&namespace_sem))
 
+DEFINE_FREE(mntput, struct vfsmount *, if (!IS_ERR(_T)) mntput(_T))
+
 #ifdef CONFIG_FSNOTIFY
 LIST_HEAD(notify_list); /* protected by namespace_sem */
 #endif
-- 
2.47.2



* [PATCH v3 04/65] __detach_mounts(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-09-03  4:54       ` [PATCH v3 02/65] introduced guards for mount_lock Al Viro
  2025-09-03  4:54       ` [PATCH v3 03/65] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 05/65] __is_local_mountpoint(): " Al Viro
                         ` (71 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Clean fit for guards use; guards can't be weaker due to umount_tree() calls.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
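The conversion below relies on two guards declared back to back being released in reverse order on every exit path, which matches the old unlock_mount_hash()-then-namespace_unlock() sequence.  A minimal userspace sketch of that ordering, with hypothetical ord_* names and the GCC/Clang cleanup attribute standing in for the kernel macros:

```c
static int ord_log[4];	/* ids in the order the guards were released */
static int ord_n;

struct ord_guard { int id; };

static inline struct ord_guard ord_enter(int id)
{
	return (struct ord_guard){ id };
}

/* Record which guard is being released. */
static inline void ord_exit(struct ord_guard *g)
{
	if (ord_n < 4)
		ord_log[ord_n++] = g->id;
}

#define ORD_GUARD(var, id) \
	struct ord_guard var \
		__attribute__((cleanup(ord_exit), unused)) = ord_enter(id)

static int ord_demo(int bail)
{
	ORD_GUARD(outer, 1);	/* think: namespace_excl */
	ORD_GUARD(inner, 2);	/* think: mount_writer */

	if (bail)
		return -1;	/* replaces "goto out_unlock" */
	return 0;
}
```

Cleanup handlers run in reverse order of declaration, so the inner guard is always released before the outer one, early return or not.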
 fs/namespace.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 767ab751ee2a..1ae1ab8815c9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2032,10 +2032,11 @@ void __detach_mounts(struct dentry *dentry)
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 
-	namespace_lock();
-	lock_mount_hash();
+	guard(namespace_excl)();
+	guard(mount_writer)();
+
 	if (!lookup_mountpoint(dentry, &mp))
-		goto out_unlock;
+		return;
 
 	event++;
 	while (mp.node.next) {
@@ -2047,9 +2048,6 @@ void __detach_mounts(struct dentry *dentry)
 		else umount_tree(mnt, UMOUNT_CONNECTED);
 	}
 	unpin_mountpoint(&mp);
-out_unlock:
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 /*
-- 
2.47.2



* [PATCH v3 05/65] __is_local_mountpoint(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (2 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 04/65] __detach_mounts(): use guards Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 06/65] do_change_type(): " Al Viro
                         ` (70 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_shared due to iterating through ns->mounts.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1ae1ab8815c9..f1460ddd1486 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -906,17 +906,14 @@ bool __is_local_mountpoint(const struct dentry *dentry)
 {
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	struct mount *mnt, *n;
-	bool is_covered = false;
 
-	down_read(&namespace_sem);
-	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
-		is_covered = (mnt->mnt_mountpoint == dentry);
-		if (is_covered)
-			break;
-	}
-	up_read(&namespace_sem);
+	guard(namespace_shared)();
+
+	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node)
+		if (mnt->mnt_mountpoint == dentry)
+			return true;
 
-	return is_covered;
+	return false;
 }
 
 struct pinned_mountpoint {
-- 
2.47.2



* [PATCH v3 06/65] do_change_type(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (3 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 05/65] __is_local_mountpoint(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 07/65] do_set_group(): " Al Viro
                         ` (69 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_excl to modify propagation graph

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f1460ddd1486..a6a7b068770a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2899,7 +2899,7 @@ static int do_change_type(struct path *path, int ms_flags)
 	struct mount *mnt = real_mount(path->mnt);
 	int recurse = ms_flags & MS_REC;
 	int type;
-	int err = 0;
+	int err;
 
 	if (!path_mounted(path))
 		return -EINVAL;
@@ -2908,23 +2908,22 @@ static int do_change_type(struct path *path, int ms_flags)
 	if (!type)
 		return -EINVAL;
 
-	namespace_lock();
+	guard(namespace_excl)();
+
 	err = may_change_propagation(mnt);
 	if (err)
-		goto out_unlock;
+		return err;
 
 	if (type == MS_SHARED) {
 		err = invent_group_ids(mnt, recurse);
 		if (err)
-			goto out_unlock;
+			return err;
 	}
 
 	for (m = mnt; m; m = (recurse ? next_mnt(m, mnt) : NULL))
 		change_mnt_propagation(m, type);
 
- out_unlock:
-	namespace_unlock();
-	return err;
+	return 0;
 }
 
 /* may_copy_tree() - check if a mount tree can be copied
-- 
2.47.2



* [PATCH v3 07/65] do_set_group(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (4 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 06/65] do_change_type(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 08/65] mark_mounts_for_expiry(): " Al Viro
                         ` (68 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_excl to modify propagation graph

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a6a7b068770a..13e2f3837a26 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3349,47 +3349,44 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 
 static int do_set_group(struct path *from_path, struct path *to_path)
 {
-	struct mount *from, *to;
+	struct mount *from = real_mount(from_path->mnt);
+	struct mount *to = real_mount(to_path->mnt);
 	int err;
 
-	from = real_mount(from_path->mnt);
-	to = real_mount(to_path->mnt);
-
-	namespace_lock();
+	guard(namespace_excl)();
 
 	err = may_change_propagation(from);
 	if (err)
-		goto out;
+		return err;
 	err = may_change_propagation(to);
 	if (err)
-		goto out;
+		return err;
 
-	err = -EINVAL;
 	/* To and From paths should be mount roots */
 	if (!path_mounted(from_path))
-		goto out;
+		return -EINVAL;
 	if (!path_mounted(to_path))
-		goto out;
+		return -EINVAL;
 
 	/* Setting sharing groups is only allowed across same superblock */
 	if (from->mnt.mnt_sb != to->mnt.mnt_sb)
-		goto out;
+		return -EINVAL;
 
 	/* From mount root should be wider than To mount root */
 	if (!is_subdir(to->mnt.mnt_root, from->mnt.mnt_root))
-		goto out;
+		return -EINVAL;
 
 	/* From mount should not have locked children in place of To's root */
 	if (__has_locked_children(from, to->mnt.mnt_root))
-		goto out;
+		return -EINVAL;
 
 	/* Setting sharing groups is only allowed on private mounts */
 	if (IS_MNT_SHARED(to) || IS_MNT_SLAVE(to))
-		goto out;
+		return -EINVAL;
 
 	/* From should not be private */
 	if (!IS_MNT_SHARED(from) && !IS_MNT_SLAVE(from))
-		goto out;
+		return -EINVAL;
 
 	if (IS_MNT_SLAVE(from)) {
 		hlist_add_behind(&to->mnt_slave, &from->mnt_slave);
@@ -3401,11 +3398,7 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 		list_add(&to->mnt_share, &from->mnt_share);
 		set_mnt_shared(to);
 	}
-
-	err = 0;
-out:
-	namespace_unlock();
-	return err;
+	return 0;
 }
 
 /**
-- 
2.47.2



* [PATCH v3 08/65] mark_mounts_for_expiry(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (5 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 07/65] do_set_group(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 09/65] put_mnt_ns(): " Al Viro
                         ` (67 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Clean fit; guards can't be weaker due to umount_tree() calls.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 13e2f3837a26..898a6b7307e4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3886,8 +3886,8 @@ void mark_mounts_for_expiry(struct list_head *mounts)
 	if (list_empty(mounts))
 		return;
 
-	namespace_lock();
-	lock_mount_hash();
+	guard(namespace_excl)();
+	guard(mount_writer)();
 
 	/* extract from the expiration list every vfsmount that matches the
 	 * following criteria:
@@ -3909,8 +3909,6 @@ void mark_mounts_for_expiry(struct list_head *mounts)
 		touch_mnt_namespace(mnt->mnt_ns);
 		umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
 	}
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
-- 
2.47.2



* [PATCH v3 09/65] put_mnt_ns(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (6 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 08/65] mark_mounts_for_expiry(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 10/65] mnt_already_visible(): " Al Viro
                         ` (66 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; guards can't be weaker due to umount_tree() call.
Setting emptied_ns requires namespace_excl, but not anything
mount_lock-related.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 898a6b7307e4..86a86be2b0ef 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6153,12 +6153,10 @@ void put_mnt_ns(struct mnt_namespace *ns)
 {
 	if (!refcount_dec_and_test(&ns->ns.count))
 		return;
-	namespace_lock();
+	guard(namespace_excl)();
 	emptied_ns = ns;
-	lock_mount_hash();
+	guard(mount_writer)();
 	umount_tree(ns->root, 0);
-	unlock_mount_hash();
-	namespace_unlock();
 }
 
 struct vfsmount *kern_mount(struct file_system_type *type)
-- 
2.47.2



* [PATCH v3 10/65] mnt_already_visible(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (7 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 09/65] put_mnt_ns(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 11/65] check_for_nsfs_mounts(): no need to take locks Al Viro
                         ` (65 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

clean fit; namespace_shared due to iterating through ns->mounts.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 86a86be2b0ef..a5d37b97088f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6232,9 +6232,8 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 {
 	int new_flags = *new_mnt_flags;
 	struct mount *mnt, *n;
-	bool visible = false;
 
-	down_read(&namespace_sem);
+	guard(namespace_shared)();
 	rbtree_postorder_for_each_entry_safe(mnt, n, &ns->mounts, mnt_node) {
 		struct mount *child;
 		int mnt_flags;
@@ -6281,13 +6280,10 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 		/* Preserve the locked attributes */
 		*new_mnt_flags |= mnt_flags & (MNT_LOCK_READONLY | \
 					       MNT_LOCK_ATIME);
-		visible = true;
-		goto found;
+		return true;
 	next:	;
 	}
-found:
-	up_read(&namespace_sem);
-	return visible;
+	return false;
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
-- 
2.47.2



* [PATCHES v3][RFC][CFT] mount-related stuff
  2025-08-28 23:07 ` [PATCHES v2][RFC][CFT] " Al Viro
  2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
@ 2025-09-03  4:54   ` Al Viro
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                       ` (2 more replies)
  1 sibling, 3 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, Jan Kara

Branch force-pushed into
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
(also visible as #v3.mount, #v[12].mount being the previous versions)
Individual patches in followups.

If nobody objects, this goes into #for-next.

Changes since v2 (other than applied r-b):

	#26: typo fix in description (do_new_mount_rc -> do_new_mount_fc)
	#33, #35: massage suggested by Linus
	#35: where_to_mount() massage, more or less along the lines of
Christian's suggestion.
	between #52 and #53: document locking in path_check_mount() and use
guard() in its caller (path_has_submounts()).
	#56 (now #57): fixed editing braino in commit message
	#58 (now #59): restored lost mnt_ns_tree_add()
	#59..63 (now #60..64): rewritten (as posted last week)
	added at the end of the series: constify {__,}mnt_is_readonly()

Diffstat:
 fs/dcache.c                   |   4 +-
 fs/ecryptfs/dentry.c          |  14 +-
 fs/ecryptfs/ecryptfs_kernel.h |  27 +-
 fs/ecryptfs/file.c            |  15 +-
 fs/ecryptfs/inode.c           |  19 +-
 fs/ecryptfs/main.c            |  24 +-
 fs/internal.h                 |   4 +-
 fs/mount.h                    |  39 +-
 fs/namespace.c                | 992 ++++++++++++++++++++----------------------
 fs/pnode.c                    |  75 +++-
 fs/pnode.h                    |   1 +
 fs/super.c                    |   3 +-
 include/linux/fs.h            |   4 +-
 include/linux/mount.h         |   9 +-
 kernel/audit_tree.c           |  12 +-
 15 files changed, 603 insertions(+), 639 deletions(-)


* [PATCH v3 11/65] check_for_nsfs_mounts(): no need to take locks
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (8 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 10/65] mnt_already_visible(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 12/65] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
                         ` (64 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Currently we are taking mount_writer; what that function needs is
either mount_locked_reader (we are not changing anything, we just
want to iterate through the subtree) or namespace_shared and
a reference held by caller on the root of subtree - that's also
enough to stabilize the topology.

The thing is, all callers are already holding at least namespace_shared
as well as a reference to the root of subtree.

Let's make the callers provide locking warranties - don't mess with
mount_lock in check_for_nsfs_mounts() itself and document the locking
requirements.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 16 +++++-----------
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a5d37b97088f..59948cbf9c47 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2402,21 +2402,15 @@ bool has_locked_children(struct mount *mnt, struct dentry *dentry)
  * specified subtree.  Such references can act as pins for mount namespaces
  * that aren't checked by the mount-cycle checking code, thereby allowing
  * cycles to be made.
+ *
+ * locks: mount_locked_reader || namespace_shared && pinned(subtree)
  */
 static bool check_for_nsfs_mounts(struct mount *subtree)
 {
-	struct mount *p;
-	bool ret = false;
-
-	lock_mount_hash();
-	for (p = subtree; p; p = next_mnt(p, subtree))
+	for (struct mount *p = subtree; p; p = next_mnt(p, subtree))
 		if (mnt_ns_loop(p->mnt.mnt_root))
-			goto out;
-
-	ret = true;
-out:
-	unlock_mount_hash();
-	return ret;
+			return false;
+	return true;
 }
 
 /**
-- 
2.47.2



* [PATCH v3 12/65] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (9 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 11/65] check_for_nsfs_mounts(): no need to take locks Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 13/65] has_locked_children(): use guards Al Viro
                         ` (63 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
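scoped_guard() differs from guard() in that the lock covers only the statement or block that follows, not the rest of the function.  A userspace sketch of one way to get that effect, a single-pass for loop whose loop variable carries a cleanup handler.  The sg_* names are hypothetical, the "lock" is a flag, and this is not the kernel macro itself.

```c
#include <stddef.h>

static int sg_lock_held;	/* hypothetical stand-in for the lock */
static int sg_value;		/* data the lock is supposed to protect */

struct sg_guard { int unused; };

static inline struct sg_guard sg_enter(void)
{
	sg_lock_held = 1;
	return (struct sg_guard){ 0 };
}

static inline void sg_exit(struct sg_guard *g)
{
	(void)g;
	sg_lock_held = 0;
}

/*
 * Single-pass for loop: the guard lives in the loop's own scope, so the
 * cleanup handler fires as soon as the guarded statement is done.
 */
#define sg_scoped_guard() \
	for (struct sg_guard sg_g_ __attribute__((cleanup(sg_exit))) = \
		     sg_enter(), *sg_once_ = &sg_g_; \
	     sg_once_; sg_once_ = NULL)

static int sg_update(int v)
{
	int held_inside = 0;

	sg_scoped_guard() {
		sg_value = v;
		held_inside = sg_lock_held;
	}
	/* the lock has already been dropped here */
	return held_inside && !sg_lock_held;
}
```

That is why the form fits here: only the mnt_set_mountpoint() call needs the lock, and the code after it keeps running unlocked.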
 fs/pnode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/pnode.c b/fs/pnode.c
index 6f7d02f3fa98..0702d45d856d 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -304,9 +304,8 @@ int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
 				err = PTR_ERR(this);
 				break;
 			}
-			read_seqlock_excl(&mount_lock);
-			mnt_set_mountpoint(n, dest_mp, this);
-			read_sequnlock_excl(&mount_lock);
+			scoped_guard(mount_locked_reader)
+				mnt_set_mountpoint(n, dest_mp, this);
 			if (n->mnt_master)
 				SET_MNT_MARK(n->mnt_master);
 			copy = this;
-- 
2.47.2



* [PATCH v3 13/65] has_locked_children(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (10 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 12/65] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 14/65] mnt_set_expiry(): " Al Viro
                         ` (62 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and document the locking requirements of __has_locked_children()

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 59948cbf9c47..2cb3cb8307ca 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2373,6 +2373,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 	}
 }
 
+/* locks: namespace_shared && pinned(mnt) || mount_locked_reader */
 static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
 {
 	struct mount *child;
@@ -2389,12 +2390,8 @@ static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
 
 bool has_locked_children(struct mount *mnt, struct dentry *dentry)
 {
-	bool res;
-
-	read_seqlock_excl(&mount_lock);
-	res = __has_locked_children(mnt, dentry);
-	read_sequnlock_excl(&mount_lock);
-	return res;
+	guard(mount_locked_reader)();
+	return __has_locked_children(mnt, dentry);
 }
 
 /*
-- 
2.47.2



* [PATCH v3 14/65] mnt_set_expiry(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (11 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 13/65] has_locked_children(): use guards Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 15/65] path_is_under(): " Al Viro
                         ` (61 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

It needs only mount_locked_reader because there are no lockless
accesses to expiry lists.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2cb3cb8307ca..db25c81d7f68 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3858,9 +3858,8 @@ int finish_automount(struct vfsmount *m, const struct path *path)
  */
 void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list)
 {
-	read_seqlock_excl(&mount_lock);
+	guard(mount_locked_reader)();
 	list_add_tail(&real_mount(mnt)->mnt_expire, expiry_list);
-	read_sequnlock_excl(&mount_lock);
 }
 EXPORT_SYMBOL(mnt_set_expiry);
 
-- 
2.47.2



* [PATCH v3 15/65] path_is_under(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (12 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 14/65] mnt_set_expiry(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 16/65] current_chrooted(): don't bother with follow_down_one() Al Viro
                         ` (60 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and document the locking requirements for is_path_reachable().
There is one questionable caller in do_listmount() where we are not
holding mount_lock *and* might not have the first argument mounted.
However, in that case it will immediately return true without having
to look at the ancestors.  Might be cleaner to move the check into
non-LSMT_ROOT case which it really belongs in - there the check is
not always true and is_mounted() is guaranteed.

Document the locking environments for is_path_reachable() callers:
	get_peer_under_root()
	get_dominating_id()
	do_statmount()
	do_listmount()

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 11 +++++------
 fs/pnode.c     |  3 ++-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index db25c81d7f68..6aabf0045389 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4592,7 +4592,7 @@ SYSCALL_DEFINE5(move_mount,
 /*
  * Return true if path is reachable from root
  *
- * namespace_sem or mount_lock is held
+ * locks: mount_locked_reader || namespace_shared && is_mounted(mnt)
  */
 bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
 			 const struct path *root)
@@ -4606,11 +4606,8 @@ bool is_path_reachable(struct mount *mnt, struct dentry *dentry,
 
 bool path_is_under(const struct path *path1, const struct path *path2)
 {
-	bool res;
-	read_seqlock_excl(&mount_lock);
-	res = is_path_reachable(real_mount(path1->mnt), path1->dentry, path2);
-	read_sequnlock_excl(&mount_lock);
-	return res;
+	guard(mount_locked_reader)();
+	return is_path_reachable(real_mount(path1->mnt), path1->dentry, path2);
 }
 EXPORT_SYMBOL(path_is_under);
 
@@ -5689,6 +5686,7 @@ static int grab_requested_root(struct mnt_namespace *ns, struct path *root)
 			     STATMOUNT_MNT_UIDMAP | \
 			     STATMOUNT_MNT_GIDMAP)
 
+/* locks: namespace_shared */
 static int do_statmount(struct kstatmount *s, u64 mnt_id, u64 mnt_ns_id,
 			struct mnt_namespace *ns)
 {
@@ -5949,6 +5947,7 @@ SYSCALL_DEFINE4(statmount, const struct mnt_id_req __user *, req,
 	return ret;
 }
 
+/* locks: namespace_shared */
 static ssize_t do_listmount(struct mnt_namespace *ns, u64 mnt_parent_id,
 			    u64 last_mnt_id, u64 *mnt_ids, size_t nr_mnt_ids,
 			    bool reverse)
diff --git a/fs/pnode.c b/fs/pnode.c
index 0702d45d856d..edaf9d9d0eaf 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -29,6 +29,7 @@ static inline struct mount *next_slave(struct mount *p)
 	return hlist_entry(p->mnt_slave.next, struct mount, mnt_slave);
 }
 
+/* locks: namespace_shared && is_mounted(mnt) */
 static struct mount *get_peer_under_root(struct mount *mnt,
 					 struct mnt_namespace *ns,
 					 const struct path *root)
@@ -50,7 +51,7 @@ static struct mount *get_peer_under_root(struct mount *mnt,
  * Get ID of closest dominating peer group having a representative
  * under the given root.
  *
- * Caller must hold namespace_sem
+ * locks: namespace_shared
  */
 int get_dominating_id(struct mount *mnt, const struct path *root)
 {
-- 
2.47.2



* [PATCH v3 16/65] current_chrooted(): don't bother with follow_down_one()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (13 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 15/65] path_is_under(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 17/65] current_chrooted(): use guards Al Viro
                         ` (59 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

All we need here is to follow ->overmount on root mount of namespace...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6aabf0045389..cf680fbf015e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6194,24 +6194,22 @@ bool our_mnt(struct vfsmount *mnt)
 bool current_chrooted(void)
 {
 	/* Does the current process have a non-standard root */
-	struct path ns_root;
+	struct mount *root = current->nsproxy->mnt_ns->root;
 	struct path fs_root;
 	bool chrooted;
 
+	get_fs_root(current->fs, &fs_root);
+
 	/* Find the namespace root */
-	ns_root.mnt = &current->nsproxy->mnt_ns->root->mnt;
-	ns_root.dentry = ns_root.mnt->mnt_root;
-	path_get(&ns_root);
-	while (d_mountpoint(ns_root.dentry) && follow_down_one(&ns_root))
-		;
+	read_seqlock_excl(&mount_lock);
 
-	get_fs_root(current->fs, &fs_root);
+	while (unlikely(root->overmount))
+		root = root->overmount;
 
-	chrooted = !path_equal(&fs_root, &ns_root);
+	chrooted = fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 
+	read_sequnlock_excl(&mount_lock);
 	path_put(&fs_root);
-	path_put(&ns_root);
-
 	return chrooted;
 }
 
-- 
2.47.2



* [PATCH v3 17/65] current_chrooted(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (14 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 16/65] current_chrooted(): don't bother with follow_down_one() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 18/65] switch do_new_mount_fc() to fc_mount() Al Viro
                         ` (58 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

here a use of __free(path_put) for dropping fs_root is enough to
make guard(mount_locked_reader) fit...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index cf680fbf015e..0474b3a93dbf 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -6194,23 +6194,20 @@ bool our_mnt(struct vfsmount *mnt)
 bool current_chrooted(void)
 {
 	/* Does the current process have a non-standard root */
-	struct mount *root = current->nsproxy->mnt_ns->root;
-	struct path fs_root;
-	bool chrooted;
+	struct path fs_root __free(path_put) = {};
+	struct mount *root;
 
 	get_fs_root(current->fs, &fs_root);
 
 	/* Find the namespace root */
-	read_seqlock_excl(&mount_lock);
 
+	guard(mount_locked_reader)();
+
+	root = current->nsproxy->mnt_ns->root;
 	while (unlikely(root->overmount))
 		root = root->overmount;
 
-	chrooted = fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
-
-	read_sequnlock_excl(&mount_lock);
-	path_put(&fs_root);
-	return chrooted;
+	return fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 }
 
 static bool mnt_already_visible(struct mnt_namespace *ns,
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 320+ messages in thread

* [PATCH v3 18/65] switch do_new_mount_fc() to fc_mount()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (15 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 17/65] current_chrooted(): use guards Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 19/65] do_move_mount(): trim local variables Al Viro
                         ` (57 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Prior to the call of do_new_mount_fc() the caller has just done a successful
vfs_get_tree().  Then do_new_mount_fc() does several checks on the resulting
superblock, and either does fc_drop_locked() and returns an error or
proceeds to unlock the superblock and call vfs_create_mount().

The thing is, there's no reason to delay that unlock + vfs_create_mount() -
the tests do not rely upon the state of ->s_umount and
	fc_drop_locked()
	put_fs_context()
is equivalent to
	unlock ->s_umount
	put_fs_context()

Doing vfs_create_mount() before the checks allows us to move vfs_get_tree()
from caller to do_new_mount_fc() and collapse it with vfs_create_mount()
into an fc_mount() call.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
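The diff below ends up with a single unconditional mntput(), made safe by clearing the pointer once the mount has been handed over ("consumed on success").  A standalone sketch of that ownership pattern, with hypothetical own_* names and a toy refcount; it assumes, as the sketch's own release function does, that putting NULL is a no-op.

```c
#include <stddef.h>

struct own_obj { int refs; };

/* Release function that tolerates NULL. */
static void own_put(struct own_obj *o)
{
	if (o)
		o->refs--;
}

static struct own_obj *own_attached;	/* whoever took over the reference */

/* Takes over the caller's reference on success, leaves it alone on error. */
static int own_attach(struct own_obj *o, int fail)
{
	if (fail)
		return -16;
	own_attached = o;
	return 0;
}

static int own_create_and_attach(struct own_obj *o, int fail)
{
	int error;

	o->refs++;		/* the reference this function owns */
	error = own_attach(o, fail);
	if (!error)
		o = NULL;	/* consumed on success */
	own_put(o);		/* no-op once ownership has moved on */
	return error;
}
```

With the pointer cleared on success, the exit path needs no "if (error < 0)" test, and every failure branch can jump to the same label.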
 fs/namespace.c | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0474b3a93dbf..9b575c9eee0b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3705,25 +3705,20 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
-	struct vfsmount *mnt;
 	struct pinned_mountpoint mp = {};
-	struct super_block *sb = fc->root->d_sb;
+	struct super_block *sb;
+	struct vfsmount *mnt = fc_mount(fc);
 	int error;
 
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
+
+	sb = fc->root->d_sb;
 	error = security_sb_kern_mount(sb);
 	if (!error && mount_too_revealing(sb, &mnt_flags))
 		error = -EPERM;
-
-	if (unlikely(error)) {
-		fc_drop_locked(fc);
-		return error;
-	}
-
-	up_write(&sb->s_umount);
-
-	mnt = vfs_create_mount(fc);
-	if (IS_ERR(mnt))
-		return PTR_ERR(mnt);
+	if (unlikely(error))
+		goto out;
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
@@ -3731,10 +3726,12 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 	if (!error) {
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
+		if (!error)
+			mnt = NULL;	// consumed on success
 		unlock_mount(&mp);
 	}
-	if (error < 0)
-		mntput(mnt);
+out:
+	mntput(mnt);
 	return error;
 }
 
@@ -3788,8 +3785,6 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 		err = parse_monolithic_mount_data(fc, data);
 	if (!err && !mount_capable(fc))
 		err = -EPERM;
-	if (!err)
-		err = vfs_get_tree(fc);
 	if (!err)
 		err = do_new_mount_fc(fc, path, mnt_flags);
 
-- 
2.47.2



* [PATCH v3 19/65] do_move_mount(): trim local variables
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (16 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 18/65] switch do_new_mount_fc() to fc_mount() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 20/65] do_move_mount(): deal with the checks on old_path early Al Viro
                         ` (56 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Both 'parent' and 'ns' are used at most once, no point precalculating those...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9b575c9eee0b..ad9b5687ff15 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3564,10 +3564,8 @@ static inline bool may_use_mount(struct mount *mnt)
 static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
-	struct mnt_namespace *ns;
 	struct mount *p;
 	struct mount *old;
-	struct mount *parent;
 	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
@@ -3578,8 +3576,6 @@ static int do_move_mount(struct path *old_path,
 
 	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
-	parent = old->mnt_parent;
-	ns = old->mnt_ns;
 
 	err = -EINVAL;
 
@@ -3588,12 +3584,12 @@ static int do_move_mount(struct path *old_path,
 		/* ... it should be detachable from parent */
 		if (!mnt_has_parent(old) || IS_MNT_LOCKED(old))
 			goto out;
+		/* ... which should not be shared */
+		if (IS_MNT_SHARED(old->mnt_parent))
+			goto out;
 		/* ... and the target should be in our namespace */
 		if (!check_mnt(p))
 			goto out;
-		/* parent of the source should not be shared */
-		if (IS_MNT_SHARED(parent))
-			goto out;
 	} else {
 		/*
 		 * otherwise the source must be the root of some anon namespace.
@@ -3605,7 +3601,7 @@ static int do_move_mount(struct path *old_path,
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
-		if (ns == p->mnt_ns)
+		if (old->mnt_ns == p->mnt_ns)
 			goto out;
 		/*
 		 * Target should be either in our namespace or in an acceptable
-- 
2.47.2



* [PATCH v3 20/65] do_move_mount(): deal with the checks on old_path early
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (17 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 19/65] do_move_mount(): trim local variables Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 21/65] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
                         ` (55 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

1) checking that the location we want to move does point to the root of
some mount can be done before anything else; that property is not going
to change and having it already verified simplifies the analysis.

2) checking the type agreement between what we are trying to move and what
we are trying to move it onto also belongs at the very beginning -
do_lock_mount() might end up switching new_path to something that overmounts
the original location, but... the same type agreement applies to overmounts,
so we could just as well check against the original location.

3) since we know that old_path->dentry is the root of old_path->mnt, there's
no point bothering with path_overmounted() in can_move_mount_beneath();
it's simply a check for the mount we are trying to move having non-NULL
->overmount.  And with that, we can switch can_move_mount_beneath() to
taking old instead of old_path, leaving no uses of old_path past the original
checks.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index ad9b5687ff15..74c67ea1b5a8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3433,7 +3433,7 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
 
 /**
  * can_move_mount_beneath - check that we can mount beneath the top mount
- * @from: mount to mount beneath
+ * @mnt_from: mount we are trying to move
  * @to:   mount under which to mount
  * @mp:   mountpoint of @to
  *
@@ -3443,7 +3443,7 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  *   root or the rootfs of the namespace.
  * - Make sure that the caller can unmount the topmost mount ensuring
  *   that the caller could reveal the underlying mountpoint.
- * - Ensure that nothing has been mounted on top of @from before we
+ * - Ensure that nothing has been mounted on top of @mnt_from before we
  *   grabbed @namespace_sem to avoid creating pointless shadow mounts.
  * - Prevent mounting beneath a mount if the propagation relationship
  *   between the source mount, parent mount, and top mount would lead to
@@ -3452,12 +3452,11 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Context: This function expects namespace_lock() to be held.
  * Return: On success 0, and on error a negative error code is returned.
  */
-static int can_move_mount_beneath(const struct path *from,
+static int can_move_mount_beneath(struct mount *mnt_from,
 				  const struct path *to,
 				  const struct mountpoint *mp)
 {
-	struct mount *mnt_from = real_mount(from->mnt),
-		     *mnt_to = real_mount(to->mnt),
+	struct mount *mnt_to = real_mount(to->mnt),
 		     *parent_mnt_to = mnt_to->mnt_parent;
 
 	if (!mnt_has_parent(mnt_to))
@@ -3470,7 +3469,7 @@ static int can_move_mount_beneath(const struct path *from,
 		return -EINVAL;
 
 	/* Avoid creating shadow mounts during mount propagation. */
-	if (path_overmounted(from))
+	if (mnt_from->overmount)
 		return -EINVAL;
 
 	/*
@@ -3565,16 +3564,21 @@ static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
 	struct mount *p;
-	struct mount *old;
+	struct mount *old = real_mount(old_path->mnt);
 	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
 
+	if (!path_mounted(old_path))
+		return -EINVAL;
+
+	if (d_is_dir(new_path->dentry) != d_is_dir(old_path->dentry))
+		return -EINVAL;
+
 	err = do_lock_mount(new_path, &mp, beneath);
 	if (err)
 		return err;
 
-	old = real_mount(old_path->mnt);
 	p = real_mount(new_path->mnt);
 
 	err = -EINVAL;
@@ -3611,15 +3615,8 @@ static int do_move_mount(struct path *old_path,
 			goto out;
 	}
 
-	if (!path_mounted(old_path))
-		goto out;
-
-	if (d_is_dir(new_path->dentry) !=
-	    d_is_dir(old_path->dentry))
-		goto out;
-
 	if (beneath) {
-		err = can_move_mount_beneath(old_path, new_path, mp.mp);
+		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
 			goto out;
 
-- 
2.47.2



* [PATCH v3 21/65] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (18 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 20/65] do_move_mount(): deal with the checks on old_path early Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 22/65] finish_automount(): simplify the ELOOP check Al Viro
                         ` (54 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We want to mount beneath the given location.  For that operation to
make sense, location must be the root of some mount that has something
under it.  Currently we let it proceed if those requirements are not met,
with rather meaningless results, and have that bogosity caught further
down the road; let's fail early instead - do_lock_mount() doesn't make
sense unless those conditions hold, and checking them there makes
things simpler.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 74c67ea1b5a8..86c6dd432b13 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2768,12 +2768,19 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 	struct path under = {};
 	int err = -ENOENT;
 
+	if (unlikely(beneath) && !path_mounted(path))
+		return -EINVAL;
+
 	for (;;) {
 		struct mount *m = real_mount(mnt);
 
 		if (beneath) {
 			path_put(&under);
 			read_seqlock_excl(&mount_lock);
+			if (unlikely(!mnt_has_parent(m))) {
+				read_sequnlock_excl(&mount_lock);
+				return -EINVAL;
+			}
 			under.mnt = mntget(&m->mnt_parent->mnt);
 			under.dentry = dget(m->mnt_mountpoint);
 			read_sequnlock_excl(&mount_lock);
@@ -3437,8 +3444,6 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * @to:   mount under which to mount
  * @mp:   mountpoint of @to
  *
- * - Make sure that @to->dentry is actually the root of a mount under
- *   which we can mount another mount.
  * - Make sure that nothing can be mounted beneath the caller's current
  *   root or the rootfs of the namespace.
  * - Make sure that the caller can unmount the topmost mount ensuring
@@ -3459,12 +3464,6 @@ static int can_move_mount_beneath(struct mount *mnt_from,
 	struct mount *mnt_to = real_mount(to->mnt),
 		     *parent_mnt_to = mnt_to->mnt_parent;
 
-	if (!mnt_has_parent(mnt_to))
-		return -EINVAL;
-
-	if (!path_mounted(to))
-		return -EINVAL;
-
 	if (IS_MNT_LOCKED(mnt_to))
 		return -EINVAL;
 
-- 
2.47.2



* [PATCH v3 22/65] finish_automount(): simplify the ELOOP check
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (19 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 21/65] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 23/65] do_loopback(): use __free(path_put) to deal with old_path Al Viro
                         ` (53 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

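It's enough to check that dentries match; if path->dentry is equal to
m->mnt_root, superblocks will match as well, since a mount's ->mnt_sb is
the superblock of its root dentry and path->dentry lives on the superblock
of path->mnt.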

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 86c6dd432b13..bdb33270ac6e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3798,8 +3798,7 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 
 	mnt = real_mount(m);
 
-	if (m->mnt_sb == path->mnt->mnt_sb &&
-	    m->mnt_root == dentry) {
+	if (m->mnt_root == path->dentry) {
 		err = -ELOOP;
 		goto discard;
 	}
-- 
2.47.2



* [PATCH v3 23/65] do_loopback(): use __free(path_put) to deal with old_path
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (20 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 22/65] finish_automount(): simplify the ELOOP check Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 24/65] pivot_root(2): use __free() to deal with struct path in it Al Viro
                         ` (52 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Preparation for making unlock_mount() a __cleanup();
we can't have path_put() inside the mount_lock scope.
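
The ordering argument can be illustrated in userspace.  What follows is a
minimal sketch, not kernel code: it relies only on the GCC/Clang cleanup
attribute, which the kernel's __free() is built on, and every toy_* name
is a hypothetical stand-in.  Cleanups run in reverse order of declaration,
so a path declared before the lock is put only after the lock has been
dropped.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Event log so that the order of cleanups can be inspected. */
static char trace[64];

static void log_event(const char *s)
{
	strcat(trace, s);
}

/* Hypothetical stand-ins for struct path and the mount lock context. */
struct toy_path { int pinned; };
struct toy_lock { int held; };

static void toy_path_put(struct toy_path *p)
{
	if (p->pinned)
		log_event("put;");
}

static void toy_unlock(struct toy_lock *l)
{
	if (l->held)
		log_event("unlock;");
}

static int toy_loopback(int fail_early)
{
	/* Declared first, hence cleaned up last. */
	struct toy_path old __attribute__((cleanup(toy_path_put))) = { 0 };

	old.pinned = 1;
	if (fail_early)
		return -EINVAL;	/* only old is in scope; only it is cleaned up */

	/* Declared second, hence cleaned up first. */
	struct toy_lock lock __attribute__((cleanup(toy_unlock))) = { 1 };

	log_event("work;");
	return 0;
}
```

On the normal path the trace reads work, unlock, put: the reference is
dropped strictly outside the locked region, which is the property the
conversion is after.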

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index bdb33270ac6e..245cf2d19a6b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3014,7 +3014,7 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 static int do_loopback(struct path *path, const char *old_name,
 				int recurse)
 {
-	struct path old_path;
+	struct path old_path __free(path_put) = {};
 	struct mount *mnt = NULL, *parent;
 	struct pinned_mountpoint mp = {};
 	int err;
@@ -3024,13 +3024,12 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (err)
 		return err;
 
-	err = -EINVAL;
 	if (mnt_ns_loop(old_path.dentry))
-		goto out;
+		return -EINVAL;
 
 	err = lock_mount(path, &mp);
 	if (err)
-		goto out;
+		return err;
 
 	parent = real_mount(path->mnt);
 	if (!check_mnt(parent))
@@ -3050,8 +3049,6 @@ static int do_loopback(struct path *path, const char *old_name,
 	}
 out2:
 	unlock_mount(&mp);
-out:
-	path_put(&old_path);
 	return err;
 }
 
-- 
2.47.2



* [PATCH v3 24/65] pivot_root(2): use __free() to deal with struct path in it
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (21 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 23/65] do_loopback(): use __free(path_put) to deal with old_path Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 25/65] finish_automount(): take the lock_mount() analogue into a helper Al Viro
                         ` (51 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Preparation for making unlock_mount() a __cleanup();
we can't have path_put() inside the mount_lock scope.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 245cf2d19a6b..90b62ee882da 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4622,7 +4622,9 @@ EXPORT_SYMBOL(path_is_under);
 SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 		const char __user *, put_old)
 {
-	struct path new, old, root;
+	struct path new __free(path_put) = {};
+	struct path old __free(path_put) = {};
+	struct path root __free(path_put) = {};
 	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
 	struct pinned_mountpoint old_mp = {};
 	int error;
@@ -4633,21 +4635,21 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	error = user_path_at(AT_FDCWD, new_root,
 			     LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new);
 	if (error)
-		goto out0;
+		return error;
 
 	error = user_path_at(AT_FDCWD, put_old,
 			     LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &old);
 	if (error)
-		goto out1;
+		return error;
 
 	error = security_sb_pivotroot(&old, &new);
 	if (error)
-		goto out2;
+		return error;
 
 	get_fs_root(current->fs, &root);
 	error = lock_mount(&old, &old_mp);
 	if (error)
-		goto out3;
+		return error;
 
 	error = -EINVAL;
 	new_mnt = real_mount(new.mnt);
@@ -4705,13 +4707,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	error = 0;
 out4:
 	unlock_mount(&old_mp);
-out3:
-	path_put(&root);
-out2:
-	path_put(&old);
-out1:
-	path_put(&new);
-out0:
 	return error;
 }
 
-- 
2.47.2



* [PATCH v3 25/65] finish_automount(): take the lock_mount() analogue into a helper
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (22 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 24/65] pivot_root(2): use __free() to deal with struct path in it Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 26/65] do_new_mount_fc(): use __free() to deal with dropping mnt on failure Al Viro
                         ` (50 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

finish_automount() can't use lock_mount() - it treats finding something
already mounted as "quietly drop our mount and return 0", not as
"mount on top of whatever is mounted there".  That logic has been open-coded;
let's take it into a helper similar to lock_mount().  The helper reports
"something's already mounted" as -EBUSY; finish_automount() needs to
distinguish that from the normal case, and -EBUSY can't come from the other
failure cases.
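
The convention can be sketched in userspace as follows.  This is an
illustration under stated assumptions rather than the kernel code: the
toy_* names and the toy_location structure are hypothetical, and the two
flags merely stand in for cant_mount() and path_overmounted().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the state the helper inspects. */
struct toy_location {
	bool dead;		/* stands in for cant_mount() */
	bool overmounted;	/* stands in for path_overmounted() */
	int locks_held;		/* 1 while the "locks" are held */
};

/* Returns 0 with locks held, or a negative errno with locks dropped. */
static int toy_lock_exact(struct toy_location *loc)
{
	int err = 0;

	loc->locks_held = 1;
	if (loc->dead)
		err = -ENOENT;
	else if (loc->overmounted)
		err = -EBUSY;	/* reserved for "something is already there" */
	if (err)
		loc->locks_held = 0;
	return err;
}

static int toy_finish_automount(struct toy_location *loc)
{
	int err = toy_lock_exact(loc);

	if (err)
		return err == -EBUSY ? 0 : err;	/* quietly drop our mount */

	/* ... the mount would be attached here ... */
	loc->locks_held = 0;
	return 0;
}
```

Since -EBUSY has exactly one source in the helper, the caller can map it
to 0 without masking any real failure.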

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 90b62ee882da..6251ee15f5f6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3781,9 +3781,29 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 	return err;
 }
 
-int finish_automount(struct vfsmount *m, const struct path *path)
+static int lock_mount_exact(const struct path *path,
+			    struct pinned_mountpoint *mp)
 {
 	struct dentry *dentry = path->dentry;
+	int err;
+
+	inode_lock(dentry->d_inode);
+	namespace_lock();
+	if (unlikely(cant_mount(dentry)))
+		err = -ENOENT;
+	else if (path_overmounted(path))
+		err = -EBUSY;
+	else
+		err = get_mountpoint(dentry, mp);
+	if (unlikely(err)) {
+		namespace_unlock();
+		inode_unlock(dentry->d_inode);
+	}
+	return err;
+}
+
+int finish_automount(struct vfsmount *m, const struct path *path)
+{
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
@@ -3805,20 +3825,11 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 	 * that overmounts our mountpoint to be means "quitely drop what we've
 	 * got", not "try to mount it on top".
 	 */
-	inode_lock(dentry->d_inode);
-	namespace_lock();
-	if (unlikely(cant_mount(dentry))) {
-		err = -ENOENT;
-		goto discard_locked;
-	}
-	if (path_overmounted(path)) {
-		err = 0;
-		goto discard_locked;
+	err = lock_mount_exact(path, &mp);
+	if (unlikely(err)) {
+		mntput(m);
+		return err == -EBUSY ? 0 : err;
 	}
-	err = get_mountpoint(dentry, &mp);
-	if (err)
-		goto discard_locked;
-
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
 	unlock_mount(&mp);
@@ -3826,9 +3837,6 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 		goto discard;
 	return 0;
 
-discard_locked:
-	namespace_unlock();
-	inode_unlock(dentry->d_inode);
 discard:
 	mntput(m);
 	return err;
-- 
2.47.2



* [PATCH v3 26/65] do_new_mount_fc(): use __free() to deal with dropping mnt on failure
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (23 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 25/65] finish_automount(): take the lock_mount() analogue into a helper Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 26/63] do_new_mount_rc(): " Al Viro
                         ` (49 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

do_add_mount() consumes the vfsmount on success; just follow it with
a conditional retain_and_null_ptr() on success and we can switch
to __free() for mnt and be done with that - unlock_mount() is
at the very end.
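
A userspace sketch of the "consumed on success" pattern, assuming nothing
beyond the GCC/Clang cleanup attribute.  All toy_* names are hypothetical,
and the plain NULL assignment below plays the role that
retain_and_null_ptr() plays in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct vfsmount. */
struct toy_mnt { int refs; };

static void toy_mntput(struct toy_mnt *m)
{
	if (m)
		m->refs--;
}

/* A cleanup handler receives a pointer to the annotated variable. */
static void toy_free_mntput(struct toy_mnt **p)
{
	toy_mntput(*p);
}

#define toy_free_mnt __attribute__((cleanup(toy_free_mntput)))

/* Takes over the caller's reference on success, leaves it alone on failure. */
static int toy_add_mount(struct toy_mnt *m, int fail)
{
	(void)m;
	return fail ? -EBUSY : 0;
}

static int toy_new_mount(struct toy_mnt *obj, int fail)
{
	struct toy_mnt *mnt toy_free_mnt = obj;	/* we own one reference */
	int error = toy_add_mount(mnt, fail);

	if (!error)
		mnt = NULL;	/* consumed on success: disarm the cleanup */
	return error;
}
```

On failure the reference is dropped automatically at function exit; on
success the cleanup sees NULL and does nothing, so no goto label is needed
for either outcome.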

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6251ee15f5f6..3551e51461a2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3696,7 +3696,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 {
 	struct pinned_mountpoint mp = {};
 	struct super_block *sb;
-	struct vfsmount *mnt = fc_mount(fc);
+	struct vfsmount *mnt __free(mntput) = fc_mount(fc);
 	int error;
 
 	if (IS_ERR(mnt))
@@ -3704,10 +3704,11 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	sb = fc->root->d_sb;
 	error = security_sb_kern_mount(sb);
-	if (!error && mount_too_revealing(sb, &mnt_flags))
-		error = -EPERM;
 	if (unlikely(error))
-		goto out;
+		return error;
+
+	if (unlikely(mount_too_revealing(sb, &mnt_flags)))
+		return -EPERM;
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
@@ -3716,11 +3717,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
 		if (!error)
-			mnt = NULL;	// consumed on success
+			retain_and_null_ptr(mnt); // consumed on success
 		unlock_mount(&mp);
 	}
-out:
-	mntput(mnt);
 	return error;
 }
 
-- 
2.47.2



* [PATCH v3 26/63] do_new_mount_rc(): use __free() to deal with dropping mnt on failure
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (24 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 26/65] do_new_mount_fc(): use __free() to deal with dropping mnt on failure Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 27/65] finish_automount(): " Al Viro
                         ` (48 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

do_add_mount() consumes the vfsmount on success; just follow it with
a conditional retain_and_null_ptr() on success and we can switch
to __free() for mnt and be done with that - unlock_mount() is
at the very end.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6251ee15f5f6..3551e51461a2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3696,7 +3696,7 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 {
 	struct pinned_mountpoint mp = {};
 	struct super_block *sb;
-	struct vfsmount *mnt = fc_mount(fc);
+	struct vfsmount *mnt __free(mntput) = fc_mount(fc);
 	int error;
 
 	if (IS_ERR(mnt))
@@ -3704,10 +3704,11 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	sb = fc->root->d_sb;
 	error = security_sb_kern_mount(sb);
-	if (!error && mount_too_revealing(sb, &mnt_flags))
-		error = -EPERM;
 	if (unlikely(error))
-		goto out;
+		return error;
+
+	if (unlikely(mount_too_revealing(sb, &mnt_flags)))
+		return -EPERM;
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
@@ -3716,11 +3717,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
 		if (!error)
-			mnt = NULL;	// consumed on success
+			retain_and_null_ptr(mnt); // consumed on success
 		unlock_mount(&mp);
 	}
-out:
-	mntput(mnt);
 	return error;
 }
 
-- 
2.47.2



* [PATCH v3 27/65] finish_automount(): use __free() to deal with dropping mnt on failure
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (25 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 26/63] do_new_mount_rc(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 28/65] change calling conventions for lock_mount() et.al Al Viro
                         ` (47 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Same story as with do_new_mount_fc().

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3551e51461a2..779cfed04291 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3801,8 +3801,9 @@ static int lock_mount_exact(const struct path *path,
 	return err;
 }
 
-int finish_automount(struct vfsmount *m, const struct path *path)
+int finish_automount(struct vfsmount *__m, const struct path *path)
 {
+	struct vfsmount *m __free(mntput) = __m;
 	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
@@ -3814,10 +3815,8 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 
 	mnt = real_mount(m);
 
-	if (m->mnt_root == path->dentry) {
-		err = -ELOOP;
-		goto discard;
-	}
+	if (m->mnt_root == path->dentry)
+		return -ELOOP;
 
 	/*
 	 * we don't want to use lock_mount() - in this case finding something
@@ -3825,19 +3824,14 @@ int finish_automount(struct vfsmount *m, const struct path *path)
 	 * got", not "try to mount it on top".
 	 */
 	err = lock_mount_exact(path, &mp);
-	if (unlikely(err)) {
-		mntput(m);
+	if (unlikely(err))
 		return err == -EBUSY ? 0 : err;
-	}
+
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
+	if (likely(!err))
+		retain_and_null_ptr(m);
 	unlock_mount(&mp);
-	if (unlikely(err))
-		goto discard;
-	return 0;
-
-discard:
-	mntput(m);
 	return err;
 }
 
-- 
2.47.2



* [PATCH v3 28/65] change calling conventions for lock_mount() et.al.
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (26 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 27/65] finish_automount(): " Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 29/65] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
                         ` (46 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

1) pinned_mountpoint gets a new member - struct mount *parent.
It points to a mount only if we have locked it; after a failed attempt
it holds ERR_PTR().

2) do_lock_mount() et al. return void and set ->parent to
	* on success with !beneath - mount corresponding to path->mnt
	* on success with beneath - the parent of mount corresponding
to path->mnt
	* in case of error - ERR_PTR(-E...).
IOW, we get the mount we will be actually mounting upon or ERR_PTR().

3) we can't use CLASS, since the pinned_mountpoint is placed on
hlist during initialization, so we define local macros:
	LOCK_MOUNT(mp, path)
	LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath)
	LOCK_MOUNT_EXACT(mp, path)
All of them declare and initialize struct pinned_mountpoint mp,
with unlock_mount done via __cleanup().
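
The shape of that convention can be sketched in userspace.  This is a
hedged illustration, not the macros from the patch: the toy_* helpers mimic
the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(), and TOY_LOCK_MOUNT() and the
structures are hypothetical.  The macro declares the context in place and
initializes it through a pointer, so the structure never has to be copied
after something may have recorded its address.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace imitations of the kernel's error-pointer helpers. */
#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long err)
{
	return (void *)err;
}

static inline long toy_ptr_err(const void *p)
{
	return (long)p;
}

static inline bool toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_mount { int id; };

struct toy_pinned {
	struct toy_mount *parent;	/* mount to attach to, or an error pointer */
	bool locked;
};

static int toy_unlocks;			/* counts real unlock operations */

/* Returns void; the outcome is reported through res->parent. */
static void toy_lock_mount(struct toy_pinned *res, struct toy_mount *target,
			   bool ok)
{
	if (!ok) {
		res->parent = toy_err_ptr(-EINVAL);
		return;
	}
	res->locked = true;
	res->parent = target;
}

static void toy_unlock_mount(struct toy_pinned *res)
{
	if (res->locked) {		/* nothing to undo after a failed attempt */
		res->locked = false;
		toy_unlocks++;
	}
}

/* Declare the context in the caller's scope and set it up in place. */
#define TOY_LOCK_MOUNT(mp, target, ok) \
	struct toy_pinned mp __attribute__((cleanup(toy_unlock_mount))) = { 0 }; \
	toy_lock_mount(&mp, (target), (ok))

static int toy_do_mount(struct toy_mount *target, bool ok)
{
	TOY_LOCK_MOUNT(mp, target, ok);

	if (toy_is_err(mp.parent))
		return (int)toy_ptr_err(mp.parent);
	return mp.parent->id;		/* unlock happens automatically on return */
}
```

A caller tests a single field for failure and never writes an unlock call
by hand; the cleanup handler is a no-op when the lock was never taken.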

Users converted.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 219 ++++++++++++++++++++++++-------------------------
 1 file changed, 108 insertions(+), 111 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 779cfed04291..952e66bdb9bb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -919,6 +919,7 @@ bool __is_local_mountpoint(const struct dentry *dentry)
 struct pinned_mountpoint {
 	struct hlist_node node;
 	struct mountpoint *mp;
+	struct mount *parent;
 };
 
 static bool lookup_mountpoint(struct dentry *dentry, struct pinned_mountpoint *m)
@@ -2728,48 +2729,47 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 }
 
 /**
- * do_lock_mount - lock mount and mountpoint
- * @path:    target path
- * @beneath: whether the intention is to mount beneath @path
+ * do_lock_mount - acquire environment for mounting
+ * @path:	target path
+ * @res:	context to set up
+ * @beneath:	whether the intention is to mount beneath @path
  *
- * Follow the mount stack on @path until the top mount @mnt is found. If
- * the initial @path->{mnt,dentry} is a mountpoint lookup the first
- * mount stacked on top of it. Then simply follow @{mnt,mnt->mnt_root}
- * until nothing is stacked on top of it anymore.
+ * To mount something at given location, we need
+ *	namespace_sem locked exclusive
+ *	inode of dentry we are mounting on locked exclusive
+ *	struct mountpoint for that dentry
+ *	struct mount we are mounting on
  *
- * Acquire the inode_lock() on the top mount's ->mnt_root to protect
- * against concurrent removal of the new mountpoint from another mount
- * namespace.
+ * Results are stored in caller-supplied context (pinned_mountpoint);
+ * on success we have res->parent and res->mp pointing to parent and
+ * mountpoint respectively and res->node inserted into the ->m_list
+ * of the mountpoint, making sure the mountpoint won't disappear.
+ * On failure we have res->parent set to ERR_PTR(-E...), res->mp
+ * left NULL, res->node - empty.
+ * In case of success do_lock_mount returns with locks acquired (in
+ * proper order - inode lock nests outside of namespace_sem).
  *
- * If @beneath is requested, acquire inode_lock() on @mnt's mountpoint
- * @mp on @mnt->mnt_parent must be acquired. This protects against a
- * concurrent unlink of @mp->mnt_dentry from another mount namespace
- * where @mnt doesn't have a child mount mounted @mp. A concurrent
- * removal of @mnt->mnt_root doesn't matter as nothing will be mounted
- * on top of it for @beneath.
+ * Request to mount on overmounted location is treated as "mount on
+ * top of whatever's overmounting it"; request to mount beneath
+ * a location - "mount immediately beneath the topmost mount at that
+ * place".
  *
- * In addition, @beneath needs to make sure that @mnt hasn't been
- * unmounted or moved from its current mountpoint in between dropping
- * @mount_lock and acquiring @namespace_sem. For the !@beneath case @mnt
- * being unmounted would be detected later by e.g., calling
- * check_mnt(mnt) in the function it's called from. For the @beneath
- * case however, it's useful to detect it directly in do_lock_mount().
- * If @mnt hasn't been unmounted then @mnt->mnt_mountpoint still points
- * to @mnt->mnt_mp->m_dentry. But if @mnt has been unmounted it will
- * point to @mnt->mnt_root and @mnt->mnt_mp will be NULL.
- *
- * Return: Either the target mountpoint on the top mount or the top
- *         mount's mountpoint.
+ * In all cases the location must not have been unmounted and the
+ * chosen mountpoint must be allowed to be mounted on.  For "beneath"
+ * case we also require the location to be at the root of a mount
+ * that has a parent (i.e. is not a root of some namespace).
  */
-static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bool beneath)
+static void do_lock_mount(struct path *path, struct pinned_mountpoint *res, bool beneath)
 {
 	struct vfsmount *mnt = path->mnt;
 	struct dentry *dentry;
 	struct path under = {};
 	int err = -ENOENT;
 
-	if (unlikely(beneath) && !path_mounted(path))
-		return -EINVAL;
+	if (unlikely(beneath) && !path_mounted(path)) {
+		res->parent = ERR_PTR(-EINVAL);
+		return;
+	}
 
 	for (;;) {
 		struct mount *m = real_mount(mnt);
@@ -2779,7 +2779,8 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			read_seqlock_excl(&mount_lock);
 			if (unlikely(!mnt_has_parent(m))) {
 				read_sequnlock_excl(&mount_lock);
-				return -EINVAL;
+				res->parent = ERR_PTR(-EINVAL);
+				return;
 			}
 			under.mnt = mntget(&m->mnt_parent->mnt);
 			under.dentry = dget(m->mnt_mountpoint);
@@ -2811,7 +2812,7 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			path->dentry = dget(mnt->mnt_root);
 			continue;	// got overmounted
 		}
-		err = get_mountpoint(dentry, pinned);
+		err = get_mountpoint(dentry, res);
 		if (err)
 			break;
 		if (beneath) {
@@ -2822,22 +2823,25 @@ static int do_lock_mount(struct path *path, struct pinned_mountpoint *pinned, bo
 			 * we are not dropping the final references here).
 			 */
 			path_put(&under);
+			res->parent = real_mount(path->mnt)->mnt_parent;
+			return;
 		}
-		return 0;
+		res->parent = real_mount(path->mnt);
+		return;
 	}
 	namespace_unlock();
 	inode_unlock(dentry->d_inode);
 	if (beneath)
 		path_put(&under);
-	return err;
+	res->parent = ERR_PTR(err);
 }
 
-static inline int lock_mount(struct path *path, struct pinned_mountpoint *m)
+static inline void lock_mount(struct path *path, struct pinned_mountpoint *m)
 {
-	return do_lock_mount(path, m, false);
+	do_lock_mount(path, m, false);
 }
 
-static void unlock_mount(struct pinned_mountpoint *m)
+static void __unlock_mount(struct pinned_mountpoint *m)
 {
 	inode_unlock(m->mp->m_dentry->d_inode);
 	read_seqlock_excl(&mount_lock);
@@ -2846,6 +2850,20 @@ static void unlock_mount(struct pinned_mountpoint *m)
 	namespace_unlock();
 }
 
+static inline void unlock_mount(struct pinned_mountpoint *m)
+{
+	if (!IS_ERR(m->parent))
+		__unlock_mount(m);
+}
+
+#define LOCK_MOUNT_MAYBE_BENEATH(mp, path, beneath) \
+	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
+	do_lock_mount((path), &mp, (beneath))
+#define LOCK_MOUNT(mp, path) LOCK_MOUNT_MAYBE_BENEATH(mp, (path), false)
+#define LOCK_MOUNT_EXACT(mp, path) \
+	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
+	lock_mount_exact((path), &mp)
+
 static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
 {
 	if (mnt->mnt.mnt_sb->s_flags & SB_NOUSER)
@@ -3015,8 +3033,7 @@ static int do_loopback(struct path *path, const char *old_name,
 				int recurse)
 {
 	struct path old_path __free(path_put) = {};
-	struct mount *mnt = NULL, *parent;
-	struct pinned_mountpoint mp = {};
+	struct mount *mnt = NULL;
 	int err;
 	if (!old_name || !*old_name)
 		return -EINVAL;
@@ -3027,28 +3044,23 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (mnt_ns_loop(old_path.dentry))
 		return -EINVAL;
 
-	err = lock_mount(path, &mp);
-	if (err)
-		return err;
+	LOCK_MOUNT(mp, path);
+	if (IS_ERR(mp.parent))
+		return PTR_ERR(mp.parent);
 
-	parent = real_mount(path->mnt);
-	if (!check_mnt(parent))
-		goto out2;
+	if (!check_mnt(mp.parent))
+		return -EINVAL;
 
 	mnt = __do_loopback(&old_path, recurse);
-	if (IS_ERR(mnt)) {
-		err = PTR_ERR(mnt);
-		goto out2;
-	}
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
 
-	err = graft_tree(mnt, parent, mp.mp);
+	err = graft_tree(mnt, mp.parent, mp.mp);
 	if (err) {
 		lock_mount_hash();
 		umount_tree(mnt, UMOUNT_SYNC);
 		unlock_mount_hash();
 	}
-out2:
-	unlock_mount(&mp);
 	return err;
 }
 
@@ -3561,7 +3573,6 @@ static int do_move_mount(struct path *old_path,
 {
 	struct mount *p;
 	struct mount *old = real_mount(old_path->mnt);
-	struct pinned_mountpoint mp;
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
 
@@ -3571,52 +3582,49 @@ static int do_move_mount(struct path *old_path,
 	if (d_is_dir(new_path->dentry) != d_is_dir(old_path->dentry))
 		return -EINVAL;
 
-	err = do_lock_mount(new_path, &mp, beneath);
-	if (err)
-		return err;
+	LOCK_MOUNT_MAYBE_BENEATH(mp, new_path, beneath);
+	if (IS_ERR(mp.parent))
+		return PTR_ERR(mp.parent);
 
 	p = real_mount(new_path->mnt);
 
-	err = -EINVAL;
-
 	if (check_mnt(old)) {
 		/* if the source is in our namespace... */
 		/* ... it should be detachable from parent */
 		if (!mnt_has_parent(old) || IS_MNT_LOCKED(old))
-			goto out;
+			return -EINVAL;
 		/* ... which should not be shared */
 		if (IS_MNT_SHARED(old->mnt_parent))
-			goto out;
+			return -EINVAL;
 		/* ... and the target should be in our namespace */
 		if (!check_mnt(p))
-			goto out;
+			return -EINVAL;
 	} else {
 		/*
 		 * otherwise the source must be the root of some anon namespace.
 		 */
 		if (!anon_ns_root(old))
-			goto out;
+			return -EINVAL;
 		/*
 		 * Bail out early if the target is within the same namespace -
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
 		if (old->mnt_ns == p->mnt_ns)
-			goto out;
+			return -EINVAL;
 		/*
 		 * Target should be either in our namespace or in an acceptable
 		 * anon namespace, sensu check_anonymous_mnt().
 		 */
 		if (!may_use_mount(p))
-			goto out;
+			return -EINVAL;
 	}
 
 	if (beneath) {
 		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
-			goto out;
+			return err;
 
-		err = -EINVAL;
 		p = p->mnt_parent;
 	}
 
@@ -3625,17 +3633,13 @@ static int do_move_mount(struct path *old_path,
 	 * mount which is shared.
 	 */
 	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
-		goto out;
-	err = -ELOOP;
+		return -EINVAL;
 	if (!check_for_nsfs_mounts(old))
-		goto out;
+		return -ELOOP;
 	if (mount_is_ancestor(old, p))
-		goto out;
+		return -ELOOP;
 
-	err = attach_recursive_mnt(old, p, mp.mp);
-out:
-	unlock_mount(&mp);
-	return err;
+	return attach_recursive_mnt(old, p, mp.mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
@@ -3694,7 +3698,6 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
-	struct pinned_mountpoint mp = {};
 	struct super_block *sb;
 	struct vfsmount *mnt __free(mntput) = fc_mount(fc);
 	int error;
@@ -3712,13 +3715,14 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
-	error = lock_mount(mountpoint, &mp);
-	if (!error) {
+	LOCK_MOUNT(mp, mountpoint);
+	if (IS_ERR(mp.parent)) {
+		return PTR_ERR(mp.parent);
+	} else {
 		error = do_add_mount(real_mount(mnt), mp.mp,
 				     mountpoint, mnt_flags);
 		if (!error)
 			retain_and_null_ptr(mnt); // consumed on success
-		unlock_mount(&mp);
 	}
 	return error;
 }
@@ -3780,8 +3784,8 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
 	return err;
 }
 
-static int lock_mount_exact(const struct path *path,
-			    struct pinned_mountpoint *mp)
+static void lock_mount_exact(const struct path *path,
+			     struct pinned_mountpoint *mp)
 {
 	struct dentry *dentry = path->dentry;
 	int err;
@@ -3797,14 +3801,15 @@ static int lock_mount_exact(const struct path *path,
 	if (unlikely(err)) {
 		namespace_unlock();
 		inode_unlock(dentry->d_inode);
+		mp->parent = ERR_PTR(err);
+	} else {
+		mp->parent = real_mount(path->mnt);
 	}
-	return err;
 }
 
 int finish_automount(struct vfsmount *__m, const struct path *path)
 {
 	struct vfsmount *m __free(mntput) = __m;
-	struct pinned_mountpoint mp = {};
 	struct mount *mnt;
 	int err;
 
@@ -3823,15 +3828,14 @@ int finish_automount(struct vfsmount *__m, const struct path *path)
 	 * that overmounts our mountpoint to be means "quitely drop what we've
 	 * got", not "try to mount it on top".
 	 */
-	err = lock_mount_exact(path, &mp);
-	if (unlikely(err))
-		return err == -EBUSY ? 0 : err;
+	LOCK_MOUNT_EXACT(mp, path);
+	if (IS_ERR(mp.parent))
+		return mp.parent == ERR_PTR(-EBUSY) ? 0 : PTR_ERR(mp.parent);
 
 	err = do_add_mount(mnt, mp.mp, path,
 			   path->mnt->mnt_flags | MNT_SHRINKABLE);
 	if (likely(!err))
 		retain_and_null_ptr(m);
-	unlock_mount(&mp);
 	return err;
 }
 
@@ -4627,7 +4631,6 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	struct path old __free(path_put) = {};
 	struct path root __free(path_put) = {};
 	struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
-	struct pinned_mountpoint old_mp = {};
 	int error;
 
 	if (!may_mount())
@@ -4648,45 +4651,42 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 		return error;
 
 	get_fs_root(current->fs, &root);
-	error = lock_mount(&old, &old_mp);
-	if (error)
-		return error;
 
-	error = -EINVAL;
+	LOCK_MOUNT(old_mp, &old);
+	old_mnt = old_mp.parent;
+	if (IS_ERR(old_mnt))
+		return PTR_ERR(old_mnt);
+
 	new_mnt = real_mount(new.mnt);
 	root_mnt = real_mount(root.mnt);
-	old_mnt = real_mount(old.mnt);
 	ex_parent = new_mnt->mnt_parent;
 	root_parent = root_mnt->mnt_parent;
 	if (IS_MNT_SHARED(old_mnt) ||
 		IS_MNT_SHARED(ex_parent) ||
 		IS_MNT_SHARED(root_parent))
-		goto out4;
+		return -EINVAL;
 	if (!check_mnt(root_mnt) || !check_mnt(new_mnt))
-		goto out4;
+		return -EINVAL;
 	if (new_mnt->mnt.mnt_flags & MNT_LOCKED)
-		goto out4;
-	error = -ENOENT;
+		return -EINVAL;
 	if (d_unlinked(new.dentry))
-		goto out4;
-	error = -EBUSY;
+		return -ENOENT;
 	if (new_mnt == root_mnt || old_mnt == root_mnt)
-		goto out4; /* loop, on the same file system  */
-	error = -EINVAL;
+		return -EBUSY; /* loop, on the same file system  */
 	if (!path_mounted(&root))
-		goto out4; /* not a mountpoint */
+		return -EINVAL; /* not a mountpoint */
 	if (!mnt_has_parent(root_mnt))
-		goto out4; /* absolute root */
+		return -EINVAL; /* absolute root */
 	if (!path_mounted(&new))
-		goto out4; /* not a mountpoint */
+		return -EINVAL; /* not a mountpoint */
 	if (!mnt_has_parent(new_mnt))
-		goto out4; /* absolute root */
+		return -EINVAL; /* absolute root */
 	/* make sure we can reach put_old from new_root */
 	if (!is_path_reachable(old_mnt, old.dentry, &new))
-		goto out4;
+		return -EINVAL;
 	/* make certain new is below the root */
 	if (!is_path_reachable(new_mnt, new.dentry, &root))
-		goto out4;
+		return -EINVAL;
 	lock_mount_hash();
 	umount_mnt(new_mnt);
 	if (root_mnt->mnt.mnt_flags & MNT_LOCKED) {
@@ -4705,10 +4705,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	mnt_notify_add(root_mnt);
 	mnt_notify_add(new_mnt);
 	chroot_fs_refs(&root, &new);
-	error = 0;
-out4:
-	unlock_mount(&old_mp);
-	return error;
+	return 0;
 }
 
 static unsigned int recalc_flags(struct mount_kattr *kattr, struct mount *mnt)
-- 
2.47.2



* [PATCH v3 29/65] do_move_mount(): use the parent mount returned by do_lock_mount()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (27 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 28/65] change calling conventions for lock_mount() et.al Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 30/65] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
                         ` (45 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

After a successful do_lock_mount() call, mp.parent is set to either
real_mount(path->mnt) (for the !beneath case) or to ->mnt_parent of that
(for beneath).  p is set to real_mount(path->mnt) and after
several uses it's made equal to mp.parent.  All uses prior to that
care only about p->mnt_ns and since p->mnt_ns == parent->mnt_ns,
we might as well use mp.parent all along.
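
A minimal userspace sketch of the argument above, not the kernel code:
the toy_* types and toy_chosen_parent() are hypothetical stand-ins for
struct mount and for the choice do_lock_mount() makes.  It only shows
that the chosen parent is either the top mount itself or its
->mnt_parent, and that in a tree living in one namespace both carry
the same ->mnt_ns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct toy_ns {
	int id;
};

struct toy_mount {
	struct toy_mount *mnt_parent;	/* parent in the mount tree */
	struct toy_ns *mnt_ns;		/* namespace the mount belongs to */
};

/*
 * Model of what ends up in mp.parent: the top mount itself for the
 * !beneath case, its parent for the beneath case.
 */
static struct toy_mount *toy_chosen_parent(struct toy_mount *top, bool beneath)
{
	assert(top != NULL);
	return beneath ? top->mnt_parent : top;
}
```

With a child mount attached to a root in the same namespace, both
choices yield a mount whose ->mnt_ns matches that of the top mount,
which is all the earlier uses of p cared about.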

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 952e66bdb9bb..d57e727962da 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3571,7 +3571,6 @@ static inline bool may_use_mount(struct mount *mnt)
 static int do_move_mount(struct path *old_path,
 			 struct path *new_path, enum mnt_tree_flags_t flags)
 {
-	struct mount *p;
 	struct mount *old = real_mount(old_path->mnt);
 	int err;
 	bool beneath = flags & MNT_TREE_BENEATH;
@@ -3586,8 +3585,6 @@ static int do_move_mount(struct path *old_path,
 	if (IS_ERR(mp.parent))
 		return PTR_ERR(mp.parent);
 
-	p = real_mount(new_path->mnt);
-
 	if (check_mnt(old)) {
 		/* if the source is in our namespace... */
 		/* ... it should be detachable from parent */
@@ -3597,7 +3594,7 @@ static int do_move_mount(struct path *old_path,
 		if (IS_MNT_SHARED(old->mnt_parent))
 			return -EINVAL;
 		/* ... and the target should be in our namespace */
-		if (!check_mnt(p))
+		if (!check_mnt(mp.parent))
 			return -EINVAL;
 	} else {
 		/*
@@ -3610,13 +3607,13 @@ static int do_move_mount(struct path *old_path,
 		 * subsequent checks would've rejected that, but they lose
 		 * some corner cases if we check it early.
 		 */
-		if (old->mnt_ns == p->mnt_ns)
+		if (old->mnt_ns == mp.parent->mnt_ns)
 			return -EINVAL;
 		/*
 		 * Target should be either in our namespace or in an acceptable
 		 * anon namespace, sensu check_anonymous_mnt().
 		 */
-		if (!may_use_mount(p))
+		if (!may_use_mount(mp.parent))
 			return -EINVAL;
 	}
 
@@ -3624,22 +3621,20 @@ static int do_move_mount(struct path *old_path,
 		err = can_move_mount_beneath(old, new_path, mp.mp);
 		if (err)
 			return err;
-
-		p = p->mnt_parent;
 	}
 
 	/*
 	 * Don't move a mount tree containing unbindable mounts to a destination
 	 * mount which is shared.
 	 */
-	if (IS_MNT_SHARED(p) && tree_contains_unbindable(old))
+	if (IS_MNT_SHARED(mp.parent) && tree_contains_unbindable(old))
 		return -EINVAL;
 	if (!check_for_nsfs_mounts(old))
 		return -ELOOP;
-	if (mount_is_ancestor(old, p))
+	if (mount_is_ancestor(old, mp.parent))
 		return -ELOOP;
 
-	return attach_recursive_mnt(old, p, mp.mp);
+	return attach_recursive_mnt(old, mp.parent, mp.mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
-- 
2.47.2



* [PATCH v3 30/65] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (28 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 29/65] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 31/65] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
                         ` (44 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Both callers pass it a mountpoint reference picked from pinned_mountpoint
and the path it corresponds to.

First of all, path->dentry is equal to mp.mp->m_dentry.  Furthermore, path->mnt
is &mp.parent->mnt, making struct path contents redundant.

Pass it the address of that pinned_mountpoint instead; what's more, if we
teach it to treat ERR_PTR(error) in ->parent as "bail out with that error"
we can simplify the callers even more - do_add_mount() will do the right
thing even when called after lock_mount() failure.
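
The "error carried in ->parent" convention can be sketched in
userspace roughly as follows.  This is an illustration under
assumptions, not the kernel implementation: ERR_PTR/PTR_ERR/IS_ERR are
simplified re-implementations of the kernel helpers, and
toy_lock_mount() and toy_add_mount() are hypothetical names standing
in for lock_mount() and do_add_mount().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace versions of the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err)
{
	return (void *)err;
}

static inline long PTR_ERR(const void *p)
{
	return (long)p;
}

static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for the kernel structures. */
struct mount { int id; };
struct mountpoint { int id; };

struct pinned_mountpoint {
	struct mountpoint *mp;
	struct mount *parent;	/* a valid mount or ERR_PTR(-E...) */
};

/* Stand-in for lock_mount(): reports failure through res->parent. */
static void toy_lock_mount(struct pinned_mountpoint *res,
			   struct mount *parent, struct mountpoint *mp,
			   int err)
{
	if (err) {
		res->mp = NULL;
		res->parent = ERR_PTR(err);
	} else {
		res->mp = mp;
		res->parent = parent;
	}
}

/* Stand-in for do_add_mount(): bails out with the stored error. */
static int toy_add_mount(const struct pinned_mountpoint *mp)
{
	struct mount *parent = mp->parent;

	if (IS_ERR(parent))
		return (int)PTR_ERR(parent);
	assert(mp->mp != NULL);	/* success implies a pinned mountpoint */
	return 0;
}
```

The caller no longer needs its own error check between the two calls;
the consumer propagates whatever the locking step stored.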

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d57e727962da..b236536bbbc9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3657,10 +3657,13 @@ static int do_move_mount_old(struct path *path, const char *old_name)
 /*
  * add a mount into a namespace's mount tree
  */
-static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
-			const struct path *path, int mnt_flags)
+static int do_add_mount(struct mount *newmnt, const struct pinned_mountpoint *mp,
+			int mnt_flags)
 {
-	struct mount *parent = real_mount(path->mnt);
+	struct mount *parent = mp->parent;
+
+	if (IS_ERR(parent))
+		return PTR_ERR(parent);
 
 	mnt_flags &= ~MNT_INTERNAL_FLAGS;
 
@@ -3674,14 +3677,15 @@ static int do_add_mount(struct mount *newmnt, struct mountpoint *mp,
 	}
 
 	/* Refuse the same filesystem on the same mount point */
-	if (path->mnt->mnt_sb == newmnt->mnt.mnt_sb && path_mounted(path))
+	if (parent->mnt.mnt_sb == newmnt->mnt.mnt_sb &&
+	    parent->mnt.mnt_root == mp->mp->m_dentry)
 		return -EBUSY;
 
 	if (d_is_symlink(newmnt->mnt.mnt_root))
 		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	return graft_tree(newmnt, parent, mp);
+	return graft_tree(newmnt, parent, mp->mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
@@ -3711,14 +3715,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
 	mnt_warn_timestamp_expiry(mountpoint, mnt);
 
 	LOCK_MOUNT(mp, mountpoint);
-	if (IS_ERR(mp.parent)) {
-		return PTR_ERR(mp.parent);
-	} else {
-		error = do_add_mount(real_mount(mnt), mp.mp,
-				     mountpoint, mnt_flags);
-		if (!error)
-			retain_and_null_ptr(mnt); // consumed on success
-	}
+	error = do_add_mount(real_mount(mnt), &mp, mnt_flags);
+	if (!error)
+		retain_and_null_ptr(mnt); // consumed on success
 	return error;
 }
 
@@ -3824,11 +3823,10 @@ int finish_automount(struct vfsmount *__m, const struct path *path)
 	 * got", not "try to mount it on top".
 	 */
 	LOCK_MOUNT_EXACT(mp, path);
-	if (IS_ERR(mp.parent))
-		return mp.parent == ERR_PTR(-EBUSY) ? 0 : PTR_ERR(mp.parent);
+	if (mp.parent == ERR_PTR(-EBUSY))
+		return 0;
 
-	err = do_add_mount(mnt, mp.mp, path,
-			   path->mnt->mnt_flags | MNT_SHRINKABLE);
+	err = do_add_mount(mnt, &mp, path->mnt->mnt_flags | MNT_SHRINKABLE);
 	if (likely(!err))
 		retain_and_null_ptr(m);
 	return err;
-- 
2.47.2



* [PATCH v3 31/65] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (29 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 30/65] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 32/65] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
                         ` (43 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

parent and mountpoint always come from the same struct pinned_mountpoint
now.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b236536bbbc9..18d6ad0f4f76 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2549,8 +2549,7 @@ enum mnt_tree_flags_t {
 /**
  * attach_recursive_mnt - attach a source mount tree
  * @source_mnt: mount tree to be attached
- * @dest_mnt:   mount that @source_mnt will be mounted on
- * @dest_mp:    the mountpoint @source_mnt will be mounted at
+ * @dest:	the context for mounting at the place where the tree should go
  *
  *  NOTE: in the table below explains the semantics when a source mount
  *  of a given type is attached to a destination mount of a given type.
@@ -2613,10 +2612,11 @@ enum mnt_tree_flags_t {
  *         Otherwise a negative error code is returned.
  */
 static int attach_recursive_mnt(struct mount *source_mnt,
-				struct mount *dest_mnt,
-				struct mountpoint *dest_mp)
+				const struct pinned_mountpoint *dest)
 {
 	struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
+	struct mount *dest_mnt = dest->parent;
+	struct mountpoint *dest_mp = dest->mp;
 	HLIST_HEAD(tree_list);
 	struct mnt_namespace *ns = dest_mnt->mnt_ns;
 	struct pinned_mountpoint root = {};
@@ -2864,16 +2864,16 @@ static inline void unlock_mount(struct pinned_mountpoint *m)
 	struct pinned_mountpoint mp __cleanup(unlock_mount) = {}; \
 	lock_mount_exact((path), &mp)
 
-static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp)
+static int graft_tree(struct mount *mnt, const struct pinned_mountpoint *mp)
 {
 	if (mnt->mnt.mnt_sb->s_flags & SB_NOUSER)
 		return -EINVAL;
 
-	if (d_is_dir(mp->m_dentry) !=
+	if (d_is_dir(mp->mp->m_dentry) !=
 	      d_is_dir(mnt->mnt.mnt_root))
 		return -ENOTDIR;
 
-	return attach_recursive_mnt(mnt, p, mp);
+	return attach_recursive_mnt(mnt, mp);
 }
 
 static int may_change_propagation(const struct mount *m)
@@ -3055,7 +3055,7 @@ static int do_loopback(struct path *path, const char *old_name,
 	if (IS_ERR(mnt))
 		return PTR_ERR(mnt);
 
-	err = graft_tree(mnt, mp.parent, mp.mp);
+	err = graft_tree(mnt, &mp);
 	if (err) {
 		lock_mount_hash();
 		umount_tree(mnt, UMOUNT_SYNC);
@@ -3634,7 +3634,7 @@ static int do_move_mount(struct path *old_path,
 	if (mount_is_ancestor(old, mp.parent))
 		return -ELOOP;
 
-	return attach_recursive_mnt(old, mp.parent, mp.mp);
+	return attach_recursive_mnt(old, &mp);
 }
 
 static int do_move_mount_old(struct path *path, const char *old_name)
@@ -3685,7 +3685,7 @@ static int do_add_mount(struct mount *newmnt, const struct pinned_mountpoint *mp
 		return -EINVAL;
 
 	newmnt->mnt.mnt_flags = mnt_flags;
-	return graft_tree(newmnt, parent, mp->mp);
+	return graft_tree(newmnt, mp);
 }
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags);
-- 
2.47.2



* [PATCH v3 32/65] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (30 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 31/65] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 33/65] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
                         ` (42 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

That kills the last place where callers of lock_mount(path, &mp)
used path->dentry.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 18d6ad0f4f76..02bc5294071a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4675,7 +4675,7 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	if (!mnt_has_parent(new_mnt))
 		return -EINVAL; /* absolute root */
 	/* make sure we can reach put_old from new_root */
-	if (!is_path_reachable(old_mnt, old.dentry, &new))
+	if (!is_path_reachable(old_mnt, old_mp.mp->m_dentry, &new))
 		return -EINVAL;
 	/* make certain new is below the root */
 	if (!is_path_reachable(new_mnt, new.dentry, &root))
-- 
2.47.2



* [PATCH v3 33/65] don't bother passing new_path->dentry to can_move_mount_beneath()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (31 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 32/65] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 34/65] new helper: topmost_overmount() Al Viro
                         ` (41 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 02bc5294071a..b81677a4232f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3450,8 +3450,8 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
 /**
  * can_move_mount_beneath - check that we can mount beneath the top mount
  * @mnt_from: mount we are trying to move
- * @to:   mount under which to mount
- * @mp:   mountpoint of @to
+ * @mnt_to:   mount under which to mount
+ * @mp:   mountpoint of @mnt_to
  *
  * - Make sure that nothing can be mounted beneath the caller's current
  *   root or the rootfs of the namespace.
@@ -3467,11 +3467,10 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Return: On success 0, and on error a negative error code is returned.
  */
 static int can_move_mount_beneath(struct mount *mnt_from,
-				  const struct path *to,
+				  struct mount *mnt_to,
 				  const struct mountpoint *mp)
 {
-	struct mount *mnt_to = real_mount(to->mnt),
-		     *parent_mnt_to = mnt_to->mnt_parent;
+	struct mount *parent_mnt_to = mnt_to->mnt_parent;
 
 	if (IS_MNT_LOCKED(mnt_to))
 		return -EINVAL;
@@ -3618,7 +3617,9 @@ static int do_move_mount(struct path *old_path,
 	}
 
 	if (beneath) {
-		err = can_move_mount_beneath(old, new_path, mp.mp);
+		struct mount *over = real_mount(new_path->mnt);
+
+		err = can_move_mount_beneath(old, over, mp.mp);
 		if (err)
 			return err;
 	}
-- 
2.47.2



* [PATCH v3 34/65] new helper: topmost_overmount()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (32 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 33/65] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 35/65] do_lock_mount(): don't modify path Al Viro
                         ` (40 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Returns the final (topmost) mount in the chain of overmounts
starting at the given mount.  Same locking rules as for any mount
tree traversal - either the spinlock side of mount_lock, or
rcu + sample the seqcount side of mount_lock before the call
and recheck afterwards.
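
For illustration, the walk itself can be exercised in userspace with a
toy structure.  struct toy_mount and toy_topmost_overmount() are
hypothetical stand-ins; locking is deliberately left out here, whereas
the real helper relies on the caller following the mount_lock rules
described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: only the ->overmount link is modelled. */
struct toy_mount {
	struct toy_mount *overmount;	/* mount on top of our root, if any */
	int id;
};

/* Follow ->overmount links until nothing is stacked on top. */
static struct toy_mount *toy_topmost_overmount(struct toy_mount *m)
{
	assert(m != NULL);
	while (m->overmount)
		m = m->overmount;
	return m;
}
```

A mount with nothing stacked on it is its own topmost overmount, so
callers do not need a special case for the empty chain.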

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h     | 7 +++++++
 fs/namespace.c | 9 +++------
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index ed8c83ba836a..04d0eadc4c10 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -235,4 +235,11 @@ static inline void mnt_notify_add(struct mount *m)
 }
 #endif
 
+static inline struct mount *topmost_overmount(struct mount *m)
+{
+	while (m->overmount)
+		m = m->overmount;
+	return m;
+}
+
 struct mnt_namespace *mnt_ns_from_dentry(struct dentry *dentry);
diff --git a/fs/namespace.c b/fs/namespace.c
index b81677a4232f..23ef2e56808b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2696,10 +2696,9 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 				 child->mnt_mountpoint);
 		commit_tree(child);
 		if (q) {
+			struct mount *r = topmost_overmount(child);
 			struct mountpoint *mp = root.mp;
-			struct mount *r = child;
-			while (unlikely(r->overmount))
-				r = r->overmount;
+
 			if (unlikely(shorter) && child != source_mnt)
 				mp = shorter;
 			mnt_change_mountpoint(r, mp, q);
@@ -6173,9 +6172,7 @@ bool current_chrooted(void)
 
 	guard(mount_locked_reader)();
 
-	root = current->nsproxy->mnt_ns->root;
-	while (unlikely(root->overmount))
-		root = root->overmount;
+	root = topmost_overmount(current->nsproxy->mnt_ns->root);
 
 	return fs_root.mnt != &root->mnt || !path_mounted(&fs_root);
 }
-- 
2.47.2



* [PATCH v3 35/65] do_lock_mount(): don't modify path.
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (33 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 34/65] new helper: topmost_overmount() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 36/65] constify check_mnt() Al Viro
                         ` (39 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Currently do_lock_mount() has the target path switched to whatever
might be overmounting it.  We _do_ want to have the parent
mount/mountpoint chosen on top of the overmounting pile; however,
the way it's done has unpleasant races - if umount propagation
removes the overmount while we'd been trying to set the environment
up, we might end up failing if our target path strays into that overmount
just before the overmount gets kicked out.

Users of do_lock_mount() do not need the target path changed - they
have all the information in res->{parent,mp}; only one place (in
do_move_mount()) currently uses the resulting path->mnt, and that value
is trivial to reconstruct from the original value of path->mnt + chosen
parent mount.

Let's keep the target path unchanged; it avoids a bunch of subtle races
and it's not hard to do:
	do
		as mount_locked_reader
			find the prospective parent mount/mountpoint dentry
			grab references if it's not the original target
		lock the prospective mountpoint dentry
		take namespace_sem exclusive
		if prospective parent/mountpoint would be different now
			err = -EAGAIN
		else if location has been unmounted
			err = -ENOENT
		else if mountpoint dentry is not allowed to be mounted on
			err = -ENOENT
		else if beneath and the top of the pile was the absolute root
			err = -EINVAL
		else
			try to get struct mountpoint (by dentry), set
			err to 0 on success and -ENO{MEM,ENT} on failure
		if err != 0
			res->parent = ERR_PTR(err)
			drop locks
		else
			res->parent = prospective parent
		drop temporary references
	while err == -EAGAIN

A somewhat subtle part is why dropping the temporary references is
allowed.  Neither mounts nor dentries should be evicted by a thread that
holds namespace_sem.  On success we are dropping those references under
namespace_sem, so we need to be sure that these are not the last
references remaining.  However, on success we've already verified (under
namespace_sem) that the original target is still mounted and that the
mount and dentry we are about to drop are still reachable from it via
the mount tree.  That guarantees that we are not about to drop the last
remaining references.
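
For readers who want the shape of the retry loop above without the
kernel context, here is a minimal userspace sketch.  Everything in it is
an assumption made for illustration: struct node, topmost(),
drop_overmount() and lock_target() are hypothetical names, the real
locks are reduced to a comment, and the "concurrent" change is simulated
by a hook that fires once between the lookup and the recheck.  It is not
the code from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a mount: only the overmount link matters. */
struct node {
	struct node *overmount;	/* what sits on top of us, if anything */
};

/* Walk to the top of the overmount pile. */
static struct node *topmost(struct node *n)
{
	while (n->overmount)
		n = n->overmount;
	return n;
}

/* Stand-in for a concurrent umount that removes the overmount. */
void drop_overmount(struct node *target)
{
	target->overmount = NULL;
}

int attempts;	/* number of loop iterations, kept for illustration */

/*
 * Pick the node to operate on without ever changing what 'target'
 * points to.  'hook' (may be NULL) runs once, in the window where the
 * real code would be acquiring its sleeping locks.
 */
struct node *lock_target(struct node *target,
			 void (*hook)(struct node *))
{
	struct node *m, *n;
	int err;

	do {
		attempts++;
		m = topmost(target);	/* cheap lookup, short-lived lock */

		/* ...the heavyweight locks would be taken here... */
		if (hook) {
			hook(target);	/* simulated race, fires once */
			hook = NULL;
		}

		n = topmost(target);	/* recheck under the locks */
		err = (n != m) ? -EAGAIN : 0;
	} while (err == -EAGAIN);

	return m;
}
```

With a node that has one overmount and drop_overmount as the hook, the
first pass sees a mismatch and the second pass settles on the base node;
without a hook a single pass is enough.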

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 122 ++++++++++++++++++++++++++-----------------------
 1 file changed, 65 insertions(+), 57 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 23ef2e56808b..c2e074f66bd1 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2727,6 +2727,27 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 	return err;
 }
 
+static inline struct mount *where_to_mount(const struct path *path,
+					   struct dentry **dentry,
+					   bool beneath)
+{
+	struct mount *m;
+
+	if (unlikely(beneath)) {
+		m = topmost_overmount(real_mount(path->mnt));
+		*dentry = m->mnt_mountpoint;
+		return m->mnt_parent;
+	}
+	m = __lookup_mnt(path->mnt, path->dentry);
+	if (unlikely(m)) {
+		m = topmost_overmount(m);
+		*dentry = m->mnt.mnt_root;
+		return m;
+	}
+	*dentry = path->dentry;
+	return real_mount(path->mnt);
+}
+
 /**
  * do_lock_mount - acquire environment for mounting
  * @path:	target path
@@ -2758,84 +2779,69 @@ static int attach_recursive_mnt(struct mount *source_mnt,
  * case we also require the location to be at the root of a mount
  * that has a parent (i.e. is not a root of some namespace).
  */
-static void do_lock_mount(struct path *path, struct pinned_mountpoint *res, bool beneath)
+static void do_lock_mount(const struct path *path,
+			  struct pinned_mountpoint *res,
+			  bool beneath)
 {
-	struct vfsmount *mnt = path->mnt;
-	struct dentry *dentry;
-	struct path under = {};
-	int err = -ENOENT;
+	int err;
 
 	if (unlikely(beneath) && !path_mounted(path)) {
 		res->parent = ERR_PTR(-EINVAL);
 		return;
 	}
 
-	for (;;) {
-		struct mount *m = real_mount(mnt);
-
-		if (beneath) {
-			path_put(&under);
-			read_seqlock_excl(&mount_lock);
-			if (unlikely(!mnt_has_parent(m))) {
-				read_sequnlock_excl(&mount_lock);
-				res->parent = ERR_PTR(-EINVAL);
-				return;
+	do {
+		struct dentry *dentry, *d;
+		struct mount *m, *n;
+
+		scoped_guard(mount_locked_reader) {
+			m = where_to_mount(path, &dentry, beneath);
+			if (&m->mnt != path->mnt) {
+				mntget(&m->mnt);
+				dget(dentry);
 			}
-			under.mnt = mntget(&m->mnt_parent->mnt);
-			under.dentry = dget(m->mnt_mountpoint);
-			read_sequnlock_excl(&mount_lock);
-			dentry = under.dentry;
-		} else {
-			dentry = path->dentry;
 		}
 
 		inode_lock(dentry->d_inode);
 		namespace_lock();
 
-		if (unlikely(cant_mount(dentry) || !is_mounted(mnt)))
-			break;		// not to be mounted on
+		// check if the chain of mounts (if any) has changed.
+		scoped_guard(mount_locked_reader)
+			n = where_to_mount(path, &d, beneath);
 
-		if (beneath && unlikely(m->mnt_mountpoint != dentry ||
-				        &m->mnt_parent->mnt != under.mnt)) {
-			namespace_unlock();
-			inode_unlock(dentry->d_inode);
-			continue;	// got moved
-		}
+		if (unlikely(n != m || dentry != d))
+			err = -EAGAIN;		// something moved, retry
+		else if (unlikely(cant_mount(dentry) || !is_mounted(path->mnt)))
+			err = -ENOENT;		// not to be mounted on
+		else if (beneath && &m->mnt == path->mnt && !m->overmount)
+			err = -EINVAL;
+		else
+			err = get_mountpoint(dentry, res);
 
-		mnt = lookup_mnt(path);
-		if (unlikely(mnt)) {
+		if (unlikely(err)) {
+			res->parent = ERR_PTR(err);
 			namespace_unlock();
 			inode_unlock(dentry->d_inode);
-			path_put(path);
-			path->mnt = mnt;
-			path->dentry = dget(mnt->mnt_root);
-			continue;	// got overmounted
+		} else {
+			res->parent = m;
 		}
-		err = get_mountpoint(dentry, res);
-		if (err)
-			break;
-		if (beneath) {
-			/*
-			 * @under duplicates the references that will stay
-			 * at least until namespace_unlock(), so the path_put()
-			 * below is safe (and OK to do under namespace_lock -
-			 * we are not dropping the final references here).
-			 */
-			path_put(&under);
-			res->parent = real_mount(path->mnt)->mnt_parent;
-			return;
+		/*
+		 * Drop the temporary references.  This is subtle - on success
+		 * we are doing that under namespace_sem, which would normally
+		 * be forbidden.  However, in that case we are guaranteed that
+		 * refcounts won't reach zero, since we know that path->mnt
+		 * is mounted and thus all mounts reachable from it are pinned
+		 * and stable, along with their mountpoints and roots.
+		 */
+		if (&m->mnt != path->mnt) {
+			dput(dentry);
+			mntput(&m->mnt);
 		}
-		res->parent = real_mount(path->mnt);
-		return;
-	}
-	namespace_unlock();
-	inode_unlock(dentry->d_inode);
-	if (beneath)
-		path_put(&under);
-	res->parent = ERR_PTR(err);
+	} while (err == -EAGAIN);
 }
 
-static inline void lock_mount(struct path *path, struct pinned_mountpoint *m)
+static inline void lock_mount(const struct path *path,
+			      struct pinned_mountpoint *m)
 {
 	do_lock_mount(path, m, false);
 }
@@ -3618,6 +3624,8 @@ static int do_move_mount(struct path *old_path,
 	if (beneath) {
 		struct mount *over = real_mount(new_path->mnt);
 
+		if (mp.parent != over->mnt_parent)
+			over = mp.parent->overmount;
 		err = can_move_mount_beneath(old, over, mp.mp);
 		if (err)
 			return err;
-- 
2.47.2



* [PATCH v3 36/65] constify check_mnt()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (34 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 35/65] do_lock_mount(): don't modify path Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:54       ` [PATCH v3 37/65] do_mount_setattr(): constify path argument Al Viro
                         ` (38 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c2e074f66bd1..511e49fd7c27 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1010,7 +1010,7 @@ static void unpin_mountpoint(struct pinned_mountpoint *m)
 	}
 }
 
-static inline int check_mnt(struct mount *mnt)
+static inline int check_mnt(const struct mount *mnt)
 {
 	return mnt->mnt_ns == current->nsproxy->mnt_ns;
 }
-- 
2.47.2



* [PATCH v3 37/65] do_mount_setattr(): constify path argument
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (35 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 36/65] constify check_mnt() Al Viro
@ 2025-09-03  4:54       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 38/65] do_set_group(): constify path arguments Al Viro
                         ` (37 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:54 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 511e49fd7c27..f74a0523194a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4865,7 +4865,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 	touch_mnt_namespace(mnt->mnt_ns);
 }
 
-static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
+static int do_mount_setattr(const struct path *path, struct mount_kattr *kattr)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	int err = 0;
-- 
2.47.2



* [PATCH v3 38/65] do_set_group(): constify path arguments
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (36 preceding siblings ...)
  2025-09-03  4:54       ` [PATCH v3 37/65] do_mount_setattr(): constify path argument Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 39/65] drop_collected_paths(): constify arguments Al Viro
                         ` (36 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f74a0523194a..7da3a589c775 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3359,7 +3359,7 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 	return 0;
 }
 
-static int do_set_group(struct path *from_path, struct path *to_path)
+static int do_set_group(const struct path *from_path, const struct path *to_path)
 {
 	struct mount *from = real_mount(from_path->mnt);
 	struct mount *to = real_mount(to_path->mnt);
-- 
2.47.2



* [PATCH v3 39/65] drop_collected_paths(): constify arguments
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (37 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 38/65] do_set_group(): constify path arguments Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 40/65] collect_paths(): constify the return value Al Viro
                         ` (35 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and use that to constify the pointers in callers

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c        |  4 ++--
 include/linux/mount.h |  2 +-
 kernel/audit_tree.c   | 12 ++++++------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7da3a589c775..704eff14735d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2334,9 +2334,9 @@ struct path *collect_paths(const struct path *path,
 	return res;
 }
 
-void drop_collected_paths(struct path *paths, struct path *prealloc)
+void drop_collected_paths(const struct path *paths, struct path *prealloc)
 {
-	for (struct path *p = paths; p->mnt; p++)
+	for (const struct path *p = paths; p->mnt; p++)
 		path_put(p);
 	if (paths != prealloc)
 		kfree(paths);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 5f9c053b0897..c09032463b36 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -105,7 +105,7 @@ extern int may_umount(struct vfsmount *);
 int do_mount(const char *, const char __user *,
 		     const char *, unsigned long, void *);
 extern struct path *collect_paths(const struct path *, struct path *, unsigned);
-extern void drop_collected_paths(struct path *, struct path *);
+extern void drop_collected_paths(const struct path *, struct path *);
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 extern int cifs_root_data(char **dev, char **opts);
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index b0eae2a3c895..32007edf0e55 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -678,7 +678,7 @@ void audit_trim_trees(void)
 		struct audit_tree *tree;
 		struct path path;
 		struct audit_node *node;
-		struct path *paths;
+		const struct path *paths;
 		struct path array[16];
 		int err;
 
@@ -701,7 +701,7 @@ void audit_trim_trees(void)
 			struct audit_chunk *chunk = find_chunk(node);
 			/* this could be NULL if the watch is dying else where... */
 			node->index |= 1U<<31;
-			for (struct path *p = paths; p->dentry; p++) {
+			for (const struct path *p = paths; p->dentry; p++) {
 				struct inode *inode = p->dentry->d_inode;
 				if (inode_to_key(inode) == chunk->key) {
 					node->index &= ~(1U<<31);
@@ -740,9 +740,9 @@ void audit_put_tree(struct audit_tree *tree)
 	put_tree(tree);
 }
 
-static int tag_mounts(struct path *paths, struct audit_tree *tree)
+static int tag_mounts(const struct path *paths, struct audit_tree *tree)
 {
-	for (struct path *p = paths; p->dentry; p++) {
+	for (const struct path *p = paths; p->dentry; p++) {
 		int err = tag_chunk(p->dentry->d_inode, tree);
 		if (err)
 			return err;
@@ -805,7 +805,7 @@ int audit_add_tree_rule(struct audit_krule *rule)
 	struct audit_tree *seed = rule->tree, *tree;
 	struct path path;
 	struct path array[16];
-	struct path *paths;
+	const struct path *paths;
 	int err;
 
 	rule->tree = NULL;
@@ -877,7 +877,7 @@ int audit_tag_tree(char *old, char *new)
 	int failed = 0;
 	struct path path1, path2;
 	struct path array[16];
-	struct path *paths;
+	const struct path *paths;
 	int err;
 
 	err = kern_path(new, 0, &path2);
-- 
2.47.2



* [PATCH v3 40/65] collect_paths(): constify the return value
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (38 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 39/65] drop_collected_paths(): constify arguments Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 41/65] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
                         ` (34 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

callers have no business modifying the paths they get
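
A small userspace sketch of the pattern these two patches end up with: a
collector hands back a sentinel-terminated array through a pointer to
const, and the matching release function accepts const pointers as well.
All names here (struct item, collect_items(), drop_collected_items(),
item_put()) are hypothetical stand-ins, and the error handling is an
assumption of the sketch rather than a description of collect_paths().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct path; id == 0 terminates an array. */
struct item {
	int id;
};

int dropped;	/* counts per-element releases, for illustration only */

static void item_put(const struct item *p)
{
	(void)p;
	dropped++;
}

/*
 * Copy n non-zero ids plus a terminator into 'prealloc' if 'count'
 * slots are enough, otherwise into a heap array.  Callers get a
 * read-only view either way.  Returns NULL on allocation failure
 * (an assumption of this sketch).
 */
const struct item *collect_items(const int *ids, unsigned n,
				 struct item *prealloc, unsigned count)
{
	struct item *res = prealloc;

	if (n + 1 > count) {
		res = malloc((n + 1) * sizeof(*res));
		if (!res)
			return NULL;
	}
	for (unsigned i = 0; i < n; i++)
		res[i].id = ids[i];
	res[n].id = 0;		/* sentinel */
	return res;
}

void drop_collected_items(const struct item *items,
			  const struct item *prealloc)
{
	for (const struct item *p = items; p->id; p++)
		item_put(p);
	/*
	 * Userspace free() takes void *, hence the cast; the kernel's
	 * kfree() takes const void *, so no cast is needed there.
	 */
	if (items != prealloc)
		free((void *)items);
}
```

A caller keeps the result in a const struct item * variable, which means
an accidental write through it no longer compiles.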

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c        | 4 ++--
 include/linux/mount.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 704eff14735d..759bfd24d1a0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2300,7 +2300,7 @@ static inline bool extend_array(struct path **res, struct path **to_free,
 	return p;
 }
 
-struct path *collect_paths(const struct path *path,
+const struct path *collect_paths(const struct path *path,
 			      struct path *prealloc, unsigned count)
 {
 	struct mount *root = real_mount(path->mnt);
@@ -2334,7 +2334,7 @@ struct path *collect_paths(const struct path *path,
 	return res;
 }
 
-void drop_collected_paths(const struct path *paths, struct path *prealloc)
+void drop_collected_paths(const struct path *paths, const struct path *prealloc)
 {
 	for (const struct path *p = paths; p->mnt; p++)
 		path_put(p);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index c09032463b36..18e4b97f8a98 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -104,8 +104,8 @@ extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 int do_mount(const char *, const char __user *,
 		     const char *, unsigned long, void *);
-extern struct path *collect_paths(const struct path *, struct path *, unsigned);
-extern void drop_collected_paths(const struct path *, struct path *);
+extern const struct path *collect_paths(const struct path *, struct path *, unsigned);
+extern void drop_collected_paths(const struct path *, const struct path *);
 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
 
 extern int cifs_root_data(char **dev, char **opts);
-- 
2.47.2



* [PATCH v3 41/65] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s)
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (39 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 40/65] collect_paths(): constify the return value Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 42/65] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
                         ` (33 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 759bfd24d1a0..dcaf50e920af 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3572,8 +3572,9 @@ static inline bool may_use_mount(struct mount *mnt)
 	return check_anonymous_mnt(mnt);
 }
 
-static int do_move_mount(struct path *old_path,
-			 struct path *new_path, enum mnt_tree_flags_t flags)
+static int do_move_mount(const struct path *old_path,
+			 const struct path *new_path,
+			 enum mnt_tree_flags_t flags)
 {
 	struct mount *old = real_mount(old_path->mnt);
 	int err;
@@ -3645,7 +3646,7 @@ static int do_move_mount(struct path *old_path,
 	return attach_recursive_mnt(old, &mp);
 }
 
-static int do_move_mount_old(struct path *path, const char *old_name)
+static int do_move_mount_old(const struct path *path, const char *old_name)
 {
 	struct path old_path;
 	int err;
@@ -4475,7 +4476,8 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
 	return ret;
 }
 
-static inline int vfs_move_mount(struct path *from_path, struct path *to_path,
+static inline int vfs_move_mount(const struct path *from_path,
+				 const struct path *to_path,
 				 enum mnt_tree_flags_t mflags)
 {
 	int ret;
-- 
2.47.2



* [PATCH v3 42/65] mnt_warn_timestamp_expiry(): constify struct path argument
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (40 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 41/65] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 43/65] do_new_mount{,_fc}(): " Al Viro
                         ` (32 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index dcaf50e920af..be3aecc5a9c0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3230,7 +3230,8 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags)
 	touch_mnt_namespace(mnt->mnt_ns);
 }
 
-static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt)
+static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
+				      struct vfsmount *mnt)
 {
 	struct super_block *sb = mnt->mnt_sb;
 
-- 
2.47.2



* [PATCH v3 43/65] do_new_mount{,_fc}(): constify struct path argument
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (41 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 42/65] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 44/65] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
                         ` (31 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index be3aecc5a9c0..f3f26125444d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3704,7 +3704,7 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
  * Create a new mount using a superblock configuration and request it
  * be added to the namespace tree.
  */
-static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
+static int do_new_mount_fc(struct fs_context *fc, const struct path *mountpoint,
 			   unsigned int mnt_flags)
 {
 	struct super_block *sb;
@@ -3735,8 +3735,9 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint,
  * create a new mount for userspace and request it to be added into the
  * namespace's tree
  */
-static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
-			int mnt_flags, const char *name, void *data)
+static int do_new_mount(const struct path *path, const char *fstype,
+			int sb_flags, int mnt_flags,
+			const char *name, void *data)
 {
 	struct file_system_type *type;
 	struct fs_context *fc;
-- 
2.47.2



* [PATCH v3 44/65] do_{loopback,change_type,remount,reconfigure_mnt}(): constify struct path argument
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (42 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 43/65] do_new_mount{,_fc}(): " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 45/65] path_mount(): " Al Viro
                         ` (30 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f3f26125444d..894631bcbdbd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2914,7 +2914,7 @@ static int flags_to_propagation_type(int ms_flags)
 /*
  * recursively change the type of the mountpoint.
  */
-static int do_change_type(struct path *path, int ms_flags)
+static int do_change_type(const struct path *path, int ms_flags)
 {
 	struct mount *m;
 	struct mount *mnt = real_mount(path->mnt);
@@ -3034,8 +3034,8 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 /*
  * do loopback mount.
  */
-static int do_loopback(struct path *path, const char *old_name,
-				int recurse)
+static int do_loopback(const struct path *path, const char *old_name,
+		       int recurse)
 {
 	struct path old_path __free(path_put) = {};
 	struct mount *mnt = NULL;
@@ -3265,7 +3265,7 @@ static void mnt_warn_timestamp_expiry(const struct path *mountpoint,
  * superblock it refers to.  This is triggered by specifying MS_REMOUNT|MS_BIND
  * to mount(2).
  */
-static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
+static int do_reconfigure_mnt(const struct path *path, unsigned int mnt_flags)
 {
 	struct super_block *sb = path->mnt->mnt_sb;
 	struct mount *mnt = real_mount(path->mnt);
@@ -3302,7 +3302,7 @@ static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags)
  * If you've mounted a non-root directory somewhere and want to do remount
  * on it - tough luck.
  */
-static int do_remount(struct path *path, int ms_flags, int sb_flags,
+static int do_remount(const struct path *path, int ms_flags, int sb_flags,
 		      int mnt_flags, void *data)
 {
 	int err;
-- 
2.47.2



* [PATCH v3 45/65] path_mount(): constify struct path argument
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (43 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 44/65] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 46/65] may_copy_tree(), __do_loopback(): " Al Viro
                         ` (29 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

now it finally can be done.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/internal.h  | 2 +-
 fs/namespace.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 38e8aab27bbd..fe88563b4822 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -84,7 +84,7 @@ void mnt_put_write_access_file(struct file *file);
 extern void dissolve_on_fput(struct vfsmount *);
 extern bool may_mount(void);
 
-int path_mount(const char *dev_name, struct path *path,
+int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page);
 int path_umount(struct path *path, int flags);
 
diff --git a/fs/namespace.c b/fs/namespace.c
index 894631bcbdbd..3a9db3e84a92 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4018,7 +4018,7 @@ static char *copy_mount_string(const void __user *data)
  * Therefore, if this magic number is present, it carries no information
  * and must be discarded.
  */
-int path_mount(const char *dev_name, struct path *path,
+int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page)
 {
 	unsigned int mnt_flags = 0, sb_flags;
-- 
2.47.2



* [PATCH v3 46/65] may_copy_tree(), __do_loopback(): constify struct path argument
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (44 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 45/65] path_mount(): " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 47/65] path_umount(): " Al Viro
                         ` (28 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3a9db3e84a92..4ed3d16534bb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2990,7 +2990,7 @@ static int do_change_type(const struct path *path, int ms_flags)
  *
  * Returns true if the mount tree can be copied, false otherwise.
  */
-static inline bool may_copy_tree(struct path *path)
+static inline bool may_copy_tree(const struct path *path)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	const struct dentry_operations *d_op;
@@ -3012,7 +3012,7 @@ static inline bool may_copy_tree(struct path *path)
 }
 
 
-static struct mount *__do_loopback(struct path *old_path, int recurse)
+static struct mount *__do_loopback(const struct path *old_path, int recurse)
 {
 	struct mount *old = real_mount(old_path->mnt);
 
-- 
2.47.2



* [PATCH v3 47/65] path_umount(): constify struct path argument
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (45 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 46/65] may_copy_tree(), __do_loopback(): " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 48/65] constify can_move_mount_beneath() arguments Al Viro
                         ` (27 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/internal.h  | 2 +-
 fs/namespace.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index fe88563b4822..549e6bd453b0 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -86,7 +86,7 @@ extern bool may_mount(void);
 
 int path_mount(const char *dev_name, const struct path *path,
 		const char *type_page, unsigned long flags, void *data_page);
-int path_umount(struct path *path, int flags);
+int path_umount(const struct path *path, int flags);
 
 int show_path(struct seq_file *m, struct dentry *root);
 
diff --git a/fs/namespace.c b/fs/namespace.c
index 4ed3d16534bb..20c409852f6d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2084,7 +2084,7 @@ static int can_umount(const struct path *path, int flags)
 }
 
 // caller is responsible for flags being sane
-int path_umount(struct path *path, int flags)
+int path_umount(const struct path *path, int flags)
 {
 	struct mount *mnt = real_mount(path->mnt);
 	int ret;
-- 
2.47.2



* [PATCH v3 48/65] constify can_move_mount_beneath() arguments
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (46 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 47/65] path_umount(): " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 49/65] do_move_mount_old(): use __free(path_put) Al Viro
                         ` (26 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 20c409852f6d..18229a6e045d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3472,8 +3472,8 @@ static bool mount_is_ancestor(const struct mount *p1, const struct mount *p2)
  * Context: This function expects namespace_lock() to be held.
  * Return: On success 0, and on error a negative error code is returned.
  */
-static int can_move_mount_beneath(struct mount *mnt_from,
-				  struct mount *mnt_to,
+static int can_move_mount_beneath(const struct mount *mnt_from,
+				  const struct mount *mnt_to,
 				  const struct mountpoint *mp)
 {
 	struct mount *parent_mnt_to = mnt_to->mnt_parent;
-- 
2.47.2



* [PATCH v3 49/65] do_move_mount_old(): use __free(path_put)
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (47 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 48/65] constify can_move_mount_beneath() arguments Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 50/65] do_mount(): " Al Viro
                         ` (25 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 18229a6e045d..5372b71a8d7a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3649,7 +3649,7 @@ static int do_move_mount(const struct path *old_path,
 
 static int do_move_mount_old(const struct path *path, const char *old_name)
 {
-	struct path old_path;
+	struct path old_path __free(path_put) = {};
 	int err;
 
 	if (!old_name || !*old_name)
@@ -3659,9 +3659,7 @@ static int do_move_mount_old(const struct path *path, const char *old_name)
 	if (err)
 		return err;
 
-	err = do_move_mount(&old_path, path, 0);
-	path_put(&old_path);
-	return err;
+	return do_move_mount(&old_path, path, 0);
 }
 
 /*
-- 
2.47.2



* [PATCH v3 50/65] do_mount(): use __free(path_put)
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (48 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 49/65] do_move_mount_old(): use __free(path_put) Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 51/65] umount_tree(): take all victims out of propagation graph at once Al Viro
                         ` (24 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5372b71a8d7a..f977438b4d6e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4098,15 +4098,13 @@ int path_mount(const char *dev_name, const struct path *path,
 int do_mount(const char *dev_name, const char __user *dir_name,
 		const char *type_page, unsigned long flags, void *data_page)
 {
-	struct path path;
+	struct path path __free(path_put) = {};
 	int ret;
 
 	ret = user_path_at(AT_FDCWD, dir_name, LOOKUP_FOLLOW, &path);
 	if (ret)
 		return ret;
-	ret = path_mount(dev_name, &path, type_page, flags, data_page);
-	path_put(&path);
-	return ret;
+	return path_mount(dev_name, &path, type_page, flags, data_page);
 }
 
 static struct ucounts *inc_mnt_namespaces(struct user_namespace *ns)
-- 
2.47.2



* [PATCH v3 51/65] umount_tree(): take all victims out of propagation graph at once
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (49 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 50/65] do_mount(): " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 52/65] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
                         ` (23 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

For each removed mount we need to calculate where the slaves will end up.
To avoid duplicating that work, do it for all mounts to be removed
at once, taking the mounts themselves out of the propagation graph as
we go, then do all transfers.  The duplicate work on finding destinations
is avoided since if we run into a mount that already had its destination
found, we don't need to trace the rest of the way.  That's guaranteed
O(removed mounts) for finding destinations and removing from the propagation
graph and O(surviving mounts that have master removed) for transfers.
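
A rough userspace model of the "stop tracing once a destination is already
known" idea, for illustration only.  It ignores peer groups and the real
struct mount entirely; struct node, trace(), set_dest() and resolve_all()
are made-up names, and ->up stands in for "next hop towards the master".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mount; only what the sketch needs. */
struct node {
	struct node *up;	/* next hop towards the master, NULL if none */
	bool doomed;		/* node is being removed */
	bool marked;		/* node has already been visited */
};

static unsigned long hops;	/* ->up links followed, for illustration */

/* Walk up from a doomed node until we leave the doomed set or reach a
 * node that an earlier walk has already resolved. */
static struct node *trace(struct node *m)
{
	for (;;) {
		struct node *next = m->up;

		m->marked = true;
		hops++;
		if (!next || !next->doomed)
			return next;		/* survivor, or nothing above us */
		if (next->marked)
			return next->up;	/* destination found earlier */
		m = next;
	}
}

/* Point every node on the path just walked straight at its destination. */
static void set_dest(struct node *m, struct node *dest)
{
	struct node *next;

	while ((next = m->up) != dest) {
		m->up = dest;
		m = next;
	}
}

static void resolve_all(struct node **set, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!set[i]->marked)
			set_dest(set[i], trace(set[i]));
}

/* Two doomed slaves (a, b) of a doomed master c whose master s survives. */
static bool example_bulk_resolve(void)
{
	struct node s = { NULL, false, false };
	struct node c = { &s, true, false };
	struct node a = { &c, true, false };
	struct node b = { &c, true, false };
	struct node *set[] = { &a, &b, &c };

	hops = 0;
	resolve_all(set, 3);
	assert(hops == 3);	/* a, c, then b; c is never walked twice */
	return a.up == &s && b.up == &s && c.up == &s;
}
```

In this model each doomed node is walked once, which is where the linear
bound comes from; the second walk from b stops as soon as it sees that c
has been handled.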

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c |  3 ++-
 fs/pnode.c     | 67 +++++++++++++++++++++++++++++++++++++++-----------
 fs/pnode.h     |  1 +
 3 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index f977438b4d6e..0900fd7456a9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1846,6 +1846,8 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 	if (how & UMOUNT_PROPAGATE)
 		propagate_umount(&tmp_list);
 
+	bulk_make_private(&tmp_list);
+
 	while (!list_empty(&tmp_list)) {
 		struct mnt_namespace *ns;
 		bool disconnect;
@@ -1870,7 +1872,6 @@ static void umount_tree(struct mount *mnt, enum umount_tree_flags how)
 				umount_mnt(p);
 			}
 		}
-		change_mnt_propagation(p, MS_PRIVATE);
 		if (disconnect)
 			hlist_add_head(&p->mnt_umount, &unmounted);
 
diff --git a/fs/pnode.c b/fs/pnode.c
index edaf9d9d0eaf..5d91c3e58d2a 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -71,19 +71,6 @@ static inline bool will_be_unmounted(struct mount *m)
 	return m->mnt.mnt_flags & MNT_UMOUNT;
 }
 
-static struct mount *propagation_source(struct mount *mnt)
-{
-	do {
-		struct mount *m;
-		for (m = next_peer(mnt); m != mnt; m = next_peer(m)) {
-			if (!will_be_unmounted(m))
-				return m;
-		}
-		mnt = mnt->mnt_master;
-	} while (mnt && will_be_unmounted(mnt));
-	return mnt;
-}
-
 static void transfer_propagation(struct mount *mnt, struct mount *to)
 {
 	struct hlist_node *p = NULL, *n;
@@ -112,11 +99,10 @@ void change_mnt_propagation(struct mount *mnt, int type)
 		return;
 	}
 	if (IS_MNT_SHARED(mnt)) {
-		if (type == MS_SLAVE || !hlist_empty(&mnt->mnt_slave_list))
-			m = propagation_source(mnt);
 		if (list_empty(&mnt->mnt_share)) {
 			mnt_release_group_id(mnt);
 		} else {
+			m = next_peer(mnt);
 			list_del_init(&mnt->mnt_share);
 			mnt->mnt_group_id = 0;
 		}
@@ -137,6 +123,57 @@ void change_mnt_propagation(struct mount *mnt, int type)
 	}
 }
 
+static struct mount *trace_transfers(struct mount *m)
+{
+	while (1) {
+		struct mount *next = next_peer(m);
+
+		if (next != m) {
+			list_del_init(&m->mnt_share);
+			m->mnt_group_id = 0;
+			m->mnt_master = next;
+		} else {
+			if (IS_MNT_SHARED(m))
+				mnt_release_group_id(m);
+			next = m->mnt_master;
+		}
+		hlist_del_init(&m->mnt_slave);
+		CLEAR_MNT_SHARED(m);
+		SET_MNT_MARK(m);
+
+		if (!next || !will_be_unmounted(next))
+			return next;
+		if (IS_MNT_MARKED(next))
+			return next->mnt_master;
+		m = next;
+	}
+}
+
+static void set_destinations(struct mount *m, struct mount *master)
+{
+	struct mount *next;
+
+	while ((next = m->mnt_master) != master) {
+		m->mnt_master = master;
+		m = next;
+	}
+}
+
+void bulk_make_private(struct list_head *set)
+{
+	struct mount *m;
+
+	list_for_each_entry(m, set, mnt_list)
+		if (!IS_MNT_MARKED(m))
+			set_destinations(m, trace_transfers(m));
+
+	list_for_each_entry(m, set, mnt_list) {
+		transfer_propagation(m, m->mnt_master);
+		m->mnt_master = NULL;
+		CLEAR_MNT_MARK(m);
+	}
+}
+
 static struct mount *__propagation_next(struct mount *m,
 					 struct mount *origin)
 {
diff --git a/fs/pnode.h b/fs/pnode.h
index 00ab153e3e9d..b029db225f33 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -42,6 +42,7 @@ static inline bool peers(const struct mount *m1, const struct mount *m2)
 }
 
 void change_mnt_propagation(struct mount *, int);
+void bulk_make_private(struct list_head *);
 int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
 		struct hlist_head *);
 void propagate_umount(struct list_head *);
-- 
2.47.2



* [PATCH v3 52/65] ecryptfs: get rid of pointless mount references in ecryptfs dentries
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (50 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 51/65] umount_tree(): take all victims out of propagation graph at once Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 53/65] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
                         ` (22 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

->lower_path.mnt has the same value for all dentries on a given ecryptfs
instance and if somebody goes for a mountpoint-crossing variant where that
would not be true, we can deal with that when it happens (and _not_
by duplicating those references into each dentry).

As it is, we are better off just sticking a reference into ecryptfs-private
part of superblock and keeping it pinned until ->kill_sb().

That way we can stick a reference to underlying dentry right into ->d_fsdata
of ecryptfs one, getting rid of indirection through struct ecryptfs_dentry_info,
along with the entire struct ecryptfs_dentry_info machinery.
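
The resulting layout can be modelled in userspace roughly as follows.  This
is only a sketch: the struct definitions are cut-down stand-ins for the
kernel ones, and sb_info, lower_path() and example_lower_path() are
hypothetical names.  The point is that the lower struct path is assembled
by value on demand from the per-superblock mount and the per-dentry
->d_fsdata, so nothing per-dentry has to pin the mount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct vfsmount { int id; };
struct super_block { void *s_fs_info; };
struct dentry { struct super_block *d_sb; void *d_fsdata; };
struct path { struct vfsmount *mnt; struct dentry *dentry; };
struct sb_info { struct vfsmount *lower_mnt; };	/* one per superblock */

/* Build the lower path by value; no allocation, no extra references. */
static struct path lower_path(const struct dentry *dentry)
{
	const struct sb_info *sbi = dentry->d_sb->s_fs_info;

	return (struct path){
		.mnt = sbi->lower_mnt,
		.dentry = dentry->d_fsdata,
	};
}

static bool example_lower_path(void)
{
	struct vfsmount lower_mnt = { 1 };
	struct sb_info sbi = { &lower_mnt };
	struct super_block sb = { &sbi };
	struct dentry lower_a = { NULL, NULL }, lower_b = { NULL, NULL };
	struct dentry upper_a = { &sb, &lower_a }, upper_b = { &sb, &lower_b };
	struct path pa = lower_path(&upper_a), pb = lower_path(&upper_b);

	assert(pa.mnt == pb.mnt);	/* same mount for every dentry */
	return pa.mnt == &lower_mnt &&
	       pa.dentry == &lower_a && pb.dentry == &lower_b;
}
```

Callers that need a pointer take the address of a local copy, which is why
the call sites in the diff switch from "const struct path *" to a struct
path on the stack.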

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ecryptfs/dentry.c          | 14 +-------------
 fs/ecryptfs/ecryptfs_kernel.h | 27 +++++++++++----------------
 fs/ecryptfs/file.c            | 15 +++++++--------
 fs/ecryptfs/inode.c           | 19 +++++--------------
 fs/ecryptfs/main.c            | 24 ++++++------------------
 5 files changed, 30 insertions(+), 69 deletions(-)

diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
index 1dfd5b81d831..6648a924e31a 100644
--- a/fs/ecryptfs/dentry.c
+++ b/fs/ecryptfs/dentry.c
@@ -59,14 +59,6 @@ static int ecryptfs_d_revalidate(struct inode *dir, const struct qstr *name,
 	return rc;
 }
 
-struct kmem_cache *ecryptfs_dentry_info_cache;
-
-static void ecryptfs_dentry_free_rcu(struct rcu_head *head)
-{
-	kmem_cache_free(ecryptfs_dentry_info_cache,
-		container_of(head, struct ecryptfs_dentry_info, rcu));
-}
-
 /**
  * ecryptfs_d_release
  * @dentry: The ecryptfs dentry
@@ -75,11 +67,7 @@ static void ecryptfs_dentry_free_rcu(struct rcu_head *head)
  */
 static void ecryptfs_d_release(struct dentry *dentry)
 {
-	struct ecryptfs_dentry_info *p = dentry->d_fsdata;
-	if (p) {
-		path_put(&p->lower_path);
-		call_rcu(&p->rcu, ecryptfs_dentry_free_rcu);
-	}
+	dput(dentry->d_fsdata);
 }
 
 const struct dentry_operations ecryptfs_dops = {
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index 1f562e75d0e4..9e6ab0b41337 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -258,13 +258,6 @@ struct ecryptfs_inode_info {
 	struct ecryptfs_crypt_stat crypt_stat;
 };
 
-/* dentry private data. Each dentry must keep track of a lower
- * vfsmount too. */
-struct ecryptfs_dentry_info {
-	struct path lower_path;
-	struct rcu_head rcu;
-};
-
 /**
  * ecryptfs_global_auth_tok - A key used to encrypt all new files under the mountpoint
  * @flags: Status flags
@@ -348,6 +341,7 @@ struct ecryptfs_mount_crypt_stat {
 /* superblock private data. */
 struct ecryptfs_sb_info {
 	struct super_block *wsi_sb;
+	struct vfsmount *lower_mnt;
 	struct ecryptfs_mount_crypt_stat mount_crypt_stat;
 };
 
@@ -494,22 +488,25 @@ ecryptfs_set_superblock_lower(struct super_block *sb,
 }
 
 static inline void
-ecryptfs_set_dentry_private(struct dentry *dentry,
-			    struct ecryptfs_dentry_info *dentry_info)
+ecryptfs_set_dentry_lower(struct dentry *dentry,
+			  struct dentry *lower_dentry)
 {
-	dentry->d_fsdata = dentry_info;
+	dentry->d_fsdata = lower_dentry;
 }
 
 static inline struct dentry *
 ecryptfs_dentry_to_lower(struct dentry *dentry)
 {
-	return ((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path.dentry;
+	return dentry->d_fsdata;
 }
 
-static inline const struct path *
-ecryptfs_dentry_to_lower_path(struct dentry *dentry)
+static inline struct path
+ecryptfs_lower_path(struct dentry *dentry)
 {
-	return &((struct ecryptfs_dentry_info *)dentry->d_fsdata)->lower_path;
+	return (struct path){
+		.mnt = ecryptfs_superblock_to_private(dentry->d_sb)->lower_mnt,
+		.dentry = ecryptfs_dentry_to_lower(dentry)
+	};
 }
 
 #define ecryptfs_printk(type, fmt, arg...) \
@@ -532,7 +529,6 @@ extern unsigned int ecryptfs_number_of_users;
 
 extern struct kmem_cache *ecryptfs_auth_tok_list_item_cache;
 extern struct kmem_cache *ecryptfs_file_info_cache;
-extern struct kmem_cache *ecryptfs_dentry_info_cache;
 extern struct kmem_cache *ecryptfs_inode_info_cache;
 extern struct kmem_cache *ecryptfs_sb_info_cache;
 extern struct kmem_cache *ecryptfs_header_cache;
@@ -557,7 +553,6 @@ int ecryptfs_encrypt_and_encode_filename(
 	size_t *encoded_name_size,
 	struct ecryptfs_mount_crypt_stat *mount_crypt_stat,
 	const char *name, size_t name_size);
-struct dentry *ecryptfs_lower_dentry(struct dentry *this_dentry);
 void ecryptfs_dump_hex(char *data, int bytes);
 int virt_to_scatterlist(const void *addr, int size, struct scatterlist *sg,
 			int sg_size);
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index 5f8f96da09fe..7929411837cf 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -33,13 +33,12 @@ static ssize_t ecryptfs_read_update_atime(struct kiocb *iocb,
 				struct iov_iter *to)
 {
 	ssize_t rc;
-	const struct path *path;
 	struct file *file = iocb->ki_filp;
 
 	rc = generic_file_read_iter(iocb, to);
 	if (rc >= 0) {
-		path = ecryptfs_dentry_to_lower_path(file->f_path.dentry);
-		touch_atime(path);
+		struct path path = ecryptfs_lower_path(file->f_path.dentry);
+		touch_atime(&path);
 	}
 	return rc;
 }
@@ -59,12 +58,11 @@ static ssize_t ecryptfs_splice_read_update_atime(struct file *in, loff_t *ppos,
 						 size_t len, unsigned int flags)
 {
 	ssize_t rc;
-	const struct path *path;
 
 	rc = filemap_splice_read(in, ppos, pipe, len, flags);
 	if (rc >= 0) {
-		path = ecryptfs_dentry_to_lower_path(in->f_path.dentry);
-		touch_atime(path);
+		struct path path = ecryptfs_lower_path(in->f_path.dentry);
+		touch_atime(&path);
 	}
 	return rc;
 }
@@ -283,6 +281,7 @@ static int ecryptfs_dir_open(struct inode *inode, struct file *file)
 	 * ecryptfs_lookup() */
 	struct ecryptfs_file_info *file_info;
 	struct file *lower_file;
+	struct path path;
 
 	/* Released in ecryptfs_release or end of function if failure */
 	file_info = kmem_cache_zalloc(ecryptfs_file_info_cache, GFP_KERNEL);
@@ -292,8 +291,8 @@ static int ecryptfs_dir_open(struct inode *inode, struct file *file)
 				"Error attempting to allocate memory\n");
 		return -ENOMEM;
 	}
-	lower_file = dentry_open(ecryptfs_dentry_to_lower_path(ecryptfs_dentry),
-				 file->f_flags, current_cred());
+	path = ecryptfs_lower_path(ecryptfs_dentry);
+	lower_file = dentry_open(&path, file->f_flags, current_cred());
 	if (IS_ERR(lower_file)) {
 		printk(KERN_ERR "%s: Error attempting to initialize "
 			"the lower file for the dentry with name "
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 72fbe1316ab8..d2b262dc485d 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -327,24 +327,15 @@ static int ecryptfs_i_size_read(struct dentry *dentry, struct inode *inode)
 static struct dentry *ecryptfs_lookup_interpose(struct dentry *dentry,
 				     struct dentry *lower_dentry)
 {
-	const struct path *path = ecryptfs_dentry_to_lower_path(dentry->d_parent);
+	struct dentry *lower_parent = ecryptfs_dentry_to_lower(dentry->d_parent);
 	struct inode *inode, *lower_inode;
-	struct ecryptfs_dentry_info *dentry_info;
 	int rc = 0;
 
-	dentry_info = kmem_cache_alloc(ecryptfs_dentry_info_cache, GFP_KERNEL);
-	if (!dentry_info) {
-		dput(lower_dentry);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	fsstack_copy_attr_atime(d_inode(dentry->d_parent),
-				d_inode(path->dentry));
+				d_inode(lower_parent));
 	BUG_ON(!d_count(lower_dentry));
 
-	ecryptfs_set_dentry_private(dentry, dentry_info);
-	dentry_info->lower_path.mnt = mntget(path->mnt);
-	dentry_info->lower_path.dentry = lower_dentry;
+	ecryptfs_set_dentry_lower(dentry, lower_dentry);
 
 	/*
 	 * negative dentry can go positive under us here - its parent is not
@@ -1022,10 +1013,10 @@ static int ecryptfs_getattr(struct mnt_idmap *idmap,
 {
 	struct dentry *dentry = path->dentry;
 	struct kstat lower_stat;
+	struct path lower_path = ecryptfs_lower_path(dentry);
 	int rc;
 
-	rc = vfs_getattr_nosec(ecryptfs_dentry_to_lower_path(dentry),
-			       &lower_stat, request_mask, flags);
+	rc = vfs_getattr_nosec(&lower_path, &lower_stat, request_mask, flags);
 	if (!rc) {
 		fsstack_copy_attr_all(d_inode(dentry),
 				      ecryptfs_inode_to_lower(d_inode(dentry)));
diff --git a/fs/ecryptfs/main.c b/fs/ecryptfs/main.c
index eab1beb846d3..2afbcbbd9546 100644
--- a/fs/ecryptfs/main.c
+++ b/fs/ecryptfs/main.c
@@ -106,15 +106,14 @@ static int ecryptfs_init_lower_file(struct dentry *dentry,
 				    struct file **lower_file)
 {
 	const struct cred *cred = current_cred();
-	const struct path *path = ecryptfs_dentry_to_lower_path(dentry);
+	struct path path = ecryptfs_lower_path(dentry);
 	int rc;
 
-	rc = ecryptfs_privileged_open(lower_file, path->dentry, path->mnt,
-				      cred);
+	rc = ecryptfs_privileged_open(lower_file, path.dentry, path.mnt, cred);
 	if (rc) {
 		printk(KERN_ERR "Error opening lower file "
 		       "for lower_dentry [0x%p] and lower_mnt [0x%p]; "
-		       "rc = [%d]\n", path->dentry, path->mnt, rc);
+		       "rc = [%d]\n", path.dentry, path.mnt, rc);
 		(*lower_file) = NULL;
 	}
 	return rc;
@@ -437,7 +436,6 @@ static int ecryptfs_get_tree(struct fs_context *fc)
 	struct ecryptfs_fs_context *ctx = fc->fs_private;
 	struct ecryptfs_sb_info *sbi = fc->s_fs_info;
 	struct ecryptfs_mount_crypt_stat *mount_crypt_stat;
-	struct ecryptfs_dentry_info *root_info;
 	const char *err = "Getting sb failed";
 	struct inode *inode;
 	struct path path;
@@ -543,14 +541,8 @@ static int ecryptfs_get_tree(struct fs_context *fc)
 		goto out_free;
 	}
 
-	rc = -ENOMEM;
-	root_info = kmem_cache_zalloc(ecryptfs_dentry_info_cache, GFP_KERNEL);
-	if (!root_info)
-		goto out_free;
-
-	/* ->kill_sb() will take care of root_info */
-	ecryptfs_set_dentry_private(s->s_root, root_info);
-	root_info->lower_path = path;
+	ecryptfs_set_dentry_lower(s->s_root, path.dentry);
+	sbi->lower_mnt = path.mnt;
 
 	s->s_flags |= SB_ACTIVE;
 	fc->root = dget(s->s_root);
@@ -580,6 +572,7 @@ static void ecryptfs_kill_block_super(struct super_block *sb)
 	kill_anon_super(sb);
 	if (!sb_info)
 		return;
+	mntput(sb_info->lower_mnt);
 	ecryptfs_destroy_mount_crypt_stat(&sb_info->mount_crypt_stat);
 	kmem_cache_free(ecryptfs_sb_info_cache, sb_info);
 }
@@ -667,11 +660,6 @@ static struct ecryptfs_cache_info {
 		.name = "ecryptfs_file_cache",
 		.size = sizeof(struct ecryptfs_file_info),
 	},
-	{
-		.cache = &ecryptfs_dentry_info_cache,
-		.name = "ecryptfs_dentry_info_cache",
-		.size = sizeof(struct ecryptfs_dentry_info),
-	},
 	{
 		.cache = &ecryptfs_inode_info_cache,
 		.name = "ecryptfs_inode_cache",
-- 
2.47.2



* [PATCH v3 53/65] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (51 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 52/65] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 54/63] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
                         ` (21 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Comments regarding "shadow mounts" were stale - no such thing anymore.
Document the locking requirements for __lookup_mnt().
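
For readers unfamiliar with the second (lockless) variant of that
requirement, here is a userspace model of "sample the seqcount, look,
recheck".  It is an assumption-laden sketch built on C11 atomics, not the
kernel's seqlock API; struct table, sample_seq(), seq_changed() and the
rest are made-up names, and the rcu_read_lock() part is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the seqcount and the data it covers. */
struct table {
	atomic_uint seq;	/* odd while a writer is mid-update */
	atomic_int child;	/* "what is mounted here", 0 = nothing */
};

static unsigned sample_seq(struct table *t)
{
	unsigned s;

	while ((s = atomic_load(&t->seq)) & 1)
		;	/* writer in progress, wait for an even value */
	return s;
}

static bool seq_changed(struct table *t, unsigned s)
{
	return atomic_load(&t->seq) != s;
}

static void write_child(struct table *t, int child)
{
	atomic_fetch_add(&t->seq, 1);	/* odd: readers will retry */
	atomic_store(&t->child, child);
	atomic_fetch_add(&t->seq, 1);	/* even again */
}

/* Lockless lookup: sample, look, recheck; retry if a writer interfered. */
static int lookup_child(struct table *t)
{
	unsigned s;
	int child;

	do {
		s = sample_seq(t);
		child = atomic_load(&t->child);
	} while (seq_changed(t, s));
	return child;
}

static bool example_seq_recheck(void)
{
	struct table t;
	unsigned s;

	atomic_init(&t.seq, 0);
	atomic_init(&t.child, 0);
	s = sample_seq(&t);
	assert(!seq_changed(&t, s));	/* nothing has happened yet */
	write_child(&t, 42);		/* a change sneaks in after the sample */
	return seq_changed(&t, s) && lookup_child(&t) == 42;
}
```

A result obtained between the sample and a failed recheck must be thrown
away, which is what the retry loop in lookup_child() does.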

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0900fd7456a9..a195e25a5d61 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -825,24 +825,16 @@ static bool legitimize_mnt(struct vfsmount *bastard, unsigned seq)
 }
 
 /**
- * __lookup_mnt - find first child mount
+ * __lookup_mnt - mount hash lookup
  * @mnt:	parent mount
- * @dentry:	mountpoint
+ * @dentry:	dentry of mountpoint
  *
- * If @mnt has a child mount @c mounted @dentry find and return it.
+ * If @mnt has a child mount @c mounted on @dentry find and return it.
+ * Caller must either hold the spinlock component of @mount_lock or
+ * hold rcu_read_lock(), sample the seqcount component before the call
+ * and recheck it afterwards.
  *
- * Note that the child mount @c need not be unique. There are cases
- * where shadow mounts are created. For example, during mount
- * propagation when a source mount @mnt whose root got overmounted by a
- * mount @o after path lookup but before @namespace_sem could be
- * acquired gets copied and propagated. So @mnt gets copied including
- * @o. When @mnt is propagated to a destination mount @d that already
- * has another mount @n mounted at the same mountpoint then the source
- * mount @mnt will be tucked beneath @n, i.e., @n will be mounted on
- * @mnt and @mnt mounted on @d. Now both @n and @o are mounted at @mnt
- * on @dentry.
- *
- * Return: The first child of @mnt mounted @dentry or NULL.
+ * Return: The child of @mnt mounted on @dentry or %NULL.
  */
 struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 {
@@ -855,21 +847,12 @@ struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
 	return NULL;
 }
 
-/*
- * lookup_mnt - Return the first child mount mounted at path
- *
- * "First" means first mounted chronologically.  If you create the
- * following mounts:
- *
- * mount /dev/sda1 /mnt
- * mount /dev/sda2 /mnt
- * mount /dev/sda3 /mnt
- *
- * Then lookup_mnt() on the base /mnt dentry in the root mount will
- * return successively the root dentry and vfsmount of /dev/sda1, then
- * /dev/sda2, then /dev/sda3, then NULL.
+/**
+ * lookup_mnt - Return the child mount mounted at given location
+ * @path:	location in the namespace
  *
- * lookup_mnt takes a reference to the found vfsmount.
+ * Acquires and returns a new reference to mount at given location
+ * or %NULL if nothing is mounted there.
  */
 struct vfsmount *lookup_mnt(const struct path *path)
 {
-- 
2.47.2



* [PATCH v3 54/63] open_detached_copy(): don't bother with mount_lock_hash()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (52 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 53/65] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 54/65] path_has_submounts(): use guard(mount_locked_reader) Al Viro
                         ` (20 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We are holding namespace_sem and a reference to root of tree;
iterating through that tree does not need mount_lock.  Neither
does the insertion into the rbtree of new namespace or incrementing
the mount count of that namespace.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2e35f5eb4f81..425c33377770 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3086,14 +3086,12 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 		return ERR_CAST(mnt);
 	}
 
-	lock_mount_hash();
 	for (p = mnt; p; p = next_mnt(p, mnt)) {
 		mnt_add_to_ns(ns, p);
 		ns->nr_mounts++;
 	}
 	ns->root = mnt;
 	mntget(&mnt->mnt);
-	unlock_mount_hash();
 	namespace_unlock();
 
 	mntput(path->mnt);
-- 
2.47.2



* [PATCH v3 54/65] path_has_submounts(): use guard(mount_locked_reader)
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (53 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 54/63] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 55/65] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
                         ` (19 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Needed there since the callback passed to d_walk() (path_check_mount())
is using __path_is_mountpoint(), which uses __lookup_mnt().

It has to be taken in the caller - d_walk() might take the spinlock
component of rename_lock, and that nests inside mount_lock.
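
A userspace sketch of the shape this takes, assuming a compiler that
supports the GCC/Clang cleanup attribute.  The lock is replaced by a plain
counter, and GUARD_READER(), reader_guard_acquire(), has_submounts() and
is_mountpoint() are hypothetical names, not the cleanup.h macros or the
dcache functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int lock_depth;	/* hypothetical stand-in for mount_lock: holders */

struct reader_guard { int dummy; };

static inline struct reader_guard reader_guard_acquire(void)
{
	lock_depth++;
	return (struct reader_guard){ 0 };
}

static inline void reader_guard_release(struct reader_guard *g)
{
	(void)g;
	lock_depth--;
}

/* Declares a scoped variable whose cleanup handler drops the lock. */
#define GUARD_READER() \
	struct reader_guard guard_ \
	__attribute__((cleanup(reader_guard_release), unused)) = \
		reader_guard_acquire()

/* Callback run for each entry; relies on the caller holding the lock. */
static bool is_mountpoint(const bool *mounted, size_t i)
{
	assert(lock_depth == 1);
	return mounted[i];
}

/* The lock is taken once, around the whole walk, and is dropped on
 * every return path when guard_ goes out of scope. */
static bool has_submounts(const bool *mounted, size_t n)
{
	GUARD_READER();

	for (size_t i = 0; i < n; i++)
		if (is_mountpoint(mounted, i))
			return true;	/* early return, guard still unlocks */
	return false;
}

static bool example_guard(void)
{
	const bool tree[] = { false, true, false };
	bool found = has_submounts(tree, 3);

	return found && lock_depth == 0;
}
```

The callback only asserts that the lock is held; acquiring it there would
mean taking it inside whatever the walker itself holds, which is the
ordering problem described above.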

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/dcache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 60046ae23d51..ab21a8402db0 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1390,6 +1390,7 @@ struct check_mount {
 	unsigned int mounted;
 };
 
+/* locks: mount_locked_reader && dentry->d_lock */
 static enum d_walk_ret path_check_mount(void *data, struct dentry *dentry)
 {
 	struct check_mount *info = data;
@@ -1416,9 +1417,8 @@ int path_has_submounts(const struct path *parent)
 {
 	struct check_mount data = { .mnt = parent->mnt, .mounted = 0 };
 
-	read_seqlock_excl(&mount_lock);
+	guard(mount_locked_reader)();
 	d_walk(parent->dentry, &data, path_check_mount);
-	read_sequnlock_excl(&mount_lock);
 
 	return data.mounted;
 }
-- 
2.47.2



* [PATCH v3 55/65] open_detached_copy(): don't bother with mount_lock_hash()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (54 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 54/65] path_has_submounts(): use guard(mount_locked_reader) Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 55/63] open_detached_copy(): separate creation of namespace into helper Al Viro
                         ` (18 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We are holding namespace_sem and a reference to root of tree;
iterating through that tree does not need mount_lock.  Neither
does the insertion into the rbtree of new namespace or incrementing
the mount count of that namespace.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a195e25a5d61..69ef608b8c3a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3086,14 +3086,12 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 		return ERR_CAST(mnt);
 	}
 
-	lock_mount_hash();
 	for (p = mnt; p; p = next_mnt(p, mnt)) {
 		mnt_add_to_ns(ns, p);
 		ns->nr_mounts++;
 	}
 	ns->root = mnt;
 	mntget(&mnt->mnt);
-	unlock_mount_hash();
 	namespace_unlock();
 
 	mntput(path->mnt);
-- 
2.47.2



* [PATCH v3 55/63] open_detached_copy(): separate creation of namespace into helper
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (55 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 55/65] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
                         ` (17 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and convert the helper to use guard(namespace_excl)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 425c33377770..c324800e770c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3053,18 +3053,17 @@ static int do_loopback(const struct path *path, const char *old_name,
 	return err;
 }
 
-static struct file *open_detached_copy(struct path *path, bool recursive)
+static struct mnt_namespace *get_detached_copy(const struct path *path, bool recursive)
 {
 	struct mnt_namespace *ns, *mnt_ns = current->nsproxy->mnt_ns, *src_mnt_ns;
 	struct user_namespace *user_ns = mnt_ns->user_ns;
 	struct mount *mnt, *p;
-	struct file *file;
 
 	ns = alloc_mnt_ns(user_ns, true);
 	if (IS_ERR(ns))
-		return ERR_CAST(ns);
+		return ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 
 	/*
 	 * Record the sequence number of the source mount namespace.
@@ -3081,8 +3080,7 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 
 	mnt = __do_loopback(path, recursive);
 	if (IS_ERR(mnt)) {
-		namespace_unlock();
-		free_mnt_ns(ns);
+		emptied_ns = ns;
 		return ERR_CAST(mnt);
 	}
 
@@ -3091,11 +3089,19 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 		ns->nr_mounts++;
 	}
 	ns->root = mnt;
-	mntget(&mnt->mnt);
-	namespace_unlock();
+	return ns;
+}
+
+static struct file *open_detached_copy(struct path *path, bool recursive)
+{
+	struct mnt_namespace *ns = get_detached_copy(path, recursive);
+	struct file *file;
+
+	if (IS_ERR(ns))
+		return ERR_CAST(ns);
 
 	mntput(path->mnt);
-	path->mnt = &mnt->mnt;
+	path->mnt = mntget(&ns->root->mnt);
 	file = dentry_open(path, O_PATH, current_cred());
 	if (IS_ERR(file))
 		dissolve_on_fput(path->mnt);
-- 
2.47.2



* [PATCH v3 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (56 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 55/63] open_detached_copy(): separate creation of namespace into helper Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 56/65] open_detached_copy(): separate creation of namespace into helper Al Viro
                         ` (16 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Actual removal is done under the lock, but for checking whether we need to
bother, the lockless list_empty() is safe - either that namespace had never
been added to mnt_ns_tree, in which case the list will stay empty, or
whoever had allocated it has called mnt_ns_tree_add() and it has already
run to completion.  After that point list_empty() will become false and
will remain false, no matter what we do with the neighbors in mnt_ns_list.
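
The list_empty() property relied upon here can be checked on a toy circular
list.  This is a plain, non-RCU userspace sketch; list_init(),
list_is_empty(), list_add_last() and list_unlink() are hypothetical
helpers written for the example, not the kernel list API.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* A node is "empty" only while it still points at itself. */
static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_last(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink a node from its neighbors without reinitializing it. */
static void list_unlink(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static bool example_list_empty(void)
{
	struct list_head head, never_added, a, b;

	list_init(&head);
	list_init(&never_added);
	list_init(&a);
	list_init(&b);

	list_add_last(&a, &head);
	list_add_last(&b, &head);
	assert(!list_is_empty(&a));

	list_unlink(&b);	/* a neighbor goes away */
	return list_is_empty(&never_added) && !list_is_empty(&a);
}
```

Once a node is on a list that has a separate head, its ->next always points
at something other than itself, whatever happens to its neighbors.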

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c324800e770c..daa72292ea58 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -195,7 +195,7 @@ static void mnt_ns_release_rcu(struct rcu_head *rcu)
 static void mnt_ns_tree_remove(struct mnt_namespace *ns)
 {
 	/* remove from global mount namespace list */
-	if (!is_anon_ns(ns)) {
+	if (!list_empty(&ns->mnt_ns_list)) {
 		mnt_ns_tree_write_lock();
 		rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree);
 		list_bidir_del_rcu(&ns->mnt_ns_list);
-- 
2.47.2



* [PATCH v3 56/65] open_detached_copy(): separate creation of namespace into helper
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (57 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
                         ` (15 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... and convert the helper to use guard(namespace_excl)

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 69ef608b8c3a..5b802cd33058 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3053,18 +3053,17 @@ static int do_loopback(const struct path *path, const char *old_name,
 	return err;
 }
 
-static struct file *open_detached_copy(struct path *path, bool recursive)
+static struct mnt_namespace *get_detached_copy(const struct path *path, bool recursive)
 {
 	struct mnt_namespace *ns, *mnt_ns = current->nsproxy->mnt_ns, *src_mnt_ns;
 	struct user_namespace *user_ns = mnt_ns->user_ns;
 	struct mount *mnt, *p;
-	struct file *file;
 
 	ns = alloc_mnt_ns(user_ns, true);
 	if (IS_ERR(ns))
-		return ERR_CAST(ns);
+		return ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 
 	/*
 	 * Record the sequence number of the source mount namespace.
@@ -3081,8 +3080,7 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 
 	mnt = __do_loopback(path, recursive);
 	if (IS_ERR(mnt)) {
-		namespace_unlock();
-		free_mnt_ns(ns);
+		emptied_ns = ns;
 		return ERR_CAST(mnt);
 	}
 
@@ -3091,11 +3089,19 @@ static struct file *open_detached_copy(struct path *path, bool recursive)
 		ns->nr_mounts++;
 	}
 	ns->root = mnt;
-	mntget(&mnt->mnt);
-	namespace_unlock();
+	return ns;
+}
+
+static struct file *open_detached_copy(struct path *path, bool recursive)
+{
+	struct mnt_namespace *ns = get_detached_copy(path, recursive);
+	struct file *file;
+
+	if (IS_ERR(ns))
+		return ERR_CAST(ns);
 
 	mntput(path->mnt);
-	path->mnt = &mnt->mnt;
+	path->mnt = mntget(&ns->root->mnt);
 	file = dentry_open(path, O_PATH, current_cred());
 	if (IS_ERR(file))
 		dissolve_on_fput(path->mnt);
-- 
2.47.2



* [PATCH v3 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (58 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 56/65] open_detached_copy(): separate creation of namespace into helper Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 57/65] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
                         ` (14 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Now that free_mnt_ns() works prior to mnt_ns_tree_add(), there's no need for
an open-coded analogue of free_mnt_ns() there - yes, we do avoid one call_rcu()
use per failing call of clone() or unshare(), if they fail due to OOM in that
particular spot, but it's not really worth bothering.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index daa72292ea58..a418555586ef 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4190,10 +4190,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		copy_flags |= CL_SLAVE;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
+		emptied_ns = new_ns;
 		namespace_unlock();
-		ns_free_inum(&new_ns->ns);
-		dec_mnt_namespaces(new_ns->ucounts);
-		mnt_ns_release(new_ns);
 		return ERR_CAST(new);
 	}
 	if (user_ns != ns->user_ns) {
-- 
2.47.2



* [PATCH v3 57/65] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (59 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 58/63] copy_mnt_ns(): use guards Al Viro
                         ` (13 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Actual removal is done under the lock, but for checking whether we need to
bother, the lockless list_empty() is safe - either that namespace had never
been added to mnt_ns_tree, in which case the list will stay empty, or
whoever had allocated it has called mnt_ns_tree_add() and it has already
run to completion.  After that point list_empty() will become false and
will remain false, no matter what we do with the neighbors in mnt_ns_list.
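
The argument above can be tried out in userspace.  What follows is only a
minimal sketch, assuming a self-linked circular list head in the style of
the kernel's list_head; struct node, list_add_after(), list_unlink() and
node_needs_removal() are hypothetical names rather than kernel API, and the
lock protecting the actual removal is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* A freshly initialized head points at itself, i.e. it is "empty". */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert @new right after @head. */
static void list_add_after(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* Unlink @entry from its neighbours; @entry's own pointers are left alone. */
static void list_unlink(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Hypothetical namespace-like object with a link in a global list. */
struct node {
	struct list_head link;
};

/* The lockless check: has this node ever been put on the list? */
static bool node_needs_removal(const struct node *n)
{
	return !list_is_empty(&n->link);
}
```

A node that was only initialized reports false; once it has been linked
into a list anchored elsewhere, its pointers can change as neighbours come
and go, but they never point back at the node itself, so the check keeps
returning true.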

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 5b802cd33058..c175536cc7b5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -195,7 +195,7 @@ static void mnt_ns_release_rcu(struct rcu_head *rcu)
 static void mnt_ns_tree_remove(struct mnt_namespace *ns)
 {
 	/* remove from global mount namespace list */
-	if (!is_anon_ns(ns)) {
+	if (!list_empty(&ns->mnt_ns_list)) {
 		mnt_ns_tree_write_lock();
 		rb_erase(&ns->mnt_ns_tree_node, &mnt_ns_tree);
 		list_bidir_del_rcu(&ns->mnt_ns_list);
-- 
2.47.2



* [PATCH v3 58/63] copy_mnt_ns(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (60 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 57/65] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 58/65] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
                         ` (12 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

* mntput() of rootmnt and pwdmnt done via __free(mntput)
* mnt_ns_tree_add() can be done within namespace_excl scope.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a418555586ef..9e16231d4561 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4164,7 +4164,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		struct user_namespace *user_ns, struct fs_struct *new_fs)
 {
 	struct mnt_namespace *new_ns;
-	struct vfsmount *rootmnt = NULL, *pwdmnt = NULL;
+	struct vfsmount *rootmnt __free(mntput) = NULL;
+	struct vfsmount *pwdmnt __free(mntput) = NULL;
 	struct mount *p, *q;
 	struct mount *old;
 	struct mount *new;
@@ -4183,7 +4184,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	if (IS_ERR(new_ns))
 		return new_ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
 	if (user_ns != ns->user_ns)
@@ -4191,13 +4192,11 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
 		emptied_ns = new_ns;
-		namespace_unlock();
 		return ERR_CAST(new);
 	}
 	if (user_ns != ns->user_ns) {
-		lock_mount_hash();
+		guard(mount_writer)();
 		lock_mnt_tree(new);
-		unlock_mount_hash();
 	}
 	new_ns->root = new;
 
@@ -4229,14 +4228,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		while (p->mnt.mnt_root != q->mnt.mnt_root)
 			p = next_mnt(skip_mnt_tree(p), old);
 	}
-	namespace_unlock();
-
-	if (rootmnt)
-		mntput(rootmnt);
-	if (pwdmnt)
-		mntput(pwdmnt);
-
-	mnt_ns_tree_add(new_ns);
 	return new_ns;
 }
 
-- 
2.47.2



* [PATCH v3 58/65] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (61 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 58/63] copy_mnt_ns(): use guards Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 59/65] copy_mnt_ns(): use guards Al Viro
                         ` (11 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Now that free_mnt_ns() works prior to mnt_ns_tree_add(), there's no need for
an open-coded analogue of free_mnt_ns() there - yes, we do avoid one call_rcu()
use per failing call of clone() or unshare(), if they fail due to OOM in that
particular spot, but it's not really worth bothering.
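
The diff below replaces the open-coded teardown with an assignment to
emptied_ns made while namespace_sem is held.  Here is a minimal
single-threaded sketch of that pattern.  It assumes (as the diff suggests,
but this message does not spell out) that the unlock path picks up the
pending pointer and frees the object after the lock has been dropped;
struct obj, pending_free, fake_lock(), fake_unlock() and make_obj() are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct obj {
	int payload;
};

static bool locked;			/* stand-in for the real lock */
static struct obj *pending_free;	/* only touched with the lock "held" */
static int freed_count;			/* lets a caller observe the freeing */

static void fake_lock(void)
{
	assert(!locked);
	locked = true;
}

static void fake_unlock(void)
{
	struct obj *victim = pending_free;

	pending_free = NULL;
	locked = false;
	/* the actual freeing happens only after the lock is dropped */
	if (victim) {
		free(victim);
		freed_count++;
	}
}

static struct obj *make_obj(bool fail)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	fake_lock();
	if (fail) {
		pending_free = o;	/* hand it over to the unlock path */
		fake_unlock();
		return NULL;
	}
	o->payload = 42;
	fake_unlock();
	return o;
}
```

The failure path no longer needs to know how the object is torn down; it
only has to park the pointer before the lock is released.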

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c175536cc7b5..0cd62478ff36 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4190,10 +4190,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		copy_flags |= CL_SLAVE;
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
+		emptied_ns = new_ns;
 		namespace_unlock();
-		ns_free_inum(&new_ns->ns);
-		dec_mnt_namespaces(new_ns->ucounts);
-		mnt_ns_release(new_ns);
 		return ERR_CAST(new);
 	}
 	if (user_ns != ns->user_ns) {
-- 
2.47.2



* [PATCH v3 59/65] copy_mnt_ns(): use guards
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (62 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 58/65] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 59/63] simplify the callers of mnt_unhold_writers() Al Viro
                         ` (10 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

* mntput() of rootmnt and pwdmnt done via __free(mntput)
* mnt_ns_tree_add() can be done within namespace_excl scope.
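
As a rough userspace analogue of those two points, the sketch below uses the
compiler's cleanup variable attribute (GCC and Clang), which is what the
cleanup.h helpers are built on.  Everything in it is a hypothetical
illustration: struct ref, lock_depth, publish() and the AUTO_PUT and
SCOPED_LOCK() macros are made-up stand-ins, not the kernel's __free() and
guard() definitions.

```c
#include <assert.h>
#include <stddef.h>

struct ref {
	int count;
};

static int lock_depth;			/* stand-in for a lock: >0 means held */
static int depth_at_put = -1;		/* lock state seen when the ref was put */
static int depth_at_publish = -1;	/* lock state seen by publish() */

/* Cleanup handlers are passed a pointer to the annotated variable. */
static void ref_put_cleanup(struct ref **p)
{
	if (*p) {
		(*p)->count--;
		depth_at_put = lock_depth;
	}
}

static void fake_unlock(int *token)
{
	(void)token;
	lock_depth--;
}

#define AUTO_PUT	__attribute__((cleanup(ref_put_cleanup)))
#define SCOPED_LOCK()						\
	int scoped_lock_token					\
		__attribute__((cleanup(fake_unlock), unused)) = ++lock_depth

/* Stand-in for a step that should still be covered by the lock. */
static void publish(void)
{
	depth_at_publish = lock_depth;
}

static int copy_something(struct ref *src, int fail)
{
	struct ref *held AUTO_PUT = NULL;	/* dropped on every return path */

	SCOPED_LOCK();				/* released on every return path */

	src->count++;
	held = src;
	if (fail)
		return -1;			/* no explicit unlock or put */
	publish();				/* lock is still held here */
	return 0;
}
```

Cleanups run in reverse order of declaration, so in this sketch the lock is
dropped before the reference is put - the same order as in the open-coded
sequence removed by the diff below.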

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 0cd62478ff36..3bb9f7ac4be6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4164,7 +4164,8 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		struct user_namespace *user_ns, struct fs_struct *new_fs)
 {
 	struct mnt_namespace *new_ns;
-	struct vfsmount *rootmnt = NULL, *pwdmnt = NULL;
+	struct vfsmount *rootmnt __free(mntput) = NULL;
+	struct vfsmount *pwdmnt __free(mntput) = NULL;
 	struct mount *p, *q;
 	struct mount *old;
 	struct mount *new;
@@ -4183,7 +4184,7 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	if (IS_ERR(new_ns))
 		return new_ns;
 
-	namespace_lock();
+	guard(namespace_excl)();
 	/* First pass: copy the tree topology */
 	copy_flags = CL_COPY_UNBINDABLE | CL_EXPIRE;
 	if (user_ns != ns->user_ns)
@@ -4191,13 +4192,11 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 	new = copy_tree(old, old->mnt.mnt_root, copy_flags);
 	if (IS_ERR(new)) {
 		emptied_ns = new_ns;
-		namespace_unlock();
 		return ERR_CAST(new);
 	}
 	if (user_ns != ns->user_ns) {
-		lock_mount_hash();
+		guard(mount_writer)();
 		lock_mnt_tree(new);
-		unlock_mount_hash();
 	}
 	new_ns->root = new;
 
@@ -4229,13 +4228,6 @@ struct mnt_namespace *copy_mnt_ns(unsigned long flags, struct mnt_namespace *ns,
 		while (p->mnt.mnt_root != q->mnt.mnt_root)
 			p = next_mnt(skip_mnt_tree(p), old);
 	}
-	namespace_unlock();
-
-	if (rootmnt)
-		mntput(rootmnt);
-	if (pwdmnt)
-		mntput(pwdmnt);
-
 	mnt_ns_tree_add(new_ns);
 	return new_ns;
 }
-- 
2.47.2



* [PATCH v3 59/63] simplify the callers of mnt_unhold_writers()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (63 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 59/65] copy_mnt_ns(): use guards Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
                         ` (9 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

The cleanup-on-failure logic in mount_setattr_prepare() is simplified
by having the mnt_hold_writers() failure followed by advancing m to the
next node in the tree before leaving the loop.

And since all calls are preceded by the same check that the flag has been set
and the function is inlined, let's just shift the check into it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 ++++++++++------------------------
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9e16231d4561..d8df1046e2f9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -714,13 +714,14 @@ static inline int mnt_hold_writers(struct mount *mnt)
  * Stop preventing write access to @mnt allowing callers to gain write access
  * to @mnt again.
  *
- * This function can only be called after a successful call to
- * mnt_hold_writers().
+ * This function can only be called after a call to mnt_hold_writers().
  *
  * Context: This function expects lock_mount_hash() to be held.
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
+	if (!(mnt->mnt.mnt_flags & MNT_WRITE_HOLD))
+		return;
 	/*
 	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
@@ -4773,8 +4774,10 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 
 		if (!mnt_allow_writers(kattr, m)) {
 			err = mnt_hold_writers(m);
-			if (err)
+			if (err) {
+				m = next_mnt(m, mnt);
 				break;
+			}
 		}
 
 		if (!(kattr->kflags & MOUNT_KATTR_RECURSE))
@@ -4782,25 +4785,9 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 	}
 
 	if (err) {
-		struct mount *p;
-
-		/*
-		 * If we had to call mnt_hold_writers() MNT_WRITE_HOLD will
-		 * be set in @mnt_flags. The loop unsets MNT_WRITE_HOLD for all
-		 * mounts and needs to take care to include the first mount.
-		 */
-		for (p = mnt; p; p = next_mnt(p, mnt)) {
-			/* If we had to hold writers unblock them. */
-			if (p->mnt.mnt_flags & MNT_WRITE_HOLD)
-				mnt_unhold_writers(p);
-
-			/*
-			 * We're done once the first mount we changed got
-			 * MNT_WRITE_HOLD unset.
-			 */
-			if (p == m)
-				break;
-		}
+		/* undo all mnt_hold_writers() we'd done */
+		for (struct mount *p = mnt; p != m; p = next_mnt(p, mnt))
+			mnt_unhold_writers(p);
 	}
 	return err;
 }
@@ -4831,8 +4818,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 		WRITE_ONCE(m->mnt.mnt_flags, flags);
 
 		/* If we had to hold writers unblock them. */
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
-			mnt_unhold_writers(m);
+		mnt_unhold_writers(m);
 
 		if (kattr->propagation)
 			change_mnt_propagation(m, kattr->propagation);
-- 
2.47.2



* [PATCH v3 60/63] setup_mnt(): primitive for connecting a mount to filesystem
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (64 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 59/63] simplify the callers of mnt_unhold_writers() Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 60/65] simplify the callers of mnt_unhold_writers() Al Viro
                         ` (8 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Take the identical logic in vfs_create_mount() and clone_mnt() into
a new helper that takes an empty struct mount and attaches it to
the given dentry (sub)tree.

Should be called once in the lifetime of every mount, prior to making
it visible in any data structures.

After that point ->mnt_root and ->mnt_sb never change; ->mnt_root
is a counting reference to dentry and ->mnt_sb - an active reference
to superblock.

Mount remains associated with that dentry tree all the way until
the call of cleanup_mnt(), when the refcount eventually drops
to zero.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index d8df1046e2f9..c769fc4051e0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1196,6 +1196,21 @@ static void commit_tree(struct mount *mnt)
 	touch_mnt_namespace(n);
 }
 
+static void setup_mnt(struct mount *m, struct dentry *root)
+{
+	struct super_block *s = root->d_sb;
+
+	atomic_inc(&s->s_active);
+	m->mnt.mnt_sb = s;
+	m->mnt.mnt_root = dget(root);
+	m->mnt_mountpoint = m->mnt.mnt_root;
+	m->mnt_parent = m;
+
+	lock_mount_hash();
+	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	unlock_mount_hash();
+}
+
 /**
  * vfs_create_mount - Create a mount for a configured superblock
  * @fc: The configuration context with the superblock attached
@@ -1219,15 +1234,8 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
 	if (fc->sb_flags & SB_KERNMOUNT)
 		mnt->mnt.mnt_flags = MNT_INTERNAL;
 
-	atomic_inc(&fc->root->d_sb->s_active);
-	mnt->mnt.mnt_sb		= fc->root->d_sb;
-	mnt->mnt.mnt_root	= dget(fc->root);
-	mnt->mnt_mountpoint	= mnt->mnt.mnt_root;
-	mnt->mnt_parent		= mnt;
+	setup_mnt(mnt, fc->root);
 
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
-	unlock_mount_hash();
 	return &mnt->mnt;
 }
 EXPORT_SYMBOL(vfs_create_mount);
@@ -1285,7 +1293,6 @@ EXPORT_SYMBOL_GPL(vfs_kern_mount);
 static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 					int flag)
 {
-	struct super_block *sb = old->mnt.mnt_sb;
 	struct mount *mnt;
 	int err;
 
@@ -1310,16 +1317,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	if (mnt->mnt_group_id)
 		set_mnt_shared(mnt);
 
-	atomic_inc(&sb->s_active);
 	mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
 
-	mnt->mnt.mnt_sb = sb;
-	mnt->mnt.mnt_root = dget(root);
-	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
-	mnt->mnt_parent = mnt;
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
-	unlock_mount_hash();
+	setup_mnt(mnt, root);
 
 	if (flag & CL_PRIVATE)	// we are done with it
 		return mnt;
-- 
2.47.2



* [PATCH v3 60/65] simplify the callers of mnt_unhold_writers()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (65 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
                         ` (7 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

The cleanup-on-failure logic in mount_setattr_prepare() is simplified
by having the mnt_hold_writers() failure followed by advancing m to the
next node in the tree before leaving the loop.

And since all calls are preceded by the same check that the flag has been set
and the function is inlined, let's just shift the check into it.
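
Both points can be sketched in userspace.  This is only an illustration
under simplifying assumptions: a flat array stands in for the mount tree
walked by next_mnt(), and struct item, hold(), unhold() and prepare() are
hypothetical names.  As the diff below implies for mnt_hold_writers(), a
failed hold() is assumed to leave the hold flag set, so the failing element
has to be covered by the rollback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct item {
	bool held;	/* plays the role of MNT_WRITE_HOLD */
	bool busy;	/* makes hold() fail for this element */
};

/* Sets the flag first, then may fail, leaving the flag set. */
static int hold(struct item *it)
{
	it->held = true;
	return it->busy ? -1 : 0;
}

/* Safe to call on any element: does nothing if the flag is not set. */
static void unhold(struct item *it)
{
	if (!it->held)
		return;
	it->held = false;
}

static int prepare(struct item *items, size_t n)
{
	struct item *end = items + n;
	struct item *m;
	int err = 0;

	for (m = items; m != end; m++) {
		err = hold(m);
		if (err) {
			m++;	/* step past the failing element ... */
			break;
		}
	}
	if (err) {
		/* ... so that [items, m) is exactly what has to be undone */
		for (struct item *p = items; p != m; p++)
			unhold(p);
	}
	return err;
}
```

With the cursor advanced past the failing element, the rollback loop needs
neither a special case for the first element nor a "stop after this one"
check in its body.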

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 ++++++++++------------------------
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 3bb9f7ac4be6..b4d287c0af4a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -714,13 +714,14 @@ static inline int mnt_hold_writers(struct mount *mnt)
  * Stop preventing write access to @mnt allowing callers to gain write access
  * to @mnt again.
  *
- * This function can only be called after a successful call to
- * mnt_hold_writers().
+ * This function can only be called after a call to mnt_hold_writers().
  *
  * Context: This function expects lock_mount_hash() to be held.
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
+	if (!(mnt->mnt.mnt_flags & MNT_WRITE_HOLD))
+		return;
 	/*
 	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
@@ -4774,8 +4775,10 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 
 		if (!mnt_allow_writers(kattr, m)) {
 			err = mnt_hold_writers(m);
-			if (err)
+			if (err) {
+				m = next_mnt(m, mnt);
 				break;
+			}
 		}
 
 		if (!(kattr->kflags & MOUNT_KATTR_RECURSE))
@@ -4783,25 +4786,9 @@ static int mount_setattr_prepare(struct mount_kattr *kattr, struct mount *mnt)
 	}
 
 	if (err) {
-		struct mount *p;
-
-		/*
-		 * If we had to call mnt_hold_writers() MNT_WRITE_HOLD will
-		 * be set in @mnt_flags. The loop unsets MNT_WRITE_HOLD for all
-		 * mounts and needs to take care to include the first mount.
-		 */
-		for (p = mnt; p; p = next_mnt(p, mnt)) {
-			/* If we had to hold writers unblock them. */
-			if (p->mnt.mnt_flags & MNT_WRITE_HOLD)
-				mnt_unhold_writers(p);
-
-			/*
-			 * We're done once the first mount we changed got
-			 * MNT_WRITE_HOLD unset.
-			 */
-			if (p == m)
-				break;
-		}
+		/* undo all mnt_hold_writers() we'd done */
+		for (struct mount *p = mnt; p != m; p = next_mnt(p, mnt))
+			mnt_unhold_writers(p);
 	}
 	return err;
 }
@@ -4832,8 +4819,7 @@ static void mount_setattr_commit(struct mount_kattr *kattr, struct mount *mnt)
 		WRITE_ONCE(m->mnt.mnt_flags, flags);
 
 		/* If we had to hold writers unblock them. */
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
-			mnt_unhold_writers(m);
+		mnt_unhold_writers(m);
 
 		if (kattr->propagation)
 			change_mnt_propagation(m, kattr->propagation);
-- 
2.47.2



* [PATCH v3 61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (66 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 60/65] simplify the callers of mnt_unhold_writers() Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 61/65] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
                         ` (6 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We have an unpleasant wart in accessibility rules for struct mount.  There
are per-superblock lists of mounts, used by sb_prepare_remount_readonly()
to check if any of those is currently claimed for write access and to
block further attempts to get write access on those until we are done.

As soon as it is attached to a filesystem, mount becomes reachable
via that list.  Only sb_prepare_remount_readonly() traverses it and
it only accesses a few members of struct mount.  Unfortunately,
->mnt_flags is one of those and it is modified - MNT_WRITE_HOLD set
and then cleared.  It is done under mount_lock, so from the locking
rules POV everything's fine.

However, it has easily overlooked implications - once mount has been
attached to a filesystem, it has to be treated as globally visible.
In particular, initializing ->mnt_flags *must* be done either prior
to that point or under mount_lock.  All other members are still
private at that point.

Life gets simpler if we move that bit (and that's *all* that can get
touched by access via this list) out of ->mnt_flags.  It's not even
hard to do - currently the list is implemented as a list_head one,
anchored in super_block->s_mounts and linked via mount->mnt_instance.

As the first step, switch it to an hlist-like open-coded structure -
address of the first mount in the set is stored in ->s_mounts
and ->mnt_instance replaced with ->mnt_next_for_sb and ->mnt_pprev_for_sb -
the former either NULL or pointing to the next mount in set, the
latter - address of either ->s_mounts or ->mnt_next_for_sb in the
previous element of the set.

In the next commit we'll steal the LSB of ->mnt_pprev_for_sb as
replacement for MNT_WRITE_HOLD.
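
For anyone who wants to poke at the structure described above, here is a
self-contained userspace sketch of the same open-coded list.  struct sb,
struct mnt and the function names are simplified, hypothetical stand-ins for
super_block, mount and the helpers added by this patch; locking is omitted
and the LSB trick planned for the next commit is not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct mnt;

struct sb {
	struct mnt *mounts;		/* first element of the set or NULL */
};

struct mnt {
	struct mnt *next_for_sb;	/* next element or NULL */
	struct mnt **pprev_for_sb;	/* &sb->mounts or &prev->next_for_sb */
};

/* Insert @m at the head of the set anchored in @s. */
static void add_instance(struct mnt *m, struct sb *s)
{
	struct mnt *first = s->mounts;

	if (first)
		first->pprev_for_sb = &m->next_for_sb;
	m->next_for_sb = first;
	m->pprev_for_sb = &s->mounts;
	s->mounts = m;
}

/* Unlink @m without having to know its sb or its position in the set. */
static void del_instance(struct mnt *m)
{
	struct mnt **p = m->pprev_for_sb;
	struct mnt *next = m->next_for_sb;

	if (next)
		next->pprev_for_sb = p;
	*p = next;
}

static size_t count_instances(const struct sb *s)
{
	size_t n = 0;

	for (const struct mnt *m = s->mounts; m; m = m->next_for_sb)
		n++;
	return n;
}
```

An empty set is simply a NULL pointer in the anchor, which matches the diff
below dropping INIT_LIST_HEAD() from alloc_super() and testing s->s_mounts
directly in __put_super().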

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h         |  4 +++-
 fs/namespace.c     | 38 +++++++++++++++++++++++++++++---------
 fs/super.c         |  3 +--
 include/linux/fs.h |  4 +++-
 4 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 04d0eadc4c10..b208f69f69d7 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -64,7 +64,9 @@ struct mount {
 #endif
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
-	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
+	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
+	struct mount * __aligned(1) *mnt_pprev_for_sb;
+					/* except that LSB of pprev will be stolen */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
diff --git a/fs/namespace.c b/fs/namespace.c
index c769fc4051e0..06be5b65b559 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -730,6 +730,27 @@ static inline void mnt_unhold_writers(struct mount *mnt)
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 }
 
+static inline void mnt_del_instance(struct mount *m)
+{
+	struct mount **p = m->mnt_pprev_for_sb;
+	struct mount *next = m->mnt_next_for_sb;
+
+	if (next)
+		next->mnt_pprev_for_sb = p;
+	*p = next;
+}
+
+static inline void mnt_add_instance(struct mount *m, struct super_block *s)
+{
+	struct mount *first = s->s_mounts;
+
+	if (first)
+		first->mnt_pprev_for_sb = &m->mnt_next_for_sb;
+	m->mnt_next_for_sb = first;
+	m->mnt_pprev_for_sb = &s->s_mounts;
+	s->s_mounts = m;
+}
+
 static int mnt_make_readonly(struct mount *mnt)
 {
 	int ret;
@@ -743,7 +764,6 @@ static int mnt_make_readonly(struct mount *mnt)
 
 int sb_prepare_remount_readonly(struct super_block *sb)
 {
-	struct mount *mnt;
 	int err = 0;
 
 	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
@@ -751,9 +771,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 		return -EBUSY;
 
 	lock_mount_hash();
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (!(mnt->mnt.mnt_flags & MNT_READONLY)) {
-			err = mnt_hold_writers(mnt);
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (!(m->mnt.mnt_flags & MNT_READONLY)) {
+			err = mnt_hold_writers(m);
 			if (err)
 				break;
 		}
@@ -763,9 +783,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 
 	if (!err)
 		sb_start_ro_state_change(sb);
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (mnt->mnt.mnt_flags & MNT_WRITE_HOLD)
-			mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
+			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 	}
 	unlock_mount_hash();
 
@@ -1207,7 +1227,7 @@ static void setup_mnt(struct mount *m, struct dentry *root)
 	m->mnt_parent = m;
 
 	lock_mount_hash();
-	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	mnt_add_instance(m, s);
 	unlock_mount_hash();
 }
 
@@ -1425,7 +1445,7 @@ static void mntput_no_expire(struct mount *mnt)
 	mnt->mnt.mnt_flags |= MNT_DOOMED;
 	rcu_read_unlock();
 
-	list_del(&mnt->mnt_instance);
+	mnt_del_instance(mnt);
 	if (unlikely(!list_empty(&mnt->mnt_expire)))
 		list_del(&mnt->mnt_expire);
 
diff --git a/fs/super.c b/fs/super.c
index 7f876f32343a..3b0f49e1b817 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -323,7 +323,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	if (!s)
 		return NULL;
 
-	INIT_LIST_HEAD(&s->s_mounts);
 	s->s_user_ns = get_user_ns(user_ns);
 	init_rwsem(&s->s_umount);
 	lockdep_set_class(&s->s_umount, &type->s_umount_key);
@@ -408,7 +407,7 @@ static void __put_super(struct super_block *s)
 		list_del_init(&s->s_list);
 		WARN_ON(s->s_dentry_lru.node);
 		WARN_ON(s->s_inode_lru.node);
-		WARN_ON(!list_empty(&s->s_mounts));
+		WARN_ON(s->s_mounts);
 		call_rcu(&s->rcu, destroy_super_rcu);
 	}
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..0e9c7f1460dc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1324,6 +1324,8 @@ struct sb_writers {
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
+struct mount;
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1358,7 +1360,7 @@ struct super_block {
 	__u16 s_encoding_flags;
 #endif
 	struct hlist_bl_head	s_roots;	/* alternate root dentries for NFS */
-	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
+	struct mount		*s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;	/* can go away once we use an accessor for @s_bdev_file */
 	struct file		*s_bdev_file;
 	struct backing_dev_info *s_bdi;
-- 
2.47.2



* [PATCH v3 61/65] setup_mnt(): primitive for connecting a mount to filesystem
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (67 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 62/65] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
                         ` (5 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Move the identical logic from vfs_create_mount() and clone_mnt() into
a new helper that takes an empty struct mount and attaches it to a
given dentry (sub)tree.

Should be called once in the lifetime of every mount, prior to making
it visible in any data structures.

After that point ->mnt_root and ->mnt_sb never change; ->mnt_root
is a counting reference to dentry and ->mnt_sb - an active reference
to superblock.

Mount remains associated with that dentry tree all the way until
the call of cleanup_mnt(), when the refcount eventually drops
to zero.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index b4d287c0af4a..b7c317c23f69 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1196,6 +1196,21 @@ static void commit_tree(struct mount *mnt)
 	touch_mnt_namespace(n);
 }
 
+static void setup_mnt(struct mount *m, struct dentry *root)
+{
+	struct super_block *s = root->d_sb;
+
+	atomic_inc(&s->s_active);
+	m->mnt.mnt_sb = s;
+	m->mnt.mnt_root = dget(root);
+	m->mnt_mountpoint = m->mnt.mnt_root;
+	m->mnt_parent = m;
+
+	lock_mount_hash();
+	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	unlock_mount_hash();
+}
+
 /**
  * vfs_create_mount - Create a mount for a configured superblock
  * @fc: The configuration context with the superblock attached
@@ -1219,15 +1234,8 @@ struct vfsmount *vfs_create_mount(struct fs_context *fc)
 	if (fc->sb_flags & SB_KERNMOUNT)
 		mnt->mnt.mnt_flags = MNT_INTERNAL;
 
-	atomic_inc(&fc->root->d_sb->s_active);
-	mnt->mnt.mnt_sb		= fc->root->d_sb;
-	mnt->mnt.mnt_root	= dget(fc->root);
-	mnt->mnt_mountpoint	= mnt->mnt.mnt_root;
-	mnt->mnt_parent		= mnt;
+	setup_mnt(mnt, fc->root);
 
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
-	unlock_mount_hash();
 	return &mnt->mnt;
 }
 EXPORT_SYMBOL(vfs_create_mount);
@@ -1285,7 +1293,6 @@ EXPORT_SYMBOL_GPL(vfs_kern_mount);
 static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 					int flag)
 {
-	struct super_block *sb = old->mnt.mnt_sb;
 	struct mount *mnt;
 	int err;
 
@@ -1310,16 +1317,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	if (mnt->mnt_group_id)
 		set_mnt_shared(mnt);
 
-	atomic_inc(&sb->s_active);
 	mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
 
-	mnt->mnt.mnt_sb = sb;
-	mnt->mnt.mnt_root = dget(root);
-	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
-	mnt->mnt_parent = mnt;
-	lock_mount_hash();
-	list_add_tail(&mnt->mnt_instance, &sb->s_mounts);
-	unlock_mount_hash();
+	setup_mnt(mnt, root);
 
 	if (flag & CL_PRIVATE)	// we are done with it
 		return mnt;
-- 
2.47.2



* [PATCH v3 62/65] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (68 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 61/65] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 62/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
                         ` (4 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

We have an unpleasant wart in accessibility rules for struct mount.  There
are per-superblock lists of mounts, used by sb_prepare_remount_readonly()
to check if any of those is currently claimed for write access and to
block further attempts to get write access on those until we are done.

As soon as it is attached to a filesystem, mount becomes reachable
via that list.  Only sb_prepare_remount_readonly() traverses it and
it only accesses a few members of struct mount.  Unfortunately,
->mnt_flags is one of those and it is modified - MNT_WRITE_HOLD set
and then cleared.  It is done under mount_lock, so from the locking
rules POV everything's fine.

However, it has easily overlooked implications - once mount has been
attached to a filesystem, it has to be treated as globally visible.
In particular, initializing ->mnt_flags *must* be done either prior
to that point or under mount_lock.  All other members are still
private at that point.

Life gets simpler if we move that bit (and that's *all* that can get
touched by access via this list) out of ->mnt_flags.  It's not even
hard to do - currently the list is a list_head one, anchored in
super_block->s_mounts and linked via mount->mnt_instance.

As the first step, switch it to an hlist-like open-coded structure:
the address of the first mount in the set is stored in ->s_mounts,
and ->mnt_instance is replaced with ->mnt_next_for_sb and
->mnt_pprev_for_sb.  The former is either NULL or points to the next
mount in the set; the latter holds the address of either ->s_mounts or
->mnt_next_for_sb in the previous element of the set.

In the next commit we'll steal the LSB of ->mnt_pprev_for_sb as
replacement for MNT_WRITE_HOLD.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
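For readers who want to poke at the list shape outside the kernel, here
is a minimal userspace sketch of the hlist-like structure described
above.  All toy_* names are hypothetical stand-ins for struct mount and
struct super_block; only the add/delete logic is meant to mirror
mnt_add_instance() and mnt_del_instance() from the diff below, and no
locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct super_block and struct mount. */
struct toy_mount;

struct toy_sb {
	struct toy_mount *s_mounts;	/* first mount in the set, or NULL */
};

struct toy_mount {
	int id;
	struct toy_mount *next_for_sb;		/* next mount in the set, or NULL */
	struct toy_mount **pprev_for_sb;	/* address of whatever points at us */
};

/* Insert at the head, following mnt_add_instance() in the patch. */
static void toy_add_instance(struct toy_mount *m, struct toy_sb *s)
{
	struct toy_mount *first = s->s_mounts;

	if (first)
		first->pprev_for_sb = &m->next_for_sb;
	m->next_for_sb = first;
	m->pprev_for_sb = &s->s_mounts;
	s->s_mounts = m;
}

/* Unlink without knowing the list head, following mnt_del_instance(). */
static void toy_del_instance(struct toy_mount *m)
{
	struct toy_mount **p = m->pprev_for_sb;
	struct toy_mount *next = m->next_for_sb;

	if (next)
		next->pprev_for_sb = p;
	*p = next;
}

/* Forward traversal only ever follows next_for_sb. */
static int toy_count(const struct toy_sb *s)
{
	int n = 0;

	for (const struct toy_mount *m = s->s_mounts; m; m = m->next_for_sb)
		n++;
	return n;
}

/* Returns 0 if the list behaves as described in the commit message. */
int toy_list_demo(void)
{
	struct toy_sb sb = { .s_mounts = NULL };
	struct toy_mount a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	toy_add_instance(&a, &sb);
	toy_add_instance(&b, &sb);
	toy_add_instance(&c, &sb);	/* list is now c -> b -> a */
	assert(toy_count(&sb) == 3);
	assert(sb.s_mounts == &c);

	toy_del_instance(&b);		/* remove the middle one: c -> a */
	assert(c.next_for_sb == &a && a.pprev_for_sb == &c.next_for_sb);

	toy_del_instance(&c);		/* remove the head: a */
	assert(sb.s_mounts == &a && a.pprev_for_sb == &sb.s_mounts);

	toy_del_instance(&a);		/* remove the last one: empty set */
	assert(sb.s_mounts == NULL);
	return 0;
}
```

The point of the pprev field is that removal needs neither the list
head nor a special case for the first element: it always writes through
the stored address, whether that is ->s_mounts or a neighbour's next
pointer.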
 fs/mount.h         |  4 +++-
 fs/namespace.c     | 38 +++++++++++++++++++++++++++++---------
 fs/super.c         |  3 +--
 include/linux/fs.h |  4 +++-
 4 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 04d0eadc4c10..b208f69f69d7 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -64,7 +64,9 @@ struct mount {
 #endif
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
-	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
+	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
+	struct mount * __aligned(1) *mnt_pprev_for_sb;
+					/* except that LSB of pprev will be stolen */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
diff --git a/fs/namespace.c b/fs/namespace.c
index b7c317c23f69..eb1b557e9f6d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -730,6 +730,27 @@ static inline void mnt_unhold_writers(struct mount *mnt)
 	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 }
 
+static inline void mnt_del_instance(struct mount *m)
+{
+	struct mount **p = m->mnt_pprev_for_sb;
+	struct mount *next = m->mnt_next_for_sb;
+
+	if (next)
+		next->mnt_pprev_for_sb = p;
+	*p = next;
+}
+
+static inline void mnt_add_instance(struct mount *m, struct super_block *s)
+{
+	struct mount *first = s->s_mounts;
+
+	if (first)
+		first->mnt_pprev_for_sb = &m->mnt_next_for_sb;
+	m->mnt_next_for_sb = first;
+	m->mnt_pprev_for_sb = &s->s_mounts;
+	s->s_mounts = m;
+}
+
 static int mnt_make_readonly(struct mount *mnt)
 {
 	int ret;
@@ -743,7 +764,6 @@ static int mnt_make_readonly(struct mount *mnt)
 
 int sb_prepare_remount_readonly(struct super_block *sb)
 {
-	struct mount *mnt;
 	int err = 0;
 
 	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
@@ -751,9 +771,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 		return -EBUSY;
 
 	lock_mount_hash();
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (!(mnt->mnt.mnt_flags & MNT_READONLY)) {
-			err = mnt_hold_writers(mnt);
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (!(m->mnt.mnt_flags & MNT_READONLY)) {
+			err = mnt_hold_writers(m);
 			if (err)
 				break;
 		}
@@ -763,9 +783,9 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 
 	if (!err)
 		sb_start_ro_state_change(sb);
-	list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
-		if (mnt->mnt.mnt_flags & MNT_WRITE_HOLD)
-			mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
+		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
+			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
 	}
 	unlock_mount_hash();
 
@@ -1207,7 +1227,7 @@ static void setup_mnt(struct mount *m, struct dentry *root)
 	m->mnt_parent = m;
 
 	lock_mount_hash();
-	list_add_tail(&m->mnt_instance, &s->s_mounts);
+	mnt_add_instance(m, s);
 	unlock_mount_hash();
 }
 
@@ -1425,7 +1445,7 @@ static void mntput_no_expire(struct mount *mnt)
 	mnt->mnt.mnt_flags |= MNT_DOOMED;
 	rcu_read_unlock();
 
-	list_del(&mnt->mnt_instance);
+	mnt_del_instance(mnt);
 	if (unlikely(!list_empty(&mnt->mnt_expire)))
 		list_del(&mnt->mnt_expire);
 
diff --git a/fs/super.c b/fs/super.c
index 7f876f32343a..3b0f49e1b817 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -323,7 +323,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	if (!s)
 		return NULL;
 
-	INIT_LIST_HEAD(&s->s_mounts);
 	s->s_user_ns = get_user_ns(user_ns);
 	init_rwsem(&s->s_umount);
 	lockdep_set_class(&s->s_umount, &type->s_umount_key);
@@ -408,7 +407,7 @@ static void __put_super(struct super_block *s)
 		list_del_init(&s->s_list);
 		WARN_ON(s->s_dentry_lru.node);
 		WARN_ON(s->s_inode_lru.node);
-		WARN_ON(!list_empty(&s->s_mounts));
+		WARN_ON(s->s_mounts);
 		call_rcu(&s->rcu, destroy_super_rcu);
 	}
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d7ab4f96d705..0e9c7f1460dc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1324,6 +1324,8 @@ struct sb_writers {
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
+struct mount;
+
 struct super_block {
 	struct list_head	s_list;		/* Keep this first */
 	dev_t			s_dev;		/* search index; _not_ kdev_t */
@@ -1358,7 +1360,7 @@ struct super_block {
 	__u16 s_encoding_flags;
 #endif
 	struct hlist_bl_head	s_roots;	/* alternate root dentries for NFS */
-	struct list_head	s_mounts;	/* list of mounts; _not_ for fs use */
+	struct mount		*s_mounts;	/* list of mounts; _not_ for fs use */
 	struct block_device	*s_bdev;	/* can go away once we use an accessor for @s_bdev_file */
 	struct file		*s_bdev_file;
 	struct backing_dev_info *s_bdi;
-- 
2.47.2



* [PATCH v3 62/63] struct mount: relocate MNT_WRITE_HOLD bit
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (69 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 62/65] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 63/65] " Al Viro
                         ` (3 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... from ->mnt_flags to LSB of ->mnt_pprev_for_sb.

This is safe - we always set and clear it within the same mount_lock
scope, so we won't interfere with list operations.  Traversals are
always forward, so they never look at ->mnt_pprev_for_sb, and both
insertions and removals are in mount_lock scopes of their own, so that
bit will be clear in *all* mount instances during those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
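A rough userspace illustration of the bit-stealing trick, for anyone
unfamiliar with it.  This is a sketch under assumptions, not the kernel
code: the toy_* names are made up, and unlike the patch (which keeps a
pointer type with __aligned(1)) the toy stores the back-link as a
uintptr_t so that a tagged value never has to live in a pointer object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WRITE_HOLD ((uintptr_t)1)	/* bit 0 of the back-link */

/* Hypothetical stand-in for the two list fields of struct mount. */
struct toy_mount {
	struct toy_mount *next_for_sb;
	uintptr_t pprev_bits;	/* address of the previous link, bit 0 = hold */
};

/* Recover the real back-link by masking the flag off. */
static struct toy_mount **toy_pprev(const struct toy_mount *m)
{
	return (struct toy_mount **)(m->pprev_bits & ~TOY_WRITE_HOLD);
}

static bool toy_test_write_hold(const struct toy_mount *m)
{
	return m->pprev_bits & TOY_WRITE_HOLD;
}

static void toy_set_write_hold(struct toy_mount *m)
{
	m->pprev_bits |= TOY_WRITE_HOLD;
}

static void toy_clear_write_hold(struct toy_mount *m)
{
	m->pprev_bits &= ~TOY_WRITE_HOLD;
}

/* Returns 0 if the flag can be set and cleared without losing the link. */
int toy_write_hold_demo(void)
{
	struct toy_mount *head = NULL;
	struct toy_mount m = {
		.next_for_sb = NULL,
		.pprev_bits = (uintptr_t)&head,
	};

	head = &m;
	/* A pointer object is at least 2-byte aligned here, so bit 0 is free. */
	assert(((uintptr_t)&head & TOY_WRITE_HOLD) == 0);
	assert(!toy_test_write_hold(&m));

	toy_set_write_hold(&m);
	assert(toy_test_write_hold(&m));
	/* Forward traversal never reads the back-link, so it sees no change. */
	assert(head == &m && m.next_for_sb == NULL);
	/* Anything that does need the back-link has to mask the flag off. */
	assert(toy_pprev(&m) == &head);

	toy_clear_write_hold(&m);
	assert(!toy_test_write_hold(&m));
	assert(m.pprev_bits == (uintptr_t)&head);	/* original value is back */
	return 0;
}
```

In the patch itself nothing masks the flag off when unlinking; the
argument in the commit message is that the bit is guaranteed clear
whenever an insertion or removal runs, so the raw pointer can be used
as is.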
 fs/mount.h            | 25 ++++++++++++++++++++++++-
 fs/namespace.c        | 34 +++++++++++++++++-----------------
 include/linux/mount.h |  3 +--
 3 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index b208f69f69d7..40cf16544317 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -66,7 +66,8 @@ struct mount {
 	struct list_head mnt_child;	/* and going through their mnt_child */
 	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
 	struct mount * __aligned(1) *mnt_pprev_for_sb;
-					/* except that LSB of pprev will be stolen */
+					/* except that LSB of pprev is stolen */
+#define WRITE_HOLD 1			/* ... for use by mnt_hold_writers() */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
@@ -244,4 +245,26 @@ static inline struct mount *topmost_overmount(struct mount *m)
 	return m;
 }
 
+static inline bool __test_write_hold(struct mount * __aligned(1) *val)
+{
+	return (unsigned long)val & WRITE_HOLD;
+}
+
+static inline bool test_write_hold(const struct mount *m)
+{
+	return __test_write_hold(m->mnt_pprev_for_sb);
+}
+
+static inline void set_write_hold(struct mount *m)
+{
+	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
+				       | WRITE_HOLD);
+}
+
+static inline void clear_write_hold(struct mount *m)
+{
+	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
+				       & ~WRITE_HOLD);
+}
+
 struct mnt_namespace *mnt_ns_from_dentry(struct dentry *dentry);
diff --git a/fs/namespace.c b/fs/namespace.c
index 06be5b65b559..8e6b6523d3e8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -509,20 +509,20 @@ int mnt_get_write_access(struct vfsmount *m)
 	mnt_inc_writers(mnt);
 	/*
 	 * The store to mnt_inc_writers must be visible before we pass
-	 * MNT_WRITE_HOLD loop below, so that the slowpath can see our
-	 * incremented count after it has set MNT_WRITE_HOLD.
+	 * WRITE_HOLD loop below, so that the slowpath can see our
+	 * incremented count after it has set WRITE_HOLD.
 	 */
 	smp_mb();
 	might_lock(&mount_lock.lock);
-	while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
+	while (__test_write_hold(READ_ONCE(mnt->mnt_pprev_for_sb))) {
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 			cpu_relax();
 		} else {
 			/*
 			 * This prevents priority inversion, if the task
-			 * setting MNT_WRITE_HOLD got preempted on a remote
+			 * setting WRITE_HOLD got preempted on a remote
 			 * CPU, and it prevents life lock if the task setting
-			 * MNT_WRITE_HOLD has a lower priority and is bound to
+			 * WRITE_HOLD has a lower priority and is bound to
 			 * the same CPU as the task that is spinning here.
 			 */
 			preempt_enable();
@@ -533,7 +533,7 @@ int mnt_get_write_access(struct vfsmount *m)
 	}
 	/*
 	 * The barrier pairs with the barrier sb_start_ro_state_change() making
-	 * sure that if we see MNT_WRITE_HOLD cleared, we will also see
+	 * sure that if we see WRITE_HOLD cleared, we will also see
 	 * s_readonly_remount set (or even SB_RDONLY / MNT_READONLY flags) in
 	 * mnt_is_readonly() and bail in case we are racing with remount
 	 * read-only.
@@ -672,15 +672,15 @@ EXPORT_SYMBOL(mnt_drop_write_file);
  * @mnt.
  *
  * Context: This function expects lock_mount_hash() to be held serializing
- *          setting MNT_WRITE_HOLD.
+ *          setting WRITE_HOLD.
  * Return: On success 0 is returned.
  *	   On error, -EBUSY is returned.
  */
 static inline int mnt_hold_writers(struct mount *mnt)
 {
-	mnt->mnt.mnt_flags |= MNT_WRITE_HOLD;
+	set_write_hold(mnt);
 	/*
-	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
+	 * After storing WRITE_HOLD, we'll read the counters. This store
 	 * should be visible before we do.
 	 */
 	smp_mb();
@@ -696,9 +696,9 @@ static inline int mnt_hold_writers(struct mount *mnt)
 	 * sum up each counter, if we read a counter before it is incremented,
 	 * but then read another CPU's count which it has been subsequently
 	 * decremented from -- we would see more decrements than we should.
-	 * MNT_WRITE_HOLD protects against this scenario, because
+	 * WRITE_HOLD protects against this scenario, because
 	 * mnt_want_write first increments count, then smp_mb, then spins on
-	 * MNT_WRITE_HOLD, so it can't be decremented by another CPU while
+	 * WRITE_HOLD, so it can't be decremented by another CPU while
 	 * we're counting up here.
 	 */
 	if (mnt_get_writers(mnt) > 0)
@@ -720,14 +720,14 @@ static inline int mnt_hold_writers(struct mount *mnt)
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
-	if (!(mnt->mnt_flags & MNT_WRITE_HOLD))
+	if (!test_write_hold(mnt))
 		return;
 	/*
-	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
+	 * MNT_READONLY must become visible before ~WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
 	 */
 	smp_wmb();
-	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	clear_write_hold(mnt);
 }
 
 static inline void mnt_del_instance(struct mount *m)
@@ -766,7 +766,7 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 {
 	int err = 0;
 
-	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
+	/* Racy optimization.  Recheck the counter under WRITE_HOLD */
 	if (atomic_long_read(&sb->s_remove_count))
 		return -EBUSY;
 
@@ -784,8 +784,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 	if (!err)
 		sb_start_ro_state_change(sb);
 	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
-			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+		if (test_write_hold(m))
+			clear_write_hold(m);
 	}
 	unlock_mount_hash();
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 18e4b97f8a98..85e97b9340ff 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -33,7 +33,6 @@ enum mount_flags {
 	MNT_NOSYMFOLLOW	= 0x80,
 
 	MNT_SHRINKABLE	= 0x100,
-	MNT_WRITE_HOLD	= 0x200,
 
 	MNT_INTERNAL	= 0x4000,
 
@@ -52,7 +51,7 @@ enum mount_flags {
 				  | MNT_READONLY | MNT_NOSYMFOLLOW,
 	MNT_ATIME_MASK = MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME,
 
-	MNT_INTERNAL_FLAGS = MNT_WRITE_HOLD | MNT_INTERNAL | MNT_DOOMED |
+	MNT_INTERNAL_FLAGS = MNT_INTERNAL | MNT_DOOMED |
 			     MNT_SYNC_UMOUNT | MNT_LOCKED
 };
 
-- 
2.47.2



* [PATCH v3 63/65] struct mount: relocate MNT_WRITE_HOLD bit
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (70 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 62/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 63/63] WRITE_HOLD machinery: no need for to bump mount_lock seqcount Al Viro
                         ` (2 subsequent siblings)
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... from ->mnt_flags to LSB of ->mnt_pprev_for_sb.

This is safe - we always set and clear it within the same mount_lock
scope, so we won't interfere with list operations.  Traversals are
always forward, so they never look at ->mnt_pprev_for_sb, and both
insertions and removals are in mount_lock scopes of their own, so that
bit will be clear in *all* mount instances during those.

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/mount.h            | 25 ++++++++++++++++++++++++-
 fs/namespace.c        | 34 +++++++++++++++++-----------------
 include/linux/mount.h |  3 +--
 3 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index b208f69f69d7..40cf16544317 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -66,7 +66,8 @@ struct mount {
 	struct list_head mnt_child;	/* and going through their mnt_child */
 	struct mount *mnt_next_for_sb;	/* the next two fields are hlist_node, */
 	struct mount * __aligned(1) *mnt_pprev_for_sb;
-					/* except that LSB of pprev will be stolen */
+					/* except that LSB of pprev is stolen */
+#define WRITE_HOLD 1			/* ... for use by mnt_hold_writers() */
 	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
 	struct list_head mnt_list;
 	struct list_head mnt_expire;	/* link in fs-specific expiry list */
@@ -244,4 +245,26 @@ static inline struct mount *topmost_overmount(struct mount *m)
 	return m;
 }
 
+static inline bool __test_write_hold(struct mount * __aligned(1) *val)
+{
+	return (unsigned long)val & WRITE_HOLD;
+}
+
+static inline bool test_write_hold(const struct mount *m)
+{
+	return __test_write_hold(m->mnt_pprev_for_sb);
+}
+
+static inline void set_write_hold(struct mount *m)
+{
+	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
+				       | WRITE_HOLD);
+}
+
+static inline void clear_write_hold(struct mount *m)
+{
+	m->mnt_pprev_for_sb = (void *)((unsigned long)m->mnt_pprev_for_sb
+				       & ~WRITE_HOLD);
+}
+
 struct mnt_namespace *mnt_ns_from_dentry(struct dentry *dentry);
diff --git a/fs/namespace.c b/fs/namespace.c
index eb1b557e9f6d..64cbd8e8a1d3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -509,20 +509,20 @@ int mnt_get_write_access(struct vfsmount *m)
 	mnt_inc_writers(mnt);
 	/*
 	 * The store to mnt_inc_writers must be visible before we pass
-	 * MNT_WRITE_HOLD loop below, so that the slowpath can see our
-	 * incremented count after it has set MNT_WRITE_HOLD.
+	 * WRITE_HOLD loop below, so that the slowpath can see our
+	 * incremented count after it has set WRITE_HOLD.
 	 */
 	smp_mb();
 	might_lock(&mount_lock.lock);
-	while (READ_ONCE(mnt->mnt.mnt_flags) & MNT_WRITE_HOLD) {
+	while (__test_write_hold(READ_ONCE(mnt->mnt_pprev_for_sb))) {
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 			cpu_relax();
 		} else {
 			/*
 			 * This prevents priority inversion, if the task
-			 * setting MNT_WRITE_HOLD got preempted on a remote
+			 * setting WRITE_HOLD got preempted on a remote
 			 * CPU, and it prevents life lock if the task setting
-			 * MNT_WRITE_HOLD has a lower priority and is bound to
+			 * WRITE_HOLD has a lower priority and is bound to
 			 * the same CPU as the task that is spinning here.
 			 */
 			preempt_enable();
@@ -533,7 +533,7 @@ int mnt_get_write_access(struct vfsmount *m)
 	}
 	/*
 	 * The barrier pairs with the barrier sb_start_ro_state_change() making
-	 * sure that if we see MNT_WRITE_HOLD cleared, we will also see
+	 * sure that if we see WRITE_HOLD cleared, we will also see
 	 * s_readonly_remount set (or even SB_RDONLY / MNT_READONLY flags) in
 	 * mnt_is_readonly() and bail in case we are racing with remount
 	 * read-only.
@@ -672,15 +672,15 @@ EXPORT_SYMBOL(mnt_drop_write_file);
  * @mnt.
  *
  * Context: This function expects lock_mount_hash() to be held serializing
- *          setting MNT_WRITE_HOLD.
+ *          setting WRITE_HOLD.
  * Return: On success 0 is returned.
  *	   On error, -EBUSY is returned.
  */
 static inline int mnt_hold_writers(struct mount *mnt)
 {
-	mnt->mnt.mnt_flags |= MNT_WRITE_HOLD;
+	set_write_hold(mnt);
 	/*
-	 * After storing MNT_WRITE_HOLD, we'll read the counters. This store
+	 * After storing WRITE_HOLD, we'll read the counters. This store
 	 * should be visible before we do.
 	 */
 	smp_mb();
@@ -696,9 +696,9 @@ static inline int mnt_hold_writers(struct mount *mnt)
 	 * sum up each counter, if we read a counter before it is incremented,
 	 * but then read another CPU's count which it has been subsequently
 	 * decremented from -- we would see more decrements than we should.
-	 * MNT_WRITE_HOLD protects against this scenario, because
+	 * WRITE_HOLD protects against this scenario, because
 	 * mnt_want_write first increments count, then smp_mb, then spins on
-	 * MNT_WRITE_HOLD, so it can't be decremented by another CPU while
+	 * WRITE_HOLD, so it can't be decremented by another CPU while
 	 * we're counting up here.
 	 */
 	if (mnt_get_writers(mnt) > 0)
@@ -720,14 +720,14 @@ static inline int mnt_hold_writers(struct mount *mnt)
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
-	if (!(mnt->mnt_flags & MNT_WRITE_HOLD))
+	if (!test_write_hold(mnt))
 		return;
 	/*
-	 * MNT_READONLY must become visible before ~MNT_WRITE_HOLD, so writers
+	 * MNT_READONLY must become visible before ~WRITE_HOLD, so writers
 	 * that become unheld will see MNT_READONLY.
 	 */
 	smp_wmb();
-	mnt->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+	clear_write_hold(mnt);
 }
 
 static inline void mnt_del_instance(struct mount *m)
@@ -766,7 +766,7 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 {
 	int err = 0;
 
-	/* Racy optimization.  Recheck the counter under MNT_WRITE_HOLD */
+	/* Racy optimization.  Recheck the counter under WRITE_HOLD */
 	if (atomic_long_read(&sb->s_remove_count))
 		return -EBUSY;
 
@@ -784,8 +784,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 	if (!err)
 		sb_start_ro_state_change(sb);
 	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
-		if (m->mnt.mnt_flags & MNT_WRITE_HOLD)
-			m->mnt.mnt_flags &= ~MNT_WRITE_HOLD;
+		if (test_write_hold(m))
+			clear_write_hold(m);
 	}
 	unlock_mount_hash();
 
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 18e4b97f8a98..85e97b9340ff 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -33,7 +33,6 @@ enum mount_flags {
 	MNT_NOSYMFOLLOW	= 0x80,
 
 	MNT_SHRINKABLE	= 0x100,
-	MNT_WRITE_HOLD	= 0x200,
 
 	MNT_INTERNAL	= 0x4000,
 
@@ -52,7 +51,7 @@ enum mount_flags {
 				  | MNT_READONLY | MNT_NOSYMFOLLOW,
 	MNT_ATIME_MASK = MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME,
 
-	MNT_INTERNAL_FLAGS = MNT_WRITE_HOLD | MNT_INTERNAL | MNT_DOOMED |
+	MNT_INTERNAL_FLAGS = MNT_INTERNAL | MNT_DOOMED |
 			     MNT_SYNC_UMOUNT | MNT_LOCKED
 };
 
-- 
2.47.2



* [PATCH v3 63/63] WRITE_HOLD machinery: no need for to bump mount_lock seqcount
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (71 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 63/65] " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 64/65] " Al Viro
  2025-09-03  4:55       ` [PATCH v3 65/65] constify {__,}mnt_is_readonly() Al Viro
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... neither for insertion into the list of instances, nor for
mnt_{un,}hold_writers(), nor for mnt_get_write_access() deciding
to be nice to RT during a busy-wait loop - all of that only needs
the spinlock side of mount_lock.

IOW, it's mount_locked_reader, not mount_writer.

Clarify the comment re locking rules for mnt_unhold_writers() - it's
not just that mount_lock needs to be held when calling that, it must
have been held all along since the matching mnt_hold_writers().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
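To make the "spinlock side only" distinction concrete, here is a toy
model of a seqlock in userspace C11.  It is only a sketch: the toy_*
names are hypothetical, the real seqlock_t has more to it, and the
mapping assumed here is that lock_mount_hash() corresponds to the
write variant while mount_locked_reader corresponds to the exclusive
read variant.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy seqlock: a spinlock plus a sequence counter (hypothetical names). */
struct toy_seqlock {
	atomic_flag lock;
	atomic_uint seq;	/* odd while a writer is inside */
};

static void toy_spin_lock(struct toy_seqlock *sl)
{
	while (atomic_flag_test_and_set_explicit(&sl->lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void toy_spin_unlock(struct toy_seqlock *sl)
{
	atomic_flag_clear_explicit(&sl->lock, memory_order_release);
}

/* Writer side: take the lock and bump the counter. */
static void toy_write_seqlock(struct toy_seqlock *sl)
{
	toy_spin_lock(sl);
	atomic_fetch_add(&sl->seq, 1);
}

static void toy_write_sequnlock(struct toy_seqlock *sl)
{
	atomic_fetch_add(&sl->seq, 1);
	toy_spin_unlock(sl);
}

/* Exclusive-reader side: same lock, counter left alone. */
static void toy_read_seqlock_excl(struct toy_seqlock *sl)
{
	toy_spin_lock(sl);
}

static void toy_read_sequnlock_excl(struct toy_seqlock *sl)
{
	toy_spin_unlock(sl);
}

/* Returns 0 if only the writer variant moves the sequence counter. */
int toy_seqlock_demo(void)
{
	struct toy_seqlock sl = { .lock = ATOMIC_FLAG_INIT, .seq = 0 };
	unsigned int before = atomic_load(&sl.seq);

	toy_read_seqlock_excl(&sl);
	toy_read_sequnlock_excl(&sl);
	/* Lockless readers sampling the counter would not need to retry. */
	assert(atomic_load(&sl.seq) == before);

	toy_write_seqlock(&sl);
	assert(atomic_load(&sl.seq) == before + 1);	/* odd: write in progress */
	toy_write_sequnlock(&sl);
	/* Counter moved, so lockless readers that overlapped must retry. */
	assert(atomic_load(&sl.seq) == before + 2);
	return 0;
}
```

Both variants exclude each other through the same lock; the difference
is only whether concurrent lockless readers get told to retry, which
is unnecessary when nothing they look at has changed.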
 fs/namespace.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 8e6b6523d3e8..8f0900857822 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -526,8 +526,8 @@ int mnt_get_write_access(struct vfsmount *m)
 			 * the same CPU as the task that is spinning here.
 			 */
 			preempt_enable();
-			lock_mount_hash();
-			unlock_mount_hash();
+			read_seqlock_excl(&mount_lock);
+			read_sequnlock_excl(&mount_lock);
 			preempt_disable();
 		}
 	}
@@ -671,7 +671,7 @@ EXPORT_SYMBOL(mnt_drop_write_file);
  * a call to mnt_unhold_writers() in order to stop preventing write access to
  * @mnt.
  *
- * Context: This function expects lock_mount_hash() to be held serializing
+ * Context: This function expects to be in mount_locked_reader scope serializing
  *          setting WRITE_HOLD.
  * Return: On success 0 is returned.
  *	   On error, -EBUSY is returned.
@@ -716,7 +716,8 @@ static inline int mnt_hold_writers(struct mount *mnt)
  *
  * This function can only be called after a call to mnt_hold_writers().
  *
- * Context: This function expects lock_mount_hash() to be held.
+ * Context: This function expects to be in the same mount_locked_reader scope
+ * as the matching mnt_hold_writers().
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
@@ -770,7 +771,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 	if (atomic_long_read(&sb->s_remove_count))
 		return -EBUSY;
 
-	lock_mount_hash();
+	guard(mount_locked_reader)();
+
 	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
 		if (!(m->mnt.mnt_flags & MNT_READONLY)) {
 			err = mnt_hold_writers(m);
@@ -787,7 +789,6 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 		if (test_write_hold(m))
 			clear_write_hold(m);
 	}
-	unlock_mount_hash();
 
 	return err;
 }
@@ -1226,9 +1227,8 @@ static void setup_mnt(struct mount *m, struct dentry *root)
 	m->mnt_mountpoint = m->mnt.mnt_root;
 	m->mnt_parent = m;
 
-	lock_mount_hash();
+	guard(mount_locked_reader)();
 	mnt_add_instance(m, s);
-	unlock_mount_hash();
 }
 
 /**
-- 
2.47.2



* [PATCH v3 64/65] WRITE_HOLD machinery: no need for to bump mount_lock seqcount
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (72 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 63/63] WRITE_HOLD machinery: no need for to bump mount_lock seqcount Al Viro
@ 2025-09-03  4:55       ` Al Viro
  2025-09-03  4:55       ` [PATCH v3 65/65] constify {__,}mnt_is_readonly() Al Viro
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

... neither for insertion into the list of instances, nor for
mnt_{un,}hold_writers(), nor for mnt_get_write_access() deciding
to be nice to RT during a busy-wait loop - all of that only needs
the spinlock side of mount_lock.

IOW, it's mount_locked_reader, not mount_writer.

Clarify the comment re locking rules for mnt_unhold_writers() - it's
not just that mount_lock needs to be held when calling that, it must
have been held all along since the matching mnt_hold_writers().

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
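The diff below replaces explicit lock/unlock pairs with a scope guard.
For anyone who has not looked at cleanup.h, the mechanism can be
approximated in userspace with the GCC/Clang cleanup attribute.  The
sketch below is an illustration only: toy_lock and TOY_GUARD are
invented names, the "lock" merely counts how deeply it is held, and the
kernel's guard() macro is more general than this.

```c
#include <assert.h>

/* Hypothetical lock: it only counts how deeply it is held. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	l->held++;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held--;
}

/* Cleanup handler: gets a pointer to the guard variable itself. */
static void toy_guard_exit(struct toy_lock **g)
{
	toy_lock_release(*g);
}

/* Take the lock now; drop it when the enclosing scope is left. */
#define TOY_GUARD(l)							\
	struct toy_lock *toy_guard_var					\
		__attribute__((cleanup(toy_guard_exit), unused)) =	\
		(toy_lock_acquire(l), (l))

static int toy_locked_work(struct toy_lock *l, int fail_early)
{
	TOY_GUARD(l);

	if (fail_early)
		return -1;	/* lock is dropped on this path ... */
	assert(l->held == 1);
	return 0;		/* ... and on this one, with no explicit unlock */
}

/* Returns 0 if the lock is released on every exit path. */
int toy_guard_demo(void)
{
	struct toy_lock l = { 0 };

	assert(toy_locked_work(&l, 0) == 0 && l.held == 0);
	assert(toy_locked_work(&l, 1) == -1 && l.held == 0);
	return 0;
}
```

The flip side is that the lock stays held until the end of the scope,
so anything placed after the guarded work but inside the same scope
also runs under the lock; that is worth checking whenever a function is
converted.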
 fs/namespace.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 64cbd8e8a1d3..9eef4ca6d36a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -526,8 +526,8 @@ int mnt_get_write_access(struct vfsmount *m)
 			 * the same CPU as the task that is spinning here.
 			 */
 			preempt_enable();
-			lock_mount_hash();
-			unlock_mount_hash();
+			read_seqlock_excl(&mount_lock);
+			read_sequnlock_excl(&mount_lock);
 			preempt_disable();
 		}
 	}
@@ -671,7 +671,7 @@ EXPORT_SYMBOL(mnt_drop_write_file);
  * a call to mnt_unhold_writers() in order to stop preventing write access to
  * @mnt.
  *
- * Context: This function expects lock_mount_hash() to be held serializing
+ * Context: This function expects to be in mount_locked_reader scope serializing
  *          setting WRITE_HOLD.
  * Return: On success 0 is returned.
  *	   On error, -EBUSY is returned.
@@ -716,7 +716,8 @@ static inline int mnt_hold_writers(struct mount *mnt)
  *
  * This function can only be called after a call to mnt_hold_writers().
  *
- * Context: This function expects lock_mount_hash() to be held.
+ * Context: This function expects to be in the same mount_locked_reader scope
+ * as the matching mnt_hold_writers().
  */
 static inline void mnt_unhold_writers(struct mount *mnt)
 {
@@ -770,7 +771,8 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 	if (atomic_long_read(&sb->s_remove_count))
 		return -EBUSY;
 
-	lock_mount_hash();
+	guard(mount_locked_reader)();
+
 	for (struct mount *m = sb->s_mounts; m; m = m->mnt_next_for_sb) {
 		if (!(m->mnt.mnt_flags & MNT_READONLY)) {
 			err = mnt_hold_writers(m);
@@ -787,7 +789,6 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 		if (test_write_hold(m))
 			clear_write_hold(m);
 	}
-	unlock_mount_hash();
 
 	return err;
 }
@@ -1226,9 +1227,8 @@ static void setup_mnt(struct mount *m, struct dentry *root)
 	m->mnt_mountpoint = m->mnt.mnt_root;
 	m->mnt_parent = m;
 
-	lock_mount_hash();
+	guard(mount_locked_reader)();
 	mnt_add_instance(m, s);
-	unlock_mount_hash();
 }
 
 /**
-- 
2.47.2
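
For readers unfamiliar with the guard() idiom used in the sb_prepare_remount_readonly() and setup_mnt() hunks above, here is a minimal userspace sketch of the underlying mechanism, the compiler's cleanup attribute (GCC and clang). All demo_* names are hypothetical and the "lock" is a plain flag rather than a seqlock. This is not the kernel's cleanup.h implementation, only an illustration of why the explicit unlock call can go away.

```c
#include <assert.h>

/* Hypothetical stand-in for mount_lock: a flag instead of a real seqlock. */
struct demo_lock {
	int held;	/* 1 while "locked" */
	int releases;	/* how many times it has been dropped */
};

static struct demo_lock demo_mount_lock;

static struct demo_lock *demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);	/* no recursion in this toy model */
	l->held = 1;
	return l;
}

/* Cleanup handler: called when the guard variable goes out of scope. */
static void demo_lock_release(struct demo_lock **lp)
{
	(*lp)->held = 0;
	(*lp)->releases++;
}

/* Take the lock now, drop it at the end of the enclosing scope. */
#define DEMO_GUARD(l) \
	struct demo_lock *demo_guard_var \
		__attribute__((cleanup(demo_lock_release), unused)) = \
		demo_lock_acquire(l)

/* Every exit path, including the early return, releases the lock. */
int demo_sum_until_negative(const int *v, int n)
{
	int sum = 0;

	DEMO_GUARD(&demo_mount_lock);
	for (int i = 0; i < n; i++) {
		if (v[i] < 0)
			return -1;	/* no explicit unlock needed here */
		sum += v[i];
	}
	return sum;
}
```

Calling demo_sum_until_negative() on an array containing a negative value takes the early return, and demo_mount_lock.held is back to 0 afterwards, same as on the normal path. The flip side is the one raised in the cover letter: the lock is held until the end of the scope, so the scope has to be chosen with care.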



* [PATCH v3 65/65] constify {__,}mnt_is_readonly()
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
                         ` (73 preceding siblings ...)
  2025-09-03  4:55       ` [PATCH v3 64/65] " Al Viro
@ 2025-09-03  4:55       ` Al Viro
  74 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  4:55 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: brauner, jack, torvalds

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c        | 4 ++--
 include/linux/mount.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9eef4ca6d36a..c88fe350b550 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -428,7 +428,7 @@ static struct mount *alloc_vfsmnt(const char *name)
  * mnt_want/drop_write() will _keep_ the filesystem
  * r/w.
  */
-bool __mnt_is_readonly(struct vfsmount *mnt)
+bool __mnt_is_readonly(const struct vfsmount *mnt)
 {
 	return (mnt->mnt_flags & MNT_READONLY) || sb_rdonly(mnt->mnt_sb);
 }
@@ -468,7 +468,7 @@ static unsigned int mnt_get_writers(struct mount *mnt)
 #endif
 }
 
-static int mnt_is_readonly(struct vfsmount *mnt)
+static int mnt_is_readonly(const struct vfsmount *mnt)
 {
 	if (READ_ONCE(mnt->mnt_sb->s_readonly_remount))
 		return 1;
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 85e97b9340ff..acfe7ef86a1b 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -76,7 +76,7 @@ extern void mntput(struct vfsmount *mnt);
 extern struct vfsmount *mntget(struct vfsmount *mnt);
 extern void mnt_make_shortterm(struct vfsmount *mnt);
 extern struct vfsmount *mnt_clone_internal(const struct path *path);
-extern bool __mnt_is_readonly(struct vfsmount *mnt);
+extern bool __mnt_is_readonly(const struct vfsmount *mnt);
 extern bool mnt_may_suid(struct vfsmount *mnt);
 
 extern struct vfsmount *clone_private_mount(const struct path *path);
-- 
2.47.2



* Re: [PATCHES v3][RFC][CFT] mount-related stuff
  2025-09-03  4:54   ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
@ 2025-09-03  5:08     ` Al Viro
  2025-09-03 14:47     ` Linus Torvalds
  2 siblings, 0 replies; 320+ messages in thread
From: Al Viro @ 2025-09-03  5:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, Jan Kara

On Wed, Sep 03, 2025 at 05:54:32AM +0100, Al Viro wrote:
> Branch force-pushed into
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.mount
> (also visible as #v3.mount, #v[12].mount being the previous versions)
> Individual patches in followups.
> 
> If nobody objects, this goes into #for-next.

PS: survives LTP, xfstests and mount-related selftests.

FWIW, I've spent the weekend trying to figure out what's going on with
generic/475.  Turns out that it was not a regression - it goes back at
least to 6.12 and it's triggered by PREEMPT vs. PREEMPT_VOLUNTARY in
config.

The former gives several kinds of failures, with total frequency about 8%;
the latter apparently works - if any similar failures happen, the frequency
is at least an order of magnitude lower.

One useful thing I've got out of that is a bunch of helpers for bisecting
configs - semi-manually decomposing the difference between two configs
into a series of small changes, which allows bisection over them.

Unfortunately, the change it has converged to (and repeating it alone on
the original config reproduces the effect) is not particularly useful -
some race gets triggered by a config change that affects timings all over
the place ;-/
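
The kind of bisection over config changes described above can be sketched roughly as follows. The helpers themselves are not shown in this message, so this is only an illustration: demo_bisect_steps() and demo_fails_from() are hypothetical names. The sketch assumes that one test per step gives a trustworthy answer and that the failure, once it appears, stays as more steps are applied. With an intermittent failure each probe would have to be repeated many times.

```c
#include <stddef.h>

/*
 * Hypothetical callback: build and test the config obtained by applying
 * the first 'applied' steps to the known-good config; return nonzero if
 * the failure shows up.
 */
typedef int (*demo_fails_fn)(size_t applied, void *ctx);

/*
 * Return the 1-based index of the first step that makes the failure
 * appear, assuming 0 steps is good and all nsteps steps is bad.
 */
size_t demo_bisect_steps(size_t nsteps, demo_fails_fn fails, void *ctx)
{
	size_t good = 0;	/* largest prefix known to pass */
	size_t bad = nsteps;	/* smallest prefix known to fail */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (fails(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Toy predicate for trying it out: fails once step *ctx has been applied. */
int demo_fails_from(size_t applied, void *ctx)
{
	return applied >= *(size_t *)ctx;
}
```

With 8 steps this needs 3 probes instead of 8. As the message notes, the step it converges to may still only be a timing change that exposes a race rather than the cause of the race.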


* Re: [PATCHES v3][RFC][CFT] mount-related stuff
  2025-09-03  4:54   ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
  2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
  2025-09-03  5:08     ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
@ 2025-09-03 14:47     ` Linus Torvalds
  2 siblings, 0 replies; 320+ messages in thread
From: Linus Torvalds @ 2025-09-03 14:47 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christian Brauner, Jan Kara

On Tue, 2 Sept 2025 at 21:54, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> If nobody objects, this goes into #for-next.

Looks all sane to me.

What was the issue with generic/475? I have missed that context..

           Linus


end of thread [~2025-09-03 14:47 UTC]

Thread overview: 320+ messages
2025-08-25  4:40 [PATCHED][RFC][CFT] mount-related stuff Al Viro
2025-08-25  4:43 ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Al Viro
2025-08-25  4:43   ` [PATCH 02/52] introduced guards for mount_lock Al Viro
2025-08-25 12:32     ` Christian Brauner
2025-08-25 13:46       ` Al Viro
2025-08-25 20:21         ` Al Viro
2025-08-25 23:44           ` Al Viro
2025-08-26  1:44             ` Al Viro
2025-08-26 15:17           ` Askar Safin
2025-08-26 15:45             ` Al Viro
2025-08-25  4:43   ` [PATCH 03/52] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
2025-08-25 12:33     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 04/52] __detach_mounts(): use guards Al Viro
2025-08-25 12:33     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 05/52] __is_local_mountpoint(): " Al Viro
2025-08-25 12:33     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 06/52] do_change_type(): " Al Viro
2025-08-25 12:34     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 07/52] do_set_group(): " Al Viro
2025-08-25 12:35     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 08/52] mark_mounts_for_expiry(): " Al Viro
2025-08-25 12:37     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 09/52] put_mnt_ns(): " Al Viro
2025-08-25 12:37     ` Christian Brauner
2025-08-25 12:40     ` Christian Brauner
2025-08-25 16:21       ` Al Viro
2025-08-25  4:43   ` [PATCH 10/52] mnt_already_visible(): " Al Viro
2025-08-25 12:39     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 11/52] check_for_nsfs_mounts(): no need to take locks Al Viro
2025-08-25 12:48     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 12/52] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
2025-08-25 12:49     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 13/52] has_locked_children(): use guards Al Viro
2025-08-25 11:54     ` Linus Torvalds
2025-08-25 17:33       ` Al Viro
2025-08-25 12:49     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 14/52] mnt_set_expiry(): " Al Viro
2025-08-25 12:51     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 15/52] path_is_under(): " Al Viro
2025-08-25 12:56     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 16/52] current_chrooted(): don't bother with follow_down_one() Al Viro
2025-08-25 12:57     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 17/52] current_chrooted(): use guards Al Viro
2025-08-25 12:57     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 18/52] do_move_mount(): trim local variables Al Viro
2025-08-25 12:57     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 19/52] do_move_mount(): deal with the checks on old_path early Al Viro
2025-08-25 13:00     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 20/52] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
2025-08-25 12:10     ` Linus Torvalds
2025-08-25 12:17       ` Linus Torvalds
2025-08-25 13:02     ` Christian Brauner
2025-08-25 16:05       ` Al Viro
2025-08-25  4:43   ` [PATCH 21/52] finish_automount(): simplify the ELOOP check Al Viro
2025-08-25 13:02     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 22/52] do_loopback(): use __free(path_put) to deal with old_path Al Viro
2025-08-25 13:02     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 23/52] pivot_root(2): use __free() to deal with struct path in it Al Viro
2025-08-25 13:03     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 24/52] finish_automount(): take the lock_mount() analogue into a helper Al Viro
2025-08-25 13:08     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 25/52] do_new_mount_rc(): use __free() to deal with dropping mnt on failure Al Viro
2025-08-25 13:29     ` Christian Brauner
2025-08-25 16:09       ` Al Viro
2025-08-26  8:27         ` Christian Brauner
2025-08-26 17:00           ` Al Viro
2025-08-26 17:55             ` Al Viro
2025-08-26 18:21               ` [RFC][PATCH] switch do_new_mount_fc() to using fc_mount() Al Viro
2025-08-27 15:38                 ` Paul Moore
2025-08-25  4:43   ` [PATCH 26/52] finish_automount(): use __free() to deal with dropping mnt on failure Al Viro
2025-08-25 13:09     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 27/52] change calling conventions for lock_mount() et.al Al Viro
2025-08-25  4:43   ` [PATCH 28/52] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
2025-08-25  4:43   ` [PATCH 29/52] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
2025-08-25  4:43   ` [PATCH 30/52] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
2025-08-25  4:43   ` [PATCH 31/52] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
2025-08-25 13:43     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 32/52] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
2025-08-25  4:43   ` [PATCH 33/52] new helper: topmost_overmount() Al Viro
2025-08-25 13:43     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 34/52] do_lock_mount(): don't modify path Al Viro
2025-08-26 14:14     ` Askar Safin
2025-08-25  4:43   ` [PATCH 35/52] constify check_mnt() Al Viro
2025-08-25 13:43     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 36/52] do_mount_setattr(): constify path argument Al Viro
2025-08-25 13:30     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 37/52] do_set_group(): constify path arguments Al Viro
2025-08-25 13:29     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 38/52] drop_collected_paths(): constify arguments Al Viro
2025-08-25 13:31     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 39/52] collect_paths(): constify the return value Al Viro
2025-08-25 13:30     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 40/52] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
2025-08-25 13:30     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 41/52] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
2025-08-25 13:32     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 42/52] do_new_mount{,_fc}(): " Al Viro
2025-08-25 13:30     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 43/52] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
2025-08-25 13:31     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 44/52] path_mount(): " Al Viro
2025-08-25 13:32     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 45/52] may_copy_tree(), __do_loopback(): " Al Viro
2025-08-25 13:40     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 46/52] path_umount(): " Al Viro
2025-08-25 13:40     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 47/52] constify can_move_mount_beneath() arguments Al Viro
2025-08-25 13:39     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 48/52] do_move_mount_old(): use __free(path_put) Al Viro
2025-08-25 13:40     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 49/52] do_mount(): " Al Viro
2025-08-25 13:32     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 50/52] umount_tree(): take all victims out of propagation graph at once Al Viro
2025-08-25  4:43   ` [PATCH 51/52] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
2025-08-25 13:41     ` Christian Brauner
2025-08-25  4:43   ` [PATCH 52/52] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
2025-08-25 13:42     ` Christian Brauner
2025-08-25 12:30   ` [PATCH 01/52] fs/namespace.c: fix the namespace_sem guard mess Christian Brauner
2025-08-25 12:26 ` [PATCHED][RFC][CFT] mount-related stuff Christian Brauner
2025-08-25 12:43 ` Christian Brauner
2025-08-25 16:11   ` Al Viro
2025-08-25 17:43     ` Al Viro
2025-08-25 20:18       ` Theodore Ts'o
2025-08-26  8:56       ` Christian Brauner
2025-08-27 17:19         ` Linus Torvalds
2025-08-27 17:49           ` Linus Torvalds
2025-08-27 22:49             ` Konstantin Ryabitsev
2025-08-27 23:40               ` Linus Torvalds
2025-08-28  0:41                 ` Konstantin Ryabitsev
2025-08-28  1:00                   ` Al Viro
2025-08-28  1:15                     ` Konstantin Ryabitsev
2025-08-28  1:29                   ` Linus Torvalds
2025-08-29 12:30                     ` Theodore Ts'o
2025-08-29 18:25                       ` Konstantin Ryabitsev
2025-08-28 23:07 ` [PATCHES v2][RFC][CFT] " Al Viro
2025-08-28 23:07   ` [PATCH v2 01/63] fs/namespace.c: fix the namespace_sem guard mess Al Viro
2025-08-28 23:07     ` [PATCH v2 02/63] introduced guards for mount_lock Al Viro
2025-08-29  9:49       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 03/63] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
2025-08-28 23:07     ` [PATCH v2 04/63] __detach_mounts(): use guards Al Viro
2025-08-29  9:48       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 05/63] __is_local_mountpoint(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 06/63] do_change_type(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 07/63] do_set_group(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 08/63] mark_mounts_for_expiry(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 09/63] put_mnt_ns(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 10/63] mnt_already_visible(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 11/63] check_for_nsfs_mounts(): no need to take locks Al Viro
2025-08-28 23:07     ` [PATCH v2 12/63] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
2025-08-28 23:07     ` [PATCH v2 13/63] has_locked_children(): use guards Al Viro
2025-08-29  9:49       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 14/63] mnt_set_expiry(): " Al Viro
2025-08-29  9:49       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 15/63] path_is_under(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 16/63] current_chrooted(): don't bother with follow_down_one() Al Viro
2025-08-28 23:07     ` [PATCH v2 17/63] current_chrooted(): use guards Al Viro
2025-08-28 23:07     ` [PATCH v2 18/63] switch do_new_mount_fc() to fc_mount() Al Viro
2025-08-29  9:53       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 19/63] do_move_mount(): trim local variables Al Viro
2025-08-28 23:07     ` [PATCH v2 20/63] do_move_mount(): deal with the checks on old_path early Al Viro
2025-08-28 23:07     ` [PATCH v2 21/63] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
2025-08-28 23:07     ` [PATCH v2 22/63] finish_automount(): simplify the ELOOP check Al Viro
2025-08-28 23:07     ` [PATCH v2 23/63] do_loopback(): use __free(path_put) to deal with old_path Al Viro
2025-08-28 23:07     ` [PATCH v2 24/63] pivot_root(2): use __free() to deal with struct path in it Al Viro
2025-08-28 23:07     ` [PATCH v2 25/63] finish_automount(): take the lock_mount() analogue into a helper Al Viro
2025-08-28 23:07     ` [PATCH v2 26/63] do_new_mount_rc(): use __free() to deal with dropping mnt on failure Al Viro
2025-09-01 11:34       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 27/63] finish_automount(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 28/63] change calling conventions for lock_mount() et.al Al Viro
2025-09-01 11:37       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 29/63] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
2025-09-01 11:38       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 30/63] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
2025-09-01 11:40       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 31/63] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
2025-09-01 11:41       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 32/63] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
2025-08-28 23:07     ` [PATCH v2 33/63] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
2025-08-28 23:20       ` Linus Torvalds
2025-08-28 23:39         ` Al Viro
2025-08-28 23:07     ` [PATCH v2 34/63] new helper: topmost_overmount() Al Viro
2025-08-28 23:07     ` [PATCH v2 35/63] do_lock_mount(): don't modify path Al Viro
2025-09-02 10:55       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 36/63] constify check_mnt() Al Viro
2025-08-28 23:07     ` [PATCH v2 37/63] do_mount_setattr(): constify path argument Al Viro
2025-08-28 23:07     ` [PATCH v2 38/63] do_set_group(): constify path arguments Al Viro
2025-08-28 23:07     ` [PATCH v2 39/63] drop_collected_paths(): constify arguments Al Viro
2025-08-28 23:07     ` [PATCH v2 40/63] collect_paths(): constify the return value Al Viro
2025-08-28 23:07     ` [PATCH v2 41/63] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
2025-08-28 23:07     ` [PATCH v2 42/63] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
2025-08-28 23:07     ` [PATCH v2 43/63] do_new_mount{,_fc}(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 44/63] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 45/63] path_mount(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 46/63] may_copy_tree(), __do_loopback(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 47/63] path_umount(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 48/63] constify can_move_mount_beneath() arguments Al Viro
2025-08-28 23:07     ` [PATCH v2 49/63] do_move_mount_old(): use __free(path_put) Al Viro
2025-08-28 23:07     ` [PATCH v2 50/63] do_mount(): " Al Viro
2025-08-28 23:07     ` [PATCH v2 51/63] umount_tree(): take all victims out of propagation graph at once Al Viro
2025-09-01 11:50       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 52/63] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
2025-08-28 23:07     ` [PATCH v2 53/63] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
2025-08-28 23:07     ` [PATCH v2 54/63] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
2025-09-01 11:29       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 55/63] open_detached_copy(): separate creation of namespace into helper Al Viro
2025-08-29  9:54       ` Christian Brauner
2025-08-28 23:07     ` [PATCH v2 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
2025-08-29  9:57       ` Christian Brauner
2025-08-28 23:08     ` [PATCH v2 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
2025-08-29  9:56       ` Christian Brauner
2025-08-28 23:08     ` [PATCH v2 58/63] copy_mnt_ns(): use guards Al Viro
2025-09-01 11:43       ` Christian Brauner
2025-08-28 23:08     ` [PATCH v2 59/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
2025-08-28 23:08     ` [PATCH v2 60/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
2025-08-28 23:08     ` [PATCH v2 61/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
2025-08-28 23:31       ` Linus Torvalds
2025-08-29  0:11         ` Al Viro
2025-08-29  0:35           ` Linus Torvalds
2025-08-29  6:03             ` Al Viro
2025-08-29  6:04               ` [59/63] simplify the callers of mnt_unhold_writers() Al Viro
2025-09-01 11:20                 ` Christian Brauner
2025-08-29  6:05               ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
2025-08-29  9:59                 ` Christian Brauner
2025-08-29 16:37                   ` Al Viro
2025-08-30  4:36                     ` Al Viro
2025-08-30  7:33                       ` [RFC] does # really need to be escaped in devnames? Al Viro
2025-08-30 19:40                         ` Linus Torvalds
2025-08-30 20:42                           ` Al Viro
2025-09-02 15:03                           ` Siddhesh Poyarekar
2025-09-02 16:30                             ` Linus Torvalds
2025-09-02 16:39                               ` Siddhesh Poyarekar
2025-09-02 17:48                             ` David Howells
2025-09-02 20:04                               ` Linus Torvalds
2025-09-01 11:17                 ` [60/63] setup_mnt(): primitive for connecting a mount to filesystem Christian Brauner
2025-08-29  6:06               ` [61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
2025-09-01 11:27                 ` Christian Brauner
2025-08-29  6:07               ` [62/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
2025-09-01 11:26                 ` Christian Brauner
2025-08-28 23:08     ` [PATCH v2 62/63] simplify the callers of mnt_unhold_writers() Al Viro
2025-08-28 23:08     ` [PATCH v2 63/63] WRITE_HOLD machinery: no need for to bump mount_lock seqcount Al Viro
2025-09-01 11:28       ` Christian Brauner
2025-09-03  4:54   ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
2025-09-03  4:54     ` [PATCH v3 01/65] fs/namespace.c: fix the namespace_sem guard mess Al Viro
2025-09-03  4:54       ` [PATCH v3 02/65] introduced guards for mount_lock Al Viro
2025-09-03  4:54       ` [PATCH v3 03/65] fs/namespace.c: allow to drop vfsmount references via __free(mntput) Al Viro
2025-09-03  4:54       ` [PATCH v3 04/65] __detach_mounts(): use guards Al Viro
2025-09-03  4:54       ` [PATCH v3 05/65] __is_local_mountpoint(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 06/65] do_change_type(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 07/65] do_set_group(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 08/65] mark_mounts_for_expiry(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 09/65] put_mnt_ns(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 10/65] mnt_already_visible(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 11/65] check_for_nsfs_mounts(): no need to take locks Al Viro
2025-09-03  4:54       ` [PATCH v3 12/65] propagate_mnt(): use scoped_guard(mount_locked_reader) for mnt_set_mountpoint() Al Viro
2025-09-03  4:54       ` [PATCH v3 13/65] has_locked_children(): use guards Al Viro
2025-09-03  4:54       ` [PATCH v3 14/65] mnt_set_expiry(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 15/65] path_is_under(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 16/65] current_chrooted(): don't bother with follow_down_one() Al Viro
2025-09-03  4:54       ` [PATCH v3 17/65] current_chrooted(): use guards Al Viro
2025-09-03  4:54       ` [PATCH v3 18/65] switch do_new_mount_fc() to fc_mount() Al Viro
2025-09-03  4:54       ` [PATCH v3 19/65] do_move_mount(): trim local variables Al Viro
2025-09-03  4:54       ` [PATCH v3 20/65] do_move_mount(): deal with the checks on old_path early Al Viro
2025-09-03  4:54       ` [PATCH v3 21/65] move_mount(2): take sanity checks in 'beneath' case into do_lock_mount() Al Viro
2025-09-03  4:54       ` [PATCH v3 22/65] finish_automount(): simplify the ELOOP check Al Viro
2025-09-03  4:54       ` [PATCH v3 23/65] do_loopback(): use __free(path_put) to deal with old_path Al Viro
2025-09-03  4:54       ` [PATCH v3 24/65] pivot_root(2): use __free() to deal with struct path in it Al Viro
2025-09-03  4:54       ` [PATCH v3 25/65] finish_automount(): take the lock_mount() analogue into a helper Al Viro
2025-09-03  4:54       ` [PATCH v3 26/65] do_new_mount_fc(): use __free() to deal with dropping mnt on failure Al Viro
2025-09-03  4:54       ` [PATCH v3 26/63] do_new_mount_rc(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 27/65] finish_automount(): " Al Viro
2025-09-03  4:54       ` [PATCH v3 28/65] change calling conventions for lock_mount() et.al Al Viro
2025-09-03  4:54       ` [PATCH v3 29/65] do_move_mount(): use the parent mount returned by do_lock_mount() Al Viro
2025-09-03  4:54       ` [PATCH v3 30/65] do_add_mount(): switch to passing pinned_mountpoint instead of mountpoint + path Al Viro
2025-09-03  4:54       ` [PATCH v3 31/65] graft_tree(), attach_recursive_mnt() - pass pinned_mountpoint Al Viro
2025-09-03  4:54       ` [PATCH v3 32/65] pivot_root(2): use old_mp.mp->m_dentry instead of old.dentry Al Viro
2025-09-03  4:54       ` [PATCH v3 33/65] don't bother passing new_path->dentry to can_move_mount_beneath() Al Viro
2025-09-03  4:54       ` [PATCH v3 34/65] new helper: topmost_overmount() Al Viro
2025-09-03  4:54       ` [PATCH v3 35/65] do_lock_mount(): don't modify path Al Viro
2025-09-03  4:54       ` [PATCH v3 36/65] constify check_mnt() Al Viro
2025-09-03  4:54       ` [PATCH v3 37/65] do_mount_setattr(): constify path argument Al Viro
2025-09-03  4:55       ` [PATCH v3 38/65] do_set_group(): constify path arguments Al Viro
2025-09-03  4:55       ` [PATCH v3 39/65] drop_collected_paths(): constify arguments Al Viro
2025-09-03  4:55       ` [PATCH v3 40/65] collect_paths(): constify the return value Al Viro
2025-09-03  4:55       ` [PATCH v3 41/65] do_move_mount(), vfs_move_mount(), do_move_mount_old(): constify struct path argument(s) Al Viro
2025-09-03  4:55       ` [PATCH v3 42/65] mnt_warn_timestamp_expiry(): constify struct path argument Al Viro
2025-09-03  4:55       ` [PATCH v3 43/65] do_new_mount{,_fc}(): " Al Viro
2025-09-03  4:55       ` [PATCH v3 44/65] do_{loopback,change_type,remount,reconfigure_mnt}(): " Al Viro
2025-09-03  4:55       ` [PATCH v3 45/65] path_mount(): " Al Viro
2025-09-03  4:55       ` [PATCH v3 46/65] may_copy_tree(), __do_loopback(): " Al Viro
2025-09-03  4:55       ` [PATCH v3 47/65] path_umount(): " Al Viro
2025-09-03  4:55       ` [PATCH v3 48/65] constify can_move_mount_beneath() arguments Al Viro
2025-09-03  4:55       ` [PATCH v3 49/65] do_move_mount_old(): use __free(path_put) Al Viro
2025-09-03  4:55       ` [PATCH v3 50/65] do_mount(): " Al Viro
2025-09-03  4:55       ` [PATCH v3 51/65] umount_tree(): take all victims out of propagation graph at once Al Viro
2025-09-03  4:55       ` [PATCH v3 52/65] ecryptfs: get rid of pointless mount references in ecryptfs dentries Al Viro
2025-09-03  4:55       ` [PATCH v3 53/65] fs/namespace.c: sanitize descriptions for {__,}lookup_mnt() Al Viro
2025-09-03  4:55       ` [PATCH v3 54/63] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
2025-09-03  4:55       ` [PATCH v3 54/65] path_has_submounts(): use guard(mount_locked_reader) Al Viro
2025-09-03  4:55       ` [PATCH v3 55/65] open_detached_copy(): don't bother with mount_lock_hash() Al Viro
2025-09-03  4:55       ` [PATCH v3 55/63] open_detached_copy(): separate creation of namespace into helper Al Viro
2025-09-03  4:55       ` [PATCH v3 56/63] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
2025-09-03  4:55       ` [PATCH v3 56/65] open_detached_copy(): separate creation of namespace into helper Al Viro
2025-09-03  4:55       ` [PATCH v3 57/63] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
2025-09-03  4:55       ` [PATCH v3 57/65] mnt_ns_tree_remove(): DTRT if mnt_ns had never been added to mnt_ns_list Al Viro
2025-09-03  4:55       ` [PATCH v3 58/63] copy_mnt_ns(): use guards Al Viro
2025-09-03  4:55       ` [PATCH v3 58/65] copy_mnt_ns(): use the regular mechanism for freeing empty mnt_ns on failure Al Viro
2025-09-03  4:55       ` [PATCH v3 59/65] copy_mnt_ns(): use guards Al Viro
2025-09-03  4:55       ` [PATCH v3 59/63] simplify the callers of mnt_unhold_writers() Al Viro
2025-09-03  4:55       ` [PATCH v3 60/63] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
2025-09-03  4:55       ` [PATCH v3 60/65] simplify the callers of mnt_unhold_writers() Al Viro
2025-09-03  4:55       ` [PATCH v3 61/63] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
2025-09-03  4:55       ` [PATCH v3 61/65] setup_mnt(): primitive for connecting a mount to filesystem Al Viro
2025-09-03  4:55       ` [PATCH v3 62/65] preparations to taking MNT_WRITE_HOLD out of ->mnt_flags Al Viro
2025-09-03  4:55       ` [PATCH v3 62/63] struct mount: relocate MNT_WRITE_HOLD bit Al Viro
2025-09-03  4:55       ` [PATCH v3 63/65] " Al Viro
2025-09-03  4:55       ` [PATCH v3 63/63] WRITE_HOLD machinery: no need for to bump mount_lock seqcount Al Viro
2025-09-03  4:55       ` [PATCH v3 64/65] " Al Viro
2025-09-03  4:55       ` [PATCH v3 65/65] constify {__,}mnt_is_readonly() Al Viro
2025-09-03  5:08     ` [PATCHES v3][RFC][CFT] mount-related stuff Al Viro
2025-09-03 14:47     ` Linus Torvalds
