Re: [RFC] MNT_LOCKED vs. finish_automount()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Al Viro <viro@zeniv.linux.org.uk>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [RFC] MNT_LOCKED vs. finish_automount()
Date: Tue, 6 May 2025 03:25:11 +0100	[thread overview]
Message-ID: <20250506022511.GL2023217@ZenIV> (raw)
In-Reply-To: <20250505190215.GG2023217@ZenIV>

On Mon, May 05, 2025 at 08:02:15PM +0100, Al Viro wrote:

> Said that, from digging I'd done last night there might be a different
> long-term approach; the thing is, there are very few places where we
> ever look at MNT_LOCKED for mounts with !mnt_has_parent():
> 	do_umount()
> 	can_umount()
> 	do_move_mount()
> 	pivot_root()
> and two of those four are of dubious use - can_umount() will be followed
> by rechecking in do_umount(), so there's not much point in bailing out
> early in a very pathological case.  And do_move_mount() might check
> MNT_LOCKED on such, but only in move-from-anon case, where we do *NOT*
> set MNT_LOCKED on namespace root.

Even better, we have a check for !mnt_has_parent() downstream of that in
pivot_root() and we reject such there.  So having that set on absolute
root of non-anon namespaces buys us very little - do_umount() is the only
place that would need to explicitly check for that.

> IOW, looking at the way things are now, I'm no longer sure that the trick
> you've done in da362b09e42 "umount: Do not allow unmounting rootfs"
> had been a good idea long-term - it certainly made for smaller fix,
> but it overloaded semantics of MNT_LOCKED, making it not just about
> protecting the mountpoint against exposure.
> 
> What if we add explicit checks for mnt_has_parent() in do_umount() and
> pivot_root() next to existing checks for MNT_LOCKED?  Then we can
> 	* have clone_mnt() treat that flag as "internal, ignore" (matter
> of fact, I would simply add MNT_LOCKED to MNT_INTERNAL_FLAGS and have
> clone_mnt() strip that, same as we do in do_add_mount()).
> 	* have copy_tree() set it right next to attach_mnt(), if the source
> had it.  Yes, leaving it clear for root of copied tree.
> 	* no need to clear it in the end of __do_loopback() (racy at the
> moment, BTW - no mount_lock held there, so race with mount -o remount,ro
> for the original is possible, so something is needed there)
> 	* have lock_mnt_tree() skip the MNT_LOCKED not just for the expirable
> but for the root of subtree as well.
> 	* don't bother stripping MNT_LOCKED for roots of propagation copies
> (or anyone, for that matter) in attach_recursive_mnt()
> 	* don't bother setting it on absolute root of initial namespace
> 
> Looks like we end up with overall simpler code that way; it's certainly
> conceptually simpler - MNT_LOCKED is set only on the mounts that do
> have parent we care to protect, with that being done atomically wrt
> mount becoming reachable (lock_mnt_tree() is in the same lock_mount_hash()
> scope where we call commit_tree() that makes the entire subtree reachable).
> 
> Your argument in "mnt: Move the clear of MNT_LOCKED from copy_tree to
> it's callers" about wanting to keep it in collect_mounts() for audit
> purposes is wrong, AFAICS - audit does not see or care about MNT_LOCKED;
> the only thing we use the result of collect_mounts() for is passing
> it to iterate_mounts(), which literally "apply this callback to each
> vfsmount in the set", completely ignoring anything else.  What's more,
> all callbacks audit is passing to it actually look only at the inode of
> that mount's root...
> 
> Anyway, that's longer-term stuff; I'll put together a patch along those
> lines on top of this one.

Here it is, on top of the previous.  Objections, comments?

[PATCH] don't set MNT_LOCKED on parentless mounts

Originally MNT_LOCKED meant only one thing - "don't let this mount to
be peeled off its parent, we don't want to have mountpoint exposed".
Accordingly, it had only been set on mounts that *do* have a parent.
Later it had been overloaded with another use - setting it on the
absolute root had given free protection against umount(2) on absolute
root (possible to trigger, oopsed).  Not a bad trick, but it ended
up costing more than it bought us.  Unfortunately, the cost included
both hard-to-reason-about logics and a subtle race between
mount -o remount,ro and mount --[r]bind - lockless &= ~MNT_LOCKED in
the end of __do_loopback() could race with sb_prepare_remount_readonly()
setting and clearing MNT_HOLD_WRITE (under mount_lock, as it should
be).

Turns out that nobody except umount(2) had ever made use of having
MNT_LOCKED set on absolute root.  So let's give up on that trick,
clever as it had been, add an explicit check in do_umount() and
return to using MNT_LOCKED only for mounts that have a parent.

It means that
	* clone_mnt() no longer copies MNT_LOCKED
	* copy_tree() sets it on submounts if their counterparts had
been marked such, and do that right next to attach_mnt() in there,
in the same mount_lock scope.
	* __do_loopback() no longer needs to strip MNT_LOCKED off the
root of subtree it's about to return; no store, no race.
	* init_mount_tree() doesn't bother setting MNT_LOCKED on absolute
root.
	* lock_mnt_tree() does not set MNT_LOCKED on the subtree's root;
accordingly, its caller (loop in attach_recursive_mnt()) does not need to
bother stripping that MNT_LOCKED.  Note that lock_mnt_tree() setting
MNT_LOCKED on submounts happens in the same mount_lock scope as __attach_mnt()
(from commit_tree()) that makes them reachable.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/namespace.c b/fs/namespace.c
index 04a9bb9f31fa..165b3bd26857 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1363,7 +1363,7 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	}
 
 	mnt->mnt.mnt_flags = old->mnt.mnt_flags;
-	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
+	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL|MNT_LOCKED);
 
 	atomic_inc(&sb->s_active);
 	mnt->mnt.mnt_idmap = mnt_idmap_get(mnt_idmap(&old->mnt));
@@ -2038,6 +2038,9 @@ static int do_umount(struct mount *mnt, int flags)
 	if (mnt->mnt.mnt_flags & MNT_LOCKED)
 		goto out;
 
+	if (!mnt_has_parent(mnt))
+		goto out;
+
 	event++;
 	if (flags & MNT_DETACH) {
 		if (mnt_ns_attached(mnt) || !list_empty(&mnt->mnt_list))
@@ -2308,6 +2311,8 @@ struct mount *copy_tree(struct mount *src_root, struct dentry *dentry,
 			if (IS_ERR(dst_mnt))
 				goto out;
 			lock_mount_hash();
+			if (src_mnt->mnt.mnt_flags & MNT_LOCKED)
+				dst_mnt->mnt.mnt_flags |= MNT_LOCKED;
 			list_add_tail(&dst_mnt->mnt_list, &res->mnt_list);
 			attach_mnt(dst_mnt, dst_parent, src_parent->mnt_mp, false);
 			unlock_mount_hash();
@@ -2547,7 +2552,7 @@ static void lock_mnt_tree(struct mount *mnt)
 		if (flags & MNT_NOEXEC)
 			flags |= MNT_LOCK_NOEXEC;
 		/* Don't allow unprivileged users to reveal what is under a mount */
-		if (list_empty(&p->mnt_expire))
+		if (list_empty(&p->mnt_expire) && p != mnt)
 			flags |= MNT_LOCKED;
 		p->mnt.mnt_flags = flags;
 	}
@@ -2758,7 +2763,6 @@ static int attach_recursive_mnt(struct mount *source_mnt,
 		/* Notice when we are propagating across user namespaces */
 		if (child->mnt_parent->mnt_ns->user_ns != user_ns)
 			lock_mnt_tree(child);
-		child->mnt.mnt_flags &= ~MNT_LOCKED;
 		commit_tree(child);
 	}
 	put_mountpoint(smp);
@@ -3027,26 +3031,21 @@ static inline bool may_copy_tree(struct path *path)
 
 static struct mount *__do_loopback(struct path *old_path, int recurse)
 {
-	struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt);
+	struct mount *old = real_mount(old_path->mnt);
 
 	if (IS_MNT_UNBINDABLE(old))
-		return mnt;
+		return ERR_PTR(-EINVAL);
 
 	if (!may_copy_tree(old_path))
-		return mnt;
+		return ERR_PTR(-EINVAL);
 
 	if (!recurse && has_locked_children(old, old_path->dentry))
-		return mnt;
+		return ERR_PTR(-EINVAL);
 
 	if (recurse)
-		mnt = copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
+		return copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE);
 	else
-		mnt = clone_mnt(old, old_path->dentry, 0);
-
-	if (!IS_ERR(mnt))
-		mnt->mnt.mnt_flags &= ~MNT_LOCKED;
-
-	return mnt;
+		return clone_mnt(old, old_path->dentry, 0);
 }
 
 /*
@@ -4789,11 +4788,11 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
 	if (!path_mounted(&root))
 		goto out4; /* not a mountpoint */
 	if (!mnt_has_parent(root_mnt))
-		goto out4; /* not attached */
+		goto out4; /* absolute root */
 	if (!path_mounted(&new))
 		goto out4; /* not a mountpoint */
 	if (!mnt_has_parent(new_mnt))
-		goto out4; /* not attached */
+		goto out4; /* absolute root */
 	/* make sure we can reach put_old from new_root */
 	if (!is_path_reachable(old_mnt, old.dentry, &new))
 		goto out4;
@@ -6191,7 +6190,6 @@ static void __init init_mount_tree(void)
 
 	root.mnt = mnt;
 	root.dentry = mnt->mnt_root;
-	mnt->mnt_flags |= MNT_LOCKED;
 
 	set_fs_pwd(current->fs, &root);
 	set_fs_root(current->fs, &root);

next prev parent reply	other threads:[~2025-05-06  2:25 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-01 20:15 [RFC] MNT_LOCKED vs. finish_automount() Al Viro
2025-05-03  3:46 ` Eric W. Biederman
2025-05-04 23:24   ` Al Viro
2025-05-05 13:52     ` Eric W. Biederman
2025-05-05 19:02       ` Al Viro
2025-05-06  2:25         ` Al Viro [this message]
2025-05-05 13:54     ` Christian Brauner

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:04a9bb9f31f dfblob:165b3bd2685 )
 OR (
bs:"Re: [RFC] MNT_LOCKED vs. finish_automount()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250506022511.GL2023217@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=ebiederm@xmission.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.