linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHES][CFR] vfs fixes
@ 2025-06-03 23:15 Al Viro
  2025-06-03 23:16 ` [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children() Al Viro
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Al Viro @ 2025-06-03 23:15 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, Jan Kara

	Fixes for assorted bugs caught by struct mount audit.
This stuff sits in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #fixes

Please, review; I'm going to push those to Linus in a few days.

Individual patches in followups.

1) fs/fhandle.c: fix a race in call of has_locked_children()
	traversing the list of children without mount_lock; oopsable,
present since v6.11.
2) path_overmount(): avoid false negatives
	namespace_sem is not enough to prevent false negatives from
__lookup_mnt(); rcu_read_lock() makes it memory-safe, but mount_lock
seqretry is needed for valid result.  Present since _way_ back -
predates path_overmount(), actually.  Originally introduced in v5.7
3) finish_automount(): don't leak MNT_LOCKED from parent to child
	MNT_LOCKED is incompatible with MNT_SHRINKABLE and such
combinations had been prevented from the very beginning; unfortunately,
one case got missed - automount triggered within an MNT_LOCKED mount.
Goes all the way back to v3.12...
4) fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
	In case when old mount both receives and transmits mount events,
do_set_group() end up corrupting the data structures.  Introduced in
v5.15
5) fs: allow clone_private_mount() for a path on real rootfs
	v6.15 introduced a way to use locations in detached
trees as overlayfs layers; unfortunately, the way it had
been done ended up breaking something that used to be allowed -
using locations on initramfs as overlayfs layers.  Turns out
that people really used such setups...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children()
  2025-06-03 23:15 [PATCHES][CFR] vfs fixes Al Viro
@ 2025-06-03 23:16 ` Al Viro
  2025-06-04  7:37   ` Christian Brauner
  2025-06-04 11:57   ` Jeff Layton
  2025-06-03 23:17 ` [PATCH 2/5] path_overmount(): avoid false negatives Al Viro
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2025-06-03 23:16 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Jan Kara, Jeff Layton,
	Amir Goldstein

may_decode_fh() is calling has_locked_children() while holding no locks.
That's an oopsable race...

The rest of the callers are safe since they are holding namespace_sem and
are guaranteed a positive refcount on the mount in question.

Rename the current has_locked_children() to __has_locked_children(), make
it static and switch the fs/namespace.c users to it.

Make has_locked_children() a wrapper for __has_locked_children(), calling
the latter under read_seqlock_excl(&mount_lock).

Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 7c0ebc4f4ef2..a33553bc12d0 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2425,7 +2425,7 @@ void drop_collected_mounts(struct vfsmount *mnt)
 	namespace_unlock();
 }
 
-bool has_locked_children(struct mount *mnt, struct dentry *dentry)
+static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
 {
 	struct mount *child;
 
@@ -2439,6 +2439,16 @@ bool has_locked_children(struct mount *mnt, struct dentry *dentry)
 	return false;
 }
 
+bool has_locked_children(struct mount *mnt, struct dentry *dentry)
+{
+	bool res;
+
+	read_seqlock_excl(&mount_lock);
+	res = __has_locked_children(mnt, dentry);
+	read_sequnlock_excl(&mount_lock);
+	return res;
+}
+
 /*
  * Check that there aren't references to earlier/same mount namespaces in the
  * specified subtree.  Such references can act as pins for mount namespaces
@@ -2499,7 +2509,7 @@ struct vfsmount *clone_private_mount(const struct path *path)
 			return ERR_PTR(-EINVAL);
 	}
 
-	if (has_locked_children(old_mnt, path->dentry))
+	if (__has_locked_children(old_mnt, path->dentry))
 		return ERR_PTR(-EINVAL);
 
 	new_mnt = clone_mnt(old_mnt, path->dentry, CL_PRIVATE);
@@ -3036,7 +3046,7 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
 	if (!may_copy_tree(old_path))
 		return mnt;
 
-	if (!recurse && has_locked_children(old, old_path->dentry))
+	if (!recurse && __has_locked_children(old, old_path->dentry))
 		return mnt;
 
 	if (recurse)
@@ -3429,7 +3439,7 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 		goto out;
 
 	/* From mount should not have locked children in place of To's root */
-	if (has_locked_children(from, to->mnt.mnt_root))
+	if (__has_locked_children(from, to->mnt.mnt_root))
 		goto out;
 
 	/* Setting sharing groups is only allowed on private mounts */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] path_overmount(): avoid false negatives
  2025-06-03 23:15 [PATCHES][CFR] vfs fixes Al Viro
  2025-06-03 23:16 ` [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children() Al Viro
@ 2025-06-03 23:17 ` Al Viro
  2025-06-04  7:38   ` Christian Brauner
  2025-06-03 23:18 ` [PATCH 3/5] finish_automount(): don't leak MNT_LOCKED from parent to child Al Viro
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2025-06-03 23:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, Jan Kara

Holding namespace_sem is enough to make sure that result remains valid.
It is *not* enough to avoid false negatives from __lookup_mnt().  Mounts
can be unhashed outside of namespace_sem (stuck children getting detached
on final mntput() of lazy-umounted mount) and having an unrelated mount
removed from the hash chain while we traverse it may end up with false
negative from __lookup_mnt().  We need to sample and recheck the seqlock
component of mount_lock...

Bug predates the introduction of path_overmount() - it had come from
the code in finish_automount() that got abstracted into that helper.

Fixes: 26df6034fdb2 ("fix automount/automount race properly")
Fixes: 6ac392815628 ("fs: allow to mount beneath top mount")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index a33553bc12d0..1722deadfb88 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3478,18 +3478,25 @@ static int do_set_group(struct path *from_path, struct path *to_path)
  * Check if path is overmounted, i.e., if there's a mount on top of
  * @path->mnt with @path->dentry as mountpoint.
  *
- * Context: This function expects namespace_lock() to be held.
+ * Context: namespace_sem must be held at least shared.
+ * MUST NOT be called under lock_mount_hash() (there one should just
+ * call __lookup_mnt() and check if it returns NULL).
  * Return: If path is overmounted true is returned, false if not.
  */
 static inline bool path_overmounted(const struct path *path)
 {
+	unsigned seq = read_seqbegin(&mount_lock);
+	bool no_child;
+
 	rcu_read_lock();
-	if (unlikely(__lookup_mnt(path->mnt, path->dentry))) {
-		rcu_read_unlock();
-		return true;
-	}
+	no_child = !__lookup_mnt(path->mnt, path->dentry);
 	rcu_read_unlock();
-	return false;
+	if (need_seqretry(&mount_lock, seq)) {
+		read_seqlock_excl(&mount_lock);
+		no_child = !__lookup_mnt(path->mnt, path->dentry);
+		read_sequnlock_excl(&mount_lock);
+	}
+	return unlikely(!no_child);
 }
 
 /**
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] finish_automount(): don't leak MNT_LOCKED from parent to child
  2025-06-03 23:15 [PATCHES][CFR] vfs fixes Al Viro
  2025-06-03 23:16 ` [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children() Al Viro
  2025-06-03 23:17 ` [PATCH 2/5] path_overmount(): avoid false negatives Al Viro
@ 2025-06-03 23:18 ` Al Viro
  2025-06-04  7:39   ` Christian Brauner
  2025-06-03 23:19 ` [PATCH 4/5] fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro
  2025-06-03 23:20 ` [PATCH 5/5] fs: allow clone_private_mount() for a path on real rootfs Al Viro
  4 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2025-06-03 23:18 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Jan Kara, Eric W. Biederman

Intention for MNT_LOCKED had always been to protect the internal
mountpoints within a subtree that got copied across the userns boundary,
not the mountpoint that tree got attached to - after all, it _was_
exposed before the copying.

For roots of secondary copies that is enforced in attach_recursive_mnt() -
MNT_LOCKED is explicitly stripped for those.  For the root of primary
copy we are almost always guaranteed that MNT_LOCKED won't be there,
so attach_recursive_mnt() doesn't bother.  Unfortunately, one call
chain got overlooked - triggering e.g. NFS referral will have the
submount inherit the public flags from parent; that's fine for such
things as read-only, nosuid, etc., but not for MNT_LOCKED.

This is particularly pointless since the mount attached by finish_automount()
is usually expirable, which makes any protection granted by MNT_LOCKED
null and void; just wait for a while and that mount will go away on its own.

Include MNT_LOCKED into the set of flags to be ignored by do_add_mount() - it
really is an internal flag.

Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/mount.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mount.h b/include/linux/mount.h
index 6904ad33ee7a..1a3136e53eaa 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -65,7 +65,8 @@ enum mount_flags {
 	MNT_ATIME_MASK = MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME,
 
 	MNT_INTERNAL_FLAGS = MNT_SHARED | MNT_WRITE_HOLD | MNT_INTERNAL |
-			     MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED,
+			     MNT_DOOMED | MNT_SYNC_UMOUNT | MNT_MARKED |
+			     MNT_LOCKED,
 };
 
 struct vfsmount {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
  2025-06-03 23:15 [PATCHES][CFR] vfs fixes Al Viro
                   ` (2 preceding siblings ...)
  2025-06-03 23:18 ` [PATCH 3/5] finish_automount(): don't leak MNT_LOCKED from parent to child Al Viro
@ 2025-06-03 23:19 ` Al Viro
  2025-06-04  7:39   ` Christian Brauner
  2025-06-03 23:20 ` [PATCH 5/5] fs: allow clone_private_mount() for a path on real rootfs Al Viro
  4 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2025-06-03 23:19 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Jan Kara, Pavel Tikhomirov

9ffb14ef61ba "move_mount: allow to add a mount into an existing group"
breaks assertions on ->mnt_share/->mnt_slave.  For once, the data structures
in question are actually documented.

Documentation/filesystem/sharedsubtree.rst:
        All vfsmounts in a peer group have the same ->mnt_master.  If it is
	non-NULL, they form a contiguous (ordered) segment of slave list.

do_set_group() puts a mount into the same place in propagation graph
as the old one.  As the result, if old mount gets events from somewhere
and is not a pure event sink, new one needs to be placed next to the
old one in the slave list the old one's on.  If it is a pure event
sink, we only need to make sure the new one doesn't end up in the
middle of some peer group.

"move_mount: allow to add a mount into an existing group" ends up putting
the new one in the beginning of list; that's definitely not going to be
in the middle of anything, so that's fine for case when old is not marked
shared.  In case when old one _is_ marked shared (i.e. is not a pure event
sink), that breaks the assumptions of propagation graph iterators.

Put the new mount next to the old one on the list - that does the right thing
in "old is marked shared" case and is just as correct as the current behaviour
if old is not marked shared (kudos to Pavel for pointing that out - my original
suggested fix changed behaviour in the "nor marked" case, which complicated
things for no good reason).

Fixes: 9ffb14ef61ba ("move_mount: allow to add a mount into an existing group")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 1722deadfb88..6c94ecbe2c2c 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3453,7 +3453,7 @@ static int do_set_group(struct path *from_path, struct path *to_path)
 	if (IS_MNT_SLAVE(from)) {
 		struct mount *m = from->mnt_master;
 
-		list_add(&to->mnt_slave, &m->mnt_slave_list);
+		list_add(&to->mnt_slave, &from->mnt_slave);
 		to->mnt_master = m;
 	}
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] fs: allow clone_private_mount() for a path on real rootfs
  2025-06-03 23:15 [PATCHES][CFR] vfs fixes Al Viro
                   ` (3 preceding siblings ...)
  2025-06-03 23:19 ` [PATCH 4/5] fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro
@ 2025-06-03 23:20 ` Al Viro
  2025-06-04  7:40   ` Christian Brauner
  4 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2025-06-03 23:20 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Linus Torvalds, Christian Brauner, Jan Kara, Kazuma Kondo

From: =?UTF-8?q?KONDO=20KAZUMA=28=E8=BF=91=E8=97=A4=E3=80=80=E5=92=8C?=
 =?UTF-8?q?=E7=9C=9F=29?= <kazuma-kondo@nec.com>

Mounting overlayfs with a directory on real rootfs (initramfs)
as upperdir has failed with following message since commit
db04662e2f4f ("fs: allow detached mounts in clone_private_mount()").

  [    4.080134] overlayfs: failed to clone upperpath

Overlayfs mount uses clone_private_mount() to create internal mount
for the underlying layers.

The commit made clone_private_mount() reject real rootfs because
it does not have a parent mount and is in the initial mount namespace,
that is not an anonymous mount namespace.

This issue can be fixed by modifying the permission check
of clone_private_mount() following [1].

Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()")
Link: https://lore.kernel.org/all/20250514190252.GQ2023217@ZenIV/ [1]
Link: https://lore.kernel.org/all/20250506194849.GT2023217@ZenIV/
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/namespace.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6c94ecbe2c2c..854099aafed5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2493,18 +2493,19 @@ struct vfsmount *clone_private_mount(const struct path *path)
 	if (IS_MNT_UNBINDABLE(old_mnt))
 		return ERR_PTR(-EINVAL);
 
-	if (mnt_has_parent(old_mnt)) {
-		if (!check_mnt(old_mnt))
-			return ERR_PTR(-EINVAL);
-	} else {
-		if (!is_mounted(&old_mnt->mnt))
-			return ERR_PTR(-EINVAL);
-
-		/* Make sure this isn't something purely kernel internal. */
-		if (!is_anon_ns(old_mnt->mnt_ns))
+	/*
+	 * Make sure the source mount is acceptable.
+	 * Anything mounted in our mount namespace is allowed.
+	 * Otherwise, it must be the root of an anonymous mount
+	 * namespace, and we need to make sure no namespace
+	 * loops get created.
+	 */
+	if (!check_mnt(old_mnt)) {
+		if (!is_mounted(&old_mnt->mnt) ||
+			!is_anon_ns(old_mnt->mnt_ns) ||
+			mnt_has_parent(old_mnt))
 			return ERR_PTR(-EINVAL);
 
-		/* Make sure we don't create mount namespace loops. */
 		if (!check_for_nsfs_mounts(old_mnt))
 			return ERR_PTR(-EINVAL);
 	}
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children()
  2025-06-03 23:16 ` [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children() Al Viro
@ 2025-06-04  7:37   ` Christian Brauner
  2025-06-04 11:57   ` Jeff Layton
  1 sibling, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-06-04  7:37 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Linus Torvalds, Jan Kara, Jeff Layton,
	Amir Goldstein

On Wed, Jun 04, 2025 at 12:16:32AM +0100, Al Viro wrote:
> may_decode_fh() is calling has_locked_children() while holding no locks.
> That's an oopsable race...
> 
> The rest of the callers are safe since they are holding namespace_sem and
> are guaranteed a positive refcount on the mount in question.
> 
> Rename the current has_locked_children() to __has_locked_children(), make
> it static and switch the fs/namespace.c users to it.
> 
> Make has_locked_children() a wrapper for __has_locked_children(), calling
> the latter under read_seqlock_excl(&mount_lock).
> 
> Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/5] path_overmount(): avoid false negatives
  2025-06-03 23:17 ` [PATCH 2/5] path_overmount(): avoid false negatives Al Viro
@ 2025-06-04  7:38   ` Christian Brauner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-06-04  7:38 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara

On Wed, Jun 04, 2025 at 12:17:09AM +0100, Al Viro wrote:
> Holding namespace_sem is enough to make sure that result remains valid.
> It is *not* enough to avoid false negatives from __lookup_mnt().  Mounts
> can be unhashed outside of namespace_sem (stuck children getting detached
> on final mntput() of lazy-umounted mount) and having an unrelated mount
> removed from the hash chain while we traverse it may end up with false
> negative from __lookup_mnt().  We need to sample and recheck the seqlock
> component of mount_lock...
> 
> Bug predates the introduction of path_overmount() - it had come from
> the code in finish_automount() that got abstracted into that helper.
> 
> Fixes: 26df6034fdb2 ("fix automount/automount race properly")
> Fixes: 6ac392815628 ("fs: allow to mount beneath top mount")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/5] finish_automount(): don't leak MNT_LOCKED from parent to child
  2025-06-03 23:18 ` [PATCH 3/5] finish_automount(): don't leak MNT_LOCKED from parent to child Al Viro
@ 2025-06-04  7:39   ` Christian Brauner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-06-04  7:39 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara, Eric W. Biederman

On Wed, Jun 04, 2025 at 12:18:07AM +0100, Al Viro wrote:
> Intention for MNT_LOCKED had always been to protect the internal
> mountpoints within a subtree that got copied across the userns boundary,
> not the mountpoint that tree got attached to - after all, it _was_
> exposed before the copying.
> 
> For roots of secondary copies that is enforced in attach_recursive_mnt() -
> MNT_LOCKED is explicitly stripped for those.  For the root of primary
> copy we are almost always guaranteed that MNT_LOCKED won't be there,
> so attach_recursive_mnt() doesn't bother.  Unfortunately, one call
> chain got overlooked - triggering e.g. NFS referral will have the
> submount inherit the public flags from parent; that's fine for such
> things as read-only, nosuid, etc., but not for MNT_LOCKED.
> 
> This is particularly pointless since the mount attached by finish_automount()
> is usually expirable, which makes any protection granted by MNT_LOCKED
> null and void; just wait for a while and that mount will go away on its own.
> 
> Include MNT_LOCKED into the set of flags to be ignored by do_add_mount() - it
> really is an internal flag.
> 
> Fixes: 5ff9d8a65ce8 ("vfs: Lock in place mounts from more privileged users")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/5] fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2)
  2025-06-03 23:19 ` [PATCH 4/5] fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro
@ 2025-06-04  7:39   ` Christian Brauner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-06-04  7:39 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara, Pavel Tikhomirov

On Wed, Jun 04, 2025 at 12:19:11AM +0100, Al Viro wrote:
> 9ffb14ef61ba "move_mount: allow to add a mount into an existing group"
> breaks assertions on ->mnt_share/->mnt_slave.  For once, the data structures
> in question are actually documented.
> 
> Documentation/filesystem/sharedsubtree.rst:
>         All vfsmounts in a peer group have the same ->mnt_master.  If it is
> 	non-NULL, they form a contiguous (ordered) segment of slave list.
> 
> do_set_group() puts a mount into the same place in propagation graph
> as the old one.  As the result, if old mount gets events from somewhere
> and is not a pure event sink, new one needs to be placed next to the
> old one in the slave list the old one's on.  If it is a pure event
> sink, we only need to make sure the new one doesn't end up in the
> middle of some peer group.
> 
> "move_mount: allow to add a mount into an existing group" ends up putting
> the new one in the beginning of list; that's definitely not going to be
> in the middle of anything, so that's fine for case when old is not marked
> shared.  In case when old one _is_ marked shared (i.e. is not a pure event
> sink), that breaks the assumptions of propagation graph iterators.
> 
> Put the new mount next to the old one on the list - that does the right thing
> in "old is marked shared" case and is just as correct as the current behaviour
> if old is not marked shared (kudos to Pavel for pointing that out - my original
> suggested fix changed behaviour in the "nor marked" case, which complicated
> things for no good reason).
> 
> Fixes: 9ffb14ef61ba ("move_mount: allow to add a mount into an existing group")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/5] fs: allow clone_private_mount() for a path on real rootfs
  2025-06-03 23:20 ` [PATCH 5/5] fs: allow clone_private_mount() for a path on real rootfs Al Viro
@ 2025-06-04  7:40   ` Christian Brauner
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Brauner @ 2025-06-04  7:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds, Jan Kara, Kazuma Kondo

On Wed, Jun 04, 2025 at 12:20:11AM +0100, Al Viro wrote:
> From: =?UTF-8?q?KONDO=20KAZUMA=28=E8=BF=91=E8=97=A4=E3=80=80=E5=92=8C?=
>  =?UTF-8?q?=E7=9C=9F=29?= <kazuma-kondo@nec.com>
> 
> Mounting overlayfs with a directory on real rootfs (initramfs)
> as upperdir has failed with following message since commit
> db04662e2f4f ("fs: allow detached mounts in clone_private_mount()").
> 
>   [    4.080134] overlayfs: failed to clone upperpath
> 
> Overlayfs mount uses clone_private_mount() to create internal mount
> for the underlying layers.
> 
> The commit made clone_private_mount() reject real rootfs because
> it does not have a parent mount and is in the initial mount namespace,
> that is not an anonymous mount namespace.
> 
> This issue can be fixed by modifying the permission check
> of clone_private_mount() following [1].
> 
> Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()")
> Link: https://lore.kernel.org/all/20250514190252.GQ2023217@ZenIV/ [1]
> Link: https://lore.kernel.org/all/20250506194849.GT2023217@ZenIV/
> Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children()
  2025-06-03 23:16 ` [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children() Al Viro
  2025-06-04  7:37   ` Christian Brauner
@ 2025-06-04 11:57   ` Jeff Layton
  1 sibling, 0 replies; 12+ messages in thread
From: Jeff Layton @ 2025-06-04 11:57 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel
  Cc: Linus Torvalds, Christian Brauner, Jan Kara, Amir Goldstein

On Wed, 2025-06-04 at 00:16 +0100, Al Viro wrote:
> may_decode_fh() is calling has_locked_children() while holding no locks.
> That's an oopsable race...
> 
> The rest of the callers are safe since they are holding namespace_sem and
> are guaranteed a positive refcount on the mount in question.
> 
> Rename the current has_locked_children() to __has_locked_children(), make
> it static and switch the fs/namespace.c users to it.
> 
> Make has_locked_children() a wrapper for __has_locked_children(), calling
> the latter under read_seqlock_excl(&mount_lock).
> 
> Fixes: 620c266f3949 ("fhandle: relax open_by_handle_at() permission checks")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/namespace.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 7c0ebc4f4ef2..a33553bc12d0 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2425,7 +2425,7 @@ void drop_collected_mounts(struct vfsmount *mnt)
>  	namespace_unlock();
>  }
>  
> -bool has_locked_children(struct mount *mnt, struct dentry *dentry)
> +static bool __has_locked_children(struct mount *mnt, struct dentry *dentry)
>  {
>  	struct mount *child;
>  
> @@ -2439,6 +2439,16 @@ bool has_locked_children(struct mount *mnt, struct dentry *dentry)
>  	return false;
>  }
>  
> +bool has_locked_children(struct mount *mnt, struct dentry *dentry)
> +{
> +	bool res;
> +
> +	read_seqlock_excl(&mount_lock);
> +	res = __has_locked_children(mnt, dentry);
> +	read_sequnlock_excl(&mount_lock);
> +	return res;
> +}
> +
>  /*
>   * Check that there aren't references to earlier/same mount namespaces in the
>   * specified subtree.  Such references can act as pins for mount namespaces
> @@ -2499,7 +2509,7 @@ struct vfsmount *clone_private_mount(const struct path *path)
>  			return ERR_PTR(-EINVAL);
>  	}
>  
> -	if (has_locked_children(old_mnt, path->dentry))
> +	if (__has_locked_children(old_mnt, path->dentry))
>  		return ERR_PTR(-EINVAL);
>  
>  	new_mnt = clone_mnt(old_mnt, path->dentry, CL_PRIVATE);
> @@ -3036,7 +3046,7 @@ static struct mount *__do_loopback(struct path *old_path, int recurse)
>  	if (!may_copy_tree(old_path))
>  		return mnt;
>  
> -	if (!recurse && has_locked_children(old, old_path->dentry))
> +	if (!recurse && __has_locked_children(old, old_path->dentry))
>  		return mnt;
>  
>  	if (recurse)
> @@ -3429,7 +3439,7 @@ static int do_set_group(struct path *from_path, struct path *to_path)
>  		goto out;
>  
>  	/* From mount should not have locked children in place of To's root */
> -	if (has_locked_children(from, to->mnt.mnt_root))
> +	if (__has_locked_children(from, to->mnt.mnt_root))
>  		goto out;
>  
>  	/* Setting sharing groups is only allowed on private mounts */

Good catch!

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-06-04 11:57 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-03 23:15 [PATCHES][CFR] vfs fixes Al Viro
2025-06-03 23:16 ` [PATCH 1/5] fs/fhandle.c: fix a race in call of has_locked_children() Al Viro
2025-06-04  7:37   ` Christian Brauner
2025-06-04 11:57   ` Jeff Layton
2025-06-03 23:17 ` [PATCH 2/5] path_overmount(): avoid false negatives Al Viro
2025-06-04  7:38   ` Christian Brauner
2025-06-03 23:18 ` [PATCH 3/5] finish_automount(): don't leak MNT_LOCKED from parent to child Al Viro
2025-06-04  7:39   ` Christian Brauner
2025-06-03 23:19 ` [PATCH 4/5] fix propagation graph breakage by MOVE_MOUNT_SET_GROUP move_mount(2) Al Viro
2025-06-04  7:39   ` Christian Brauner
2025-06-03 23:20 ` [PATCH 5/5] fs: allow clone_private_mount() for a path on real rootfs Al Viro
2025-06-04  7:40   ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).