linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Apparent mount behaviour change in 6.15
@ 2025-05-15  4:06 Allison Karlitskaya
  2025-05-15 11:25 ` Christian Brauner
  0 siblings, 1 reply; 11+ messages in thread
From: Allison Karlitskaya @ 2025-05-15  4:06 UTC (permalink / raw)
  To: Christian Brauner; +Cc: linux-fsdevel

hi,

The CI in the composefs-rs project picked up an interesting issue with
the recent mount API changes in the latest 6.15-rc kernels.  I've
managed to produce a minimal reproducer.

==> test.sh <==
#!/bin/sh

set -eux

uname -a
umount --recursive tmp/mnt || true
rm -rf tmp/mnt tmp/new
mkdir -p tmp/mnt tmp/new tmp/new/old
touch tmp/mnt/this-is-old
touch tmp/new/this-is-new
./swapmnt tmp/new tmp/mnt
find tmp/mnt

==> swapmnt.c <==
// Replace [old] with a clone of [new], moving [old] to [new]/"old"

#define _GNU_SOURCE
#include <err.h>
#include <fcntl.h>
#include <linux/mount.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

int
main (int argc, char **argv) {
  if (argc != 3) {
    fprintf(stderr, "usage: %s [new] [old]", argv[0]);
    return 1;
  }

  const char *new = argv[1];
  const char *old = argv[2];

  int oldfd = syscall(SYS_open_tree, AT_FDCWD, old,
OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC);
  if (oldfd == -1)
    err(EXIT_FAILURE, "open_tree('%s', OPEN_TREE_CLONE)", old);

  int newfd = syscall(SYS_open_tree, AT_FDCWD, new,
OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC);
  if (newfd == -1)
    err(EXIT_FAILURE, "open_tree('%s', OPEN_TREE_CLONE)", new);

  if (syscall(SYS_move_mount, newfd, "", AT_FDCWD, old,
MOVE_MOUNT_F_EMPTY_PATH))
    err(EXIT_FAILURE, "move_mount('%s' -> '%s')", new, old);

  if (syscall(SYS_move_mount, oldfd, "", newfd, "old", MOVE_MOUNT_F_EMPTY_PATH))
    err(EXIT_FAILURE, "move_mount('%s' -> (new)'%s'/old)", old, old);

  return 0;
}

On 6.14 (Fedora 42) this looks like:

[root@fedora-bls-efi-127-0-0-2-2201 tmp]# sh test.sh
+ uname -a
Linux fedora-bls-efi-127-0-0-2-2201 6.14.5-300.fc42.x86_64 #1 SMP
PREEMPT_DYNAMIC Fri May  2 14:16:46 UTC 2025 x86_64 GNU/Linux
+ umount --recursive tmp/mnt
umount: tmp/mnt: not found
+ true
+ rm -rf tmp/mnt tmp/new
+ mkdir -p tmp/mnt tmp/new tmp/new/old
+ touch tmp/mnt/this-is-old
+ touch tmp/new/this-is-new
+ ./swapmnt tmp/new tmp/mnt
+ find tmp/mnt
tmp/mnt
tmp/mnt/this-is-new
tmp/mnt/old
tmp/mnt/old/this-is-old
[root@fedora-bls-efi-127-0-0-2-2201 tmp]#

On 6.15 from yesterday (9f35e33144ae, via @kernel-vanilla/mainline
copr, on rawhide):

[root@fedora tmp]# sh test.sh
+ uname -a
Linux fedora 6.15.0-0.rc6.20250514.9f35e331.450.vanilla.fc43.x86_64 #1
SMP PREEMPT_DYNAMIC Wed May 14 04:18:35 UTC 2025 x86_64 GNU/Linux
+ umount --recursive tmp/mnt
umount: tmp/mnt: not mounted
+ true
+ rm -rf tmp/mnt tmp/new
+ mkdir -p tmp/mnt tmp/new tmp/new/old
+ touch tmp/mnt/this-is-old
+ touch tmp/new/this-is-new
+ ./swapmnt tmp/new tmp/mnt
+ find tmp/mnt
tmp/mnt
tmp/mnt/this-is-new
find: File system loop detected; ‘tmp/mnt/old’ is part of the same
file system loop as ‘tmp/mnt’.
[root@fedora tmp]#

Otherwise, I gotta say I'm loving all of the new mount work this cycle!

Thanks,

lis


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-15  4:06 Apparent mount behaviour change in 6.15 Allison Karlitskaya
@ 2025-05-15 11:25 ` Christian Brauner
  2025-05-23  6:32   ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Brauner @ 2025-05-15 11:25 UTC (permalink / raw)
  To: Allison Karlitskaya, Alexander Viro; +Cc: linux-fsdevel

On Thu, May 15, 2025 at 06:06:13AM +0200, Allison Karlitskaya wrote:
> hi,
> 
> The CI in the composefs-rs project picked up an interesting issue with
> the recent mount API changes in the latest 6.15-rc kernels.  I've
> managed to produce a minimal reproducer.
> 
> ==> test.sh <==
> #!/bin/sh
> 
> set -eux
> 
> uname -a
> umount --recursive tmp/mnt || true
> rm -rf tmp/mnt tmp/new
> mkdir -p tmp/mnt tmp/new tmp/new/old
> touch tmp/mnt/this-is-old
> touch tmp/new/this-is-new
> ./swapmnt tmp/new tmp/mnt
> find tmp/mnt
> 
> ==> swapmnt.c <==
> // Replace [old] with a clone of [new], moving [old] to [new]/"old"
> 
> #define _GNU_SOURCE
> #include <err.h>
> #include <fcntl.h>
> #include <linux/mount.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/stat.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> 
> int
> main (int argc, char **argv) {
>   if (argc != 3) {
>     fprintf(stderr, "usage: %s [new] [old]", argv[0]);
>     return 1;
>   }
> 
>   const char *new = argv[1];
>   const char *old = argv[2];
> 
>   int oldfd = syscall(SYS_open_tree, AT_FDCWD, old,
> OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC);
>   if (oldfd == -1)
>     err(EXIT_FAILURE, "open_tree('%s', OPEN_TREE_CLONE)", old);
> 
>   int newfd = syscall(SYS_open_tree, AT_FDCWD, new,
> OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC);
>   if (newfd == -1)
>     err(EXIT_FAILURE, "open_tree('%s', OPEN_TREE_CLONE)", new);
> 
>   if (syscall(SYS_move_mount, newfd, "", AT_FDCWD, old,
> MOVE_MOUNT_F_EMPTY_PATH))
>     err(EXIT_FAILURE, "move_mount('%s' -> '%s')", new, old);
> 
>   if (syscall(SYS_move_mount, oldfd, "", newfd, "old", MOVE_MOUNT_F_EMPTY_PATH))
>     err(EXIT_FAILURE, "move_mount('%s' -> (new)'%s'/old)", old, old);
> 
>   return 0;
> }
> 
> On 6.14 (Fedora 42) this looks like:
> 
> [root@fedora-bls-efi-127-0-0-2-2201 tmp]# sh test.sh
> + uname -a
> Linux fedora-bls-efi-127-0-0-2-2201 6.14.5-300.fc42.x86_64 #1 SMP
> PREEMPT_DYNAMIC Fri May  2 14:16:46 UTC 2025 x86_64 GNU/Linux
> + umount --recursive tmp/mnt
> umount: tmp/mnt: not found
> + true
> + rm -rf tmp/mnt tmp/new
> + mkdir -p tmp/mnt tmp/new tmp/new/old
> + touch tmp/mnt/this-is-old
> + touch tmp/new/this-is-new
> + ./swapmnt tmp/new tmp/mnt
> + find tmp/mnt
> tmp/mnt
> tmp/mnt/this-is-new
> tmp/mnt/old
> tmp/mnt/old/this-is-old
> [root@fedora-bls-efi-127-0-0-2-2201 tmp]#
> 
> On 6.15 from yesterday (9f35e33144ae, via @kernel-vanilla/mainline
> copr, on rawhide):
> 
> [root@fedora tmp]# sh test.sh
> + uname -a
> Linux fedora 6.15.0-0.rc6.20250514.9f35e331.450.vanilla.fc43.x86_64 #1
> SMP PREEMPT_DYNAMIC Wed May 14 04:18:35 UTC 2025 x86_64 GNU/Linux
> + umount --recursive tmp/mnt
> umount: tmp/mnt: not mounted
> + true
> + rm -rf tmp/mnt tmp/new
> + mkdir -p tmp/mnt tmp/new tmp/new/old
> + touch tmp/mnt/this-is-old
> + touch tmp/new/this-is-new
> + ./swapmnt tmp/new tmp/mnt
> + find tmp/mnt
> tmp/mnt
> tmp/mnt/this-is-new
> find: File system loop detected; ‘tmp/mnt/old’ is part of the same
> file system loop as ‘tmp/mnt’.
> [root@fedora tmp]#
> 
> Otherwise, I gotta say I'm loving all of the new mount work this cycle!

Thank you! I'm happy it's useful and that it's picked up so quickly! :)

This is caused by my patch to allow mount propagation into detached
mount trees. That had been disabled forever until v6.15 to give strong
guarantees to userspace that detached mount trees are really detached.

I thought it might be useful for them to partake in propagation
via the MNTNS_PROPAGATION flag. But I've alredy expressed that I regret
this in another thread we had.

Al, I want to kill this again and restore the pre v6.15 behavior.
Allowing mount propagation for detached trees was a crazy
idea on my part. It's a pain and it regresses userspace. If composefs is
broken by this then systemd will absolutely get broken by my change as
well.

Something like this will allow to restore the status-quo:

 fs/mount.h     |  5 -----
 fs/namespace.c | 15 ++-------------
 fs/pnode.c     |  3 ---
 fs/pnode.h     |  2 +-
 4 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 7aecf2a60472..ad7173037924 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -7,10 +7,6 @@
 
 extern struct list_head notify_list;
 
-typedef __u32 __bitwise mntns_flags_t;
-
-#define MNTNS_PROPAGATING	((__force mntns_flags_t)(1 << 0))
-
 struct mnt_namespace {
 	struct ns_common	ns;
 	struct mount *	root;
@@ -37,7 +33,6 @@ struct mnt_namespace {
 	struct rb_node		mnt_ns_tree_node; /* node in the mnt_ns_tree */
 	struct list_head	mnt_ns_list; /* entry in the sequential list of mounts namespace */
 	refcount_t		passive; /* number references not pinning @mounts */
-	mntns_flags_t		mntns_flags;
 } __randomize_layout;
 
 struct mnt_pcp {
diff --git a/fs/namespace.c b/fs/namespace.c
index 04a9bb9f31fa..900ffaeff7c6 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3647,7 +3647,7 @@ static int do_move_mount(struct path *old_path,
 	if (!(attached ? check_mnt(old) : is_anon_ns(ns)))
 		goto out;
 
-	if (is_anon_ns(ns)) {
+	if (is_anon_ns(ns) && ns == p->mnt_ns) {
 		/*
 		 * Ending up with two files referring to the root of the
 		 * same anonymous mount namespace would cause an error
@@ -3655,16 +3655,7 @@ static int do_move_mount(struct path *old_path,
 		 * twice into the mount tree which would be rejected
 		 * later. But be explicit about it right here.
 		 */
-		if ((is_anon_ns(p->mnt_ns) && ns == p->mnt_ns))
-			goto out;
-
-		/*
-		 * If this is an anonymous mount tree ensure that mount
-		 * propagation can detect mounts that were just
-		 * propagated to the target mount tree so we don't
-		 * propagate onto them.
-		 */
-		ns->mntns_flags |= MNTNS_PROPAGATING;
+		goto out;
 	} else if (is_anon_ns(p->mnt_ns)) {
 		/*
 		 * Don't allow moving an attached mount tree to an
@@ -3721,8 +3712,6 @@ static int do_move_mount(struct path *old_path,
 	if (attached)
 		put_mountpoint(old_mp);
 out:
-	if (is_anon_ns(ns))
-		ns->mntns_flags &= ~MNTNS_PROPAGATING;
 	unlock_mount(mp);
 	if (!err) {
 		if (attached) {
diff --git a/fs/pnode.c b/fs/pnode.c
index fb77427df39e..d80944d66237 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -231,9 +231,6 @@ static int propagate_one(struct mount *m, struct mountpoint *dest_mp)
 	/* skip if mountpoint isn't visible in m */
 	if (!is_subdir(dest_mp->m_dentry, m->mnt.mnt_root))
 		return 0;
-	/* skip if m is in the anon_ns we are emptying */
-	if (m->mnt_ns->mntns_flags & MNTNS_PROPAGATING)
-		return 0;
 
 	if (peers(m, last_dest)) {
 		type = CL_MAKE_SHARED;
diff --git a/fs/pnode.h b/fs/pnode.h
index 34b6247af01d..077028814e8d 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -12,7 +12,7 @@
 
 #define IS_MNT_SHARED(m) ((m)->mnt.mnt_flags & MNT_SHARED)
 #define IS_MNT_SLAVE(m) ((m)->mnt_master)
-#define IS_MNT_NEW(m) (!(m)->mnt_ns)
+#define IS_MNT_NEW(m) (!(m)->mnt_ns || is_anon_ns((m)->mnt_ns))
 #define CLEAR_MNT_SHARED(m) ((m)->mnt.mnt_flags &= ~MNT_SHARED)
 #define IS_MNT_UNBINDABLE(m) ((m)->mnt.mnt_flags & MNT_UNBINDABLE)
 #define IS_MNT_MARKED(m) ((m)->mnt.mnt_flags & MNT_MARKED)
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-15 11:25 ` Christian Brauner
@ 2025-05-23  6:32   ` Al Viro
  2025-05-23  8:41     ` Christian Brauner
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2025-05-23  6:32 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Allison Karlitskaya, linux-fsdevel

On Thu, May 15, 2025 at 01:25:27PM +0200, Christian Brauner wrote:

> Al, I want to kill this again and restore the pre v6.15 behavior.
> Allowing mount propagation for detached trees was a crazy
> idea on my part. It's a pain and it regresses userspace. If composefs is
> broken by this then systemd will absolutely get broken by my change as
> well.
> 
> Something like this will allow to restore the status-quo:

> -#define IS_MNT_NEW(m) (!(m)->mnt_ns)
> +#define IS_MNT_NEW(m) (!(m)->mnt_ns || is_anon_ns((m)->mnt_ns))

FWIW, I'm not sure that ever had been quite correct, no matter how you
call the macro.  I'm not up to building a counterexample right now,
will do in the morning...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-23  6:32   ` Al Viro
@ 2025-05-23  8:41     ` Christian Brauner
  2025-05-23 21:29       ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Brauner @ 2025-05-23  8:41 UTC (permalink / raw)
  To: Al Viro; +Cc: Allison Karlitskaya, linux-fsdevel

On Fri, May 23, 2025 at 07:32:38AM +0100, Al Viro wrote:
> On Thu, May 15, 2025 at 01:25:27PM +0200, Christian Brauner wrote:
> 
> > Al, I want to kill this again and restore the pre v6.15 behavior.
> > Allowing mount propagation for detached trees was a crazy
> > idea on my part. It's a pain and it regresses userspace. If composefs is
> > broken by this then systemd will absolutely get broken by my change as
> > well.
> > 
> > Something like this will allow to restore the status-quo:
> 
> > -#define IS_MNT_NEW(m) (!(m)->mnt_ns)
> > +#define IS_MNT_NEW(m) (!(m)->mnt_ns || is_anon_ns((m)->mnt_ns))
> 
> FWIW, I'm not sure that ever had been quite correct, no matter how you
> call the macro.  I'm not up to building a counterexample right now,
> will do in the morning...

The point is that we can't do mount propagation with detached trees
without regressing userspace. And we didn't do it before. I don't
specifically care how we block this but it needs to go out again.
Otherwise we release a kernel with the new semantics that regress
userspace.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-23  8:41     ` Christian Brauner
@ 2025-05-23 21:29       ` Al Viro
  2025-05-23 21:37         ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2025-05-23 21:29 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Allison Karlitskaya, linux-fsdevel

On Fri, May 23, 2025 at 10:41:14AM +0200, Christian Brauner wrote:
> On Fri, May 23, 2025 at 07:32:38AM +0100, Al Viro wrote:
> > On Thu, May 15, 2025 at 01:25:27PM +0200, Christian Brauner wrote:
> > 
> > > Al, I want to kill this again and restore the pre v6.15 behavior.
> > > Allowing mount propagation for detached trees was a crazy
> > > idea on my part. It's a pain and it regresses userspace. If composefs is
> > > broken by this then systemd will absolutely get broken by my change as
> > > well.
> > > 
> > > Something like this will allow to restore the status-quo:
> > 
> > > -#define IS_MNT_NEW(m) (!(m)->mnt_ns)
> > > +#define IS_MNT_NEW(m) (!(m)->mnt_ns || is_anon_ns((m)->mnt_ns))
> > 
> > FWIW, I'm not sure that ever had been quite correct, no matter how you
> > call the macro.  I'm not up to building a counterexample right now,
> > will do in the morning...
> 
> The point is that we can't do mount propagation with detached trees
> without regressing userspace. And we didn't do it before. I don't
> specifically care how we block this but it needs to go out again.
> Otherwise we release a kernel with the new semantics that regress
> userspace.

The problem is not in propagation *into* detached trees.  All you need
is
	A -> B -> C
(all in your normal namespace, nothing detached, etc.) followed by make-private
on B.

C should (and does) keep getting events from A.  Now, have open_tree() create
a detached clone of B.  Shouldn't affect anything between A, B and C, right?

With your patch doing open_tree before make-private B ends up with
no propagation from A to C... until the detached tree is closed.

See the problem?  You have nodes on detached trees still in propagation graph.
And your variant _stops_ propagation at nodes in anon namespaces, not just
skips such nodes themselves.

This is bogus, IMO.  I'm perfectly fine with propagate_one() returning 0
on anon_ns(m->mnt); that would refuse to propagate into *any* anon ns,
but won't screw the propagation between the mounts that are in normal, non-anon
namespaces.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-23 21:29       ` Al Viro
@ 2025-05-23 21:37         ` Al Viro
  2025-05-23 23:22           ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2025-05-23 21:37 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Allison Karlitskaya, linux-fsdevel

On Fri, May 23, 2025 at 10:29:58PM +0100, Al Viro wrote:

> This is bogus, IMO.  I'm perfectly fine with propagate_one() returning 0
> on anon_ns(m->mnt); that would refuse to propagate into *any* anon ns,
> but won't screw the propagation between the mounts that are in normal, non-anon
> namespaces.

IOW, I mean this variant - the only difference from what you've posted is
the location of is_anon_ns() test; you do it in IS_MNT_NEW(), this variant
has it in propagate_one().  Does the variant below fix regression?


diff --git a/fs/mount.h b/fs/mount.h
index 7aecf2a60472..ad7173037924 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -7,10 +7,6 @@
 
 extern struct list_head notify_list;
 
-typedef __u32 __bitwise mntns_flags_t;
-
-#define MNTNS_PROPAGATING	((__force mntns_flags_t)(1 << 0))
-
 struct mnt_namespace {
 	struct ns_common	ns;
 	struct mount *	root;
@@ -37,7 +33,6 @@ struct mnt_namespace {
 	struct rb_node		mnt_ns_tree_node; /* node in the mnt_ns_tree */
 	struct list_head	mnt_ns_list; /* entry in the sequential list of mounts namespace */
 	refcount_t		passive; /* number references not pinning @mounts */
-	mntns_flags_t		mntns_flags;
 } __randomize_layout;
 
 struct mnt_pcp {
diff --git a/fs/namespace.c b/fs/namespace.c
index 1b466c54a357..623cd110076d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3648,7 +3648,7 @@ static int do_move_mount(struct path *old_path,
 	if (!(attached ? check_mnt(old) : is_anon_ns(ns)))
 		goto out;
 
-	if (is_anon_ns(ns)) {
+	if (is_anon_ns(ns) && ns == p->mnt_ns) {
 		/*
 		 * Ending up with two files referring to the root of the
 		 * same anonymous mount namespace would cause an error
@@ -3656,16 +3656,7 @@ static int do_move_mount(struct path *old_path,
 		 * twice into the mount tree which would be rejected
 		 * later. But be explicit about it right here.
 		 */
-		if ((is_anon_ns(p->mnt_ns) && ns == p->mnt_ns))
-			goto out;
-
-		/*
-		 * If this is an anonymous mount tree ensure that mount
-		 * propagation can detect mounts that were just
-		 * propagated to the target mount tree so we don't
-		 * propagate onto them.
-		 */
-		ns->mntns_flags |= MNTNS_PROPAGATING;
+		goto out;
 	} else if (is_anon_ns(p->mnt_ns)) {
 		/*
 		 * Don't allow moving an attached mount tree to an
@@ -3722,8 +3713,6 @@ static int do_move_mount(struct path *old_path,
 	if (attached)
 		put_mountpoint(old_mp);
 out:
-	if (is_anon_ns(ns))
-		ns->mntns_flags &= ~MNTNS_PROPAGATING;
 	unlock_mount(mp);
 	if (!err) {
 		if (attached) {
diff --git a/fs/pnode.c b/fs/pnode.c
index fb77427df39e..ffd429b760d5 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -231,8 +231,8 @@ static int propagate_one(struct mount *m, struct mountpoint *dest_mp)
 	/* skip if mountpoint isn't visible in m */
 	if (!is_subdir(dest_mp->m_dentry, m->mnt.mnt_root))
 		return 0;
-	/* skip if m is in the anon_ns we are emptying */
-	if (m->mnt_ns->mntns_flags & MNTNS_PROPAGATING)
+	/* skip if m is in the anon_ns */
+	if (is_anon_ns(m->mnt_ns))
 		return 0;
 
 	if (peers(m, last_dest)) {

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-23 21:37         ` Al Viro
@ 2025-05-23 23:22           ` Al Viro
  2025-05-26  4:47             ` Christian Brauner
  2025-05-26  7:18             ` Allison Karlitskaya
  0 siblings, 2 replies; 11+ messages in thread
From: Al Viro @ 2025-05-23 23:22 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Allison Karlitskaya, linux-fsdevel

On Fri, May 23, 2025 at 10:37:35PM +0100, Al Viro wrote:
> On Fri, May 23, 2025 at 10:29:58PM +0100, Al Viro wrote:
> 
> > This is bogus, IMO.  I'm perfectly fine with propagate_one() returning 0
> > on anon_ns(m->mnt); that would refuse to propagate into *any* anon ns,
> > but won't screw the propagation between the mounts that are in normal, non-anon
> > namespaces.
> 
> IOW, I mean this variant - the only difference from what you've posted is
> the location of is_anon_ns() test; you do it in IS_MNT_NEW(), this variant
> has it in propagate_one().  Does the variant below fix regression?

AFAICS, it does suffice to revert the behaviour change on the reproducer
upthread.

I've replaced the top of viro/vfs.git#fixes with that; commit message there
is tentative - if nothing else, that's a patch from Christian with slight
modifications from me.  It also needs reported-by, etc.

Said that, could somebody (original reporter) confirm that the variant
in git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git #fixes (head at
63e90fcc1807) is OK with them?

And yes, it will need a proper commit message.  Christian?

commit 63e90fcc18072638a62196caae93de66fc6cbc37
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Fri May 23 19:20:36 2025 -0400

    Don't propagate mounts into detached trees
    
    That reverts to behaviour of 6.14 and earlier, with
    fix from "fix IS_MNT_PROPAGATING uses" preserved.
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/mount.h b/fs/mount.h
index 7aecf2a60472..ad7173037924 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -7,10 +7,6 @@
 
 extern struct list_head notify_list;
 
-typedef __u32 __bitwise mntns_flags_t;
-
-#define MNTNS_PROPAGATING	((__force mntns_flags_t)(1 << 0))
-
 struct mnt_namespace {
 	struct ns_common	ns;
 	struct mount *	root;
@@ -37,7 +33,6 @@ struct mnt_namespace {
 	struct rb_node		mnt_ns_tree_node; /* node in the mnt_ns_tree */
 	struct list_head	mnt_ns_list; /* entry in the sequential list of mounts namespace */
 	refcount_t		passive; /* number references not pinning @mounts */
-	mntns_flags_t		mntns_flags;
 } __randomize_layout;
 
 struct mnt_pcp {
diff --git a/fs/namespace.c b/fs/namespace.c
index 1b466c54a357..623cd110076d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3648,7 +3648,7 @@ static int do_move_mount(struct path *old_path,
 	if (!(attached ? check_mnt(old) : is_anon_ns(ns)))
 		goto out;
 
-	if (is_anon_ns(ns)) {
+	if (is_anon_ns(ns) && ns == p->mnt_ns) {
 		/*
 		 * Ending up with two files referring to the root of the
 		 * same anonymous mount namespace would cause an error
@@ -3656,16 +3656,7 @@ static int do_move_mount(struct path *old_path,
 		 * twice into the mount tree which would be rejected
 		 * later. But be explicit about it right here.
 		 */
-		if ((is_anon_ns(p->mnt_ns) && ns == p->mnt_ns))
-			goto out;
-
-		/*
-		 * If this is an anonymous mount tree ensure that mount
-		 * propagation can detect mounts that were just
-		 * propagated to the target mount tree so we don't
-		 * propagate onto them.
-		 */
-		ns->mntns_flags |= MNTNS_PROPAGATING;
+		goto out;
 	} else if (is_anon_ns(p->mnt_ns)) {
 		/*
 		 * Don't allow moving an attached mount tree to an
@@ -3722,8 +3713,6 @@ static int do_move_mount(struct path *old_path,
 	if (attached)
 		put_mountpoint(old_mp);
 out:
-	if (is_anon_ns(ns))
-		ns->mntns_flags &= ~MNTNS_PROPAGATING;
 	unlock_mount(mp);
 	if (!err) {
 		if (attached) {
diff --git a/fs/pnode.c b/fs/pnode.c
index fb77427df39e..ffd429b760d5 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -231,8 +231,8 @@ static int propagate_one(struct mount *m, struct mountpoint *dest_mp)
 	/* skip if mountpoint isn't visible in m */
 	if (!is_subdir(dest_mp->m_dentry, m->mnt.mnt_root))
 		return 0;
-	/* skip if m is in the anon_ns we are emptying */
-	if (m->mnt_ns->mntns_flags & MNTNS_PROPAGATING)
+	/* skip if m is in the anon_ns */
+	if (is_anon_ns(m->mnt_ns))
 		return 0;
 
 	if (peers(m, last_dest)) {

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-23 23:22           ` Al Viro
@ 2025-05-26  4:47             ` Christian Brauner
  2025-05-26 21:32               ` Al Viro
  2025-05-26  7:18             ` Allison Karlitskaya
  1 sibling, 1 reply; 11+ messages in thread
From: Christian Brauner @ 2025-05-26  4:47 UTC (permalink / raw)
  To: Al Viro; +Cc: Allison Karlitskaya, linux-fsdevel

On Sat, May 24, 2025 at 12:22:13AM +0100, Al Viro wrote:
> On Fri, May 23, 2025 at 10:37:35PM +0100, Al Viro wrote:
> > On Fri, May 23, 2025 at 10:29:58PM +0100, Al Viro wrote:
> > 
> > > This is bogus, IMO.  I'm perfectly fine with propagate_one() returning 0
> > > on anon_ns(m->mnt); that would refuse to propagate into *any* anon ns,
> > > but won't screw the propagation between the mounts that are in normal, non-anon
> > > namespaces.
> > 
> > IOW, I mean this variant - the only difference from what you've posted is
> > the location of is_anon_ns() test; you do it in IS_MNT_NEW(), this variant
> > has it in propagate_one().  Does the variant below fix regression?
> 
> AFAICS, it does suffice to revert the behaviour change on the reproducer
> upthread.
> 
> I've replaced the top of viro/vfs.git#fixes with that; commit message there
> is tentative - if nothing else, that's a patch from Christian with slight
> modifications from me.  It also needs reported-by, etc.
> 
> Said that, could somebody (original reporter) confirm that the variant
> in git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git #fixes (head at
> 63e90fcc1807) is OK with them?
> 
> And yes, it will need a proper commit message.  Christian?

Yes, that looks good to me, thank you!

> 
> commit 63e90fcc18072638a62196caae93de66fc6cbc37
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Fri May 23 19:20:36 2025 -0400
> 
>     Don't propagate mounts into detached trees
>     
>     That reverts to behaviour of 6.14 and earlier, with
>     fix from "fix IS_MNT_PROPAGATING uses" preserved.
>     
>     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> diff --git a/fs/mount.h b/fs/mount.h
> index 7aecf2a60472..ad7173037924 100644
> --- a/fs/mount.h
> +++ b/fs/mount.h
> @@ -7,10 +7,6 @@
>  
>  extern struct list_head notify_list;
>  
> -typedef __u32 __bitwise mntns_flags_t;
> -
> -#define MNTNS_PROPAGATING	((__force mntns_flags_t)(1 << 0))
> -
>  struct mnt_namespace {
>  	struct ns_common	ns;
>  	struct mount *	root;
> @@ -37,7 +33,6 @@ struct mnt_namespace {
>  	struct rb_node		mnt_ns_tree_node; /* node in the mnt_ns_tree */
>  	struct list_head	mnt_ns_list; /* entry in the sequential list of mounts namespace */
>  	refcount_t		passive; /* number references not pinning @mounts */
> -	mntns_flags_t		mntns_flags;
>  } __randomize_layout;
>  
>  struct mnt_pcp {
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 1b466c54a357..623cd110076d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3648,7 +3648,7 @@ static int do_move_mount(struct path *old_path,
>  	if (!(attached ? check_mnt(old) : is_anon_ns(ns)))
>  		goto out;
>  
> -	if (is_anon_ns(ns)) {
> +	if (is_anon_ns(ns) && ns == p->mnt_ns) {
>  		/*
>  		 * Ending up with two files referring to the root of the
>  		 * same anonymous mount namespace would cause an error
> @@ -3656,16 +3656,7 @@ static int do_move_mount(struct path *old_path,
>  		 * twice into the mount tree which would be rejected
>  		 * later. But be explicit about it right here.
>  		 */
> -		if ((is_anon_ns(p->mnt_ns) && ns == p->mnt_ns))
> -			goto out;
> -
> -		/*
> -		 * If this is an anonymous mount tree ensure that mount
> -		 * propagation can detect mounts that were just
> -		 * propagated to the target mount tree so we don't
> -		 * propagate onto them.
> -		 */
> -		ns->mntns_flags |= MNTNS_PROPAGATING;
> +		goto out;
>  	} else if (is_anon_ns(p->mnt_ns)) {
>  		/*
>  		 * Don't allow moving an attached mount tree to an
> @@ -3722,8 +3713,6 @@ static int do_move_mount(struct path *old_path,
>  	if (attached)
>  		put_mountpoint(old_mp);
>  out:
> -	if (is_anon_ns(ns))
> -		ns->mntns_flags &= ~MNTNS_PROPAGATING;
>  	unlock_mount(mp);
>  	if (!err) {
>  		if (attached) {
> diff --git a/fs/pnode.c b/fs/pnode.c
> index fb77427df39e..ffd429b760d5 100644
> --- a/fs/pnode.c
> +++ b/fs/pnode.c
> @@ -231,8 +231,8 @@ static int propagate_one(struct mount *m, struct mountpoint *dest_mp)
>  	/* skip if mountpoint isn't visible in m */
>  	if (!is_subdir(dest_mp->m_dentry, m->mnt.mnt_root))
>  		return 0;
> -	/* skip if m is in the anon_ns we are emptying */
> -	if (m->mnt_ns->mntns_flags & MNTNS_PROPAGATING)
> +	/* skip if m is in the anon_ns */
> +	if (is_anon_ns(m->mnt_ns))
>  		return 0;
>  
>  	if (peers(m, last_dest)) {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-23 23:22           ` Al Viro
  2025-05-26  4:47             ` Christian Brauner
@ 2025-05-26  7:18             ` Allison Karlitskaya
  2025-05-26  8:00               ` Allison Karlitskaya
  1 sibling, 1 reply; 11+ messages in thread
From: Allison Karlitskaya @ 2025-05-26  7:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Christian Brauner, linux-fsdevel

good morning,


On Sat, 24 May 2025 at 01:22, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Said that, could somebody (original reporter) confirm that the variant
> in git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git #fixes (head at
> 63e90fcc1807) is OK with them?

I've tested the commit (and its parent) against my original usecase
that found the bug, along with the latest kernel in Fedora rawhide.
Here's the results:


broken:
Linux fedora 6.15.0-0.rc7.58.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Tue
May 20 14:10:49 UTC 2025 x86_64 GNU/Linux
(current kernel in rawhide)

broken:
d1ddc6f1d9f0 ("fix IS_MNT_PROPAGATING uses")
Linux fedora 6.15.0-rc5+ #8 SMP PREEMPT_DYNAMIC Mon May 26 09:14:09
CEST 2025 x86_64 GNU/Linux
(parent commit of the fix)

working:
63e90fcc1807 ("Don't propagate mounts into detached trees")
Linux fedora 6.15.0-rc5+ #7 SMP PREEMPT_DYNAMIC Mon May 26 09:12:43
CEST 2025 x86_64 GNU/Linux


tl;dr: Seems that the fix works as expected.

Thanks!

lis


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-26  7:18             ` Allison Karlitskaya
@ 2025-05-26  8:00               ` Allison Karlitskaya
  0 siblings, 0 replies; 11+ messages in thread
From: Allison Karlitskaya @ 2025-05-26  8:00 UTC (permalink / raw)
  To: Al Viro; +Cc: Christian Brauner, linux-fsdevel

On Mon, 26 May 2025 at 09:18, Allison Karlitskaya <lis@redhat.com> wrote:
> I've tested the commit (and its parent) against my original usecase
> that found the bug, along with the latest kernel in Fedora rawhide.
> Here's the results:
>
>
> broken:
> Linux fedora 6.15.0-0.rc7.58.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Tue
> May 20 14:10:49 UTC 2025 x86_64 GNU/Linux
> (current kernel in rawhide)
>
> broken:
> d1ddc6f1d9f0 ("fix IS_MNT_PROPAGATING uses")
> Linux fedora 6.15.0-rc5+ #8 SMP PREEMPT_DYNAMIC Mon May 26 09:14:09
> CEST 2025 x86_64 GNU/Linux
> (parent commit of the fix)
>
> working:
> 63e90fcc1807 ("Don't propagate mounts into detached trees")
> Linux fedora 6.15.0-rc5+ #7 SMP PREEMPT_DYNAMIC Mon May 26 09:12:43
> CEST 2025 x86_64 GNU/Linux

...and worryingly:
broken:
0ff41df1cb26 ("Linux 6.15")
Linux fedora 6.15.0 #9 SMP PREEMPT_DYNAMIC Mon May 26 09:51:47 CEST
2025 x86_64 GNU/Linux


ie: the release went out with the regression :(

lis


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Apparent mount behaviour change in 6.15
  2025-05-26  4:47             ` Christian Brauner
@ 2025-05-26 21:32               ` Al Viro
  0 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2025-05-26 21:32 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Allison Karlitskaya, linux-fsdevel

On Mon, May 26, 2025 at 06:47:25AM +0200, Christian Brauner wrote:
> On Sat, May 24, 2025 at 12:22:13AM +0100, Al Viro wrote:
> > On Fri, May 23, 2025 at 10:37:35PM +0100, Al Viro wrote:
> > > On Fri, May 23, 2025 at 10:29:58PM +0100, Al Viro wrote:
> > > 
> > > > This is bogus, IMO.  I'm perfectly fine with propagate_one() returning 0
> > > > on anon_ns(m->mnt); that would refuse to propagate into *any* anon ns,
> > > > but won't screw the propagation between the mounts that are in normal, non-anon
> > > > namespaces.
> > > 
> > > IOW, I mean this variant - the only difference from what you've posted is
> > > the location of is_anon_ns() test; you do it in IS_MNT_NEW(), this variant
> > > has it in propagate_one().  Does the variant below fix regression?
> > 
> > AFAICS, it does suffice to revert the behaviour change on the reproducer
> > upthread.
> > 
> > I've replaced the top of viro/vfs.git#fixes with that; commit message there
> > is tentative - if nothing else, that's a patch from Christian with slight
> > modifications from me.  It also needs reported-by, etc.
> > 
> > Said that, could somebody (original reporter) confirm that the variant
> > in git.kernel.org:/pub/scm/linux/kernel/git/viro/vfs.git #fixes (head at
> > 63e90fcc1807) is OK with them?
> > 
> > And yes, it will need a proper commit message.  Christian?
> 
> Yes, that looks good to me, thank you!

OK, I went with the following for commit message:

-----
Don't propagate mounts into detached trees

All versions up to 6.14 did not propagate mount events into detached
tree.  Shortly after 6.14 a merge of vfs-6.15-rc1.mount.namespace
(130e696aa68b) has changed that.

Unfortunately, that has caused userland regressions (reported in
https://lore.kernel.org/all/CAOYeF9WQhFDe+BGW=Dp5fK8oRy5AgZ6zokVyTj1Wp4EUiYgt4w@mail.gmail.com/)

Straight revert wouldn't be an option - in particular, the variant in 6.14
had a bug that got fixed in d1ddc6f1d9f0 ("fix IS_MNT_PROPAGATING uses")
and we don't want to bring the bug back.

This is a modification of manual revert posted by Christian, with changes
needed to avoid reintroducing the breakage in scenario described in
d1ddc6f1d9f0.

Cc: stable@vger.kernel.org
Reported-by: Allison Karlitskaya <lis@redhat.com>
Tested-by: Allison Karlitskaya <lis@redhat.com>
Acked-by: Christian Brauner <brauner@kernel.org>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----

It's in viro/vfs.git #fixes; if everyone's OK with the commit message, I'm
sending a pull request tomorrow.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-05-26 21:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-15  4:06 Apparent mount behaviour change in 6.15 Allison Karlitskaya
2025-05-15 11:25 ` Christian Brauner
2025-05-23  6:32   ` Al Viro
2025-05-23  8:41     ` Christian Brauner
2025-05-23 21:29       ` Al Viro
2025-05-23 21:37         ` Al Viro
2025-05-23 23:22           ` Al Viro
2025-05-26  4:47             ` Christian Brauner
2025-05-26 21:32               ` Al Viro
2025-05-26  7:18             ` Allison Karlitskaya
2025-05-26  8:00               ` Allison Karlitskaya

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).