* [RFC] ceph: strange mount/unmount behavior
@ 2025-08-25 21:53 Viacheslav Dubeyko
From: Viacheslav Dubeyko @ 2025-08-25 21:53 UTC (permalink / raw)
  To: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	brauner@kernel.org
  Cc: ceph-devel@vger.kernel.org, idryomov@gmail.com, Patrick Donnelly,
	Alex Markuze, Pavan Rallabhandi, Greg Farnum

Hello,

I am investigating an issue with generic/604:

sudo ./check generic/604
FSTYP         -- ceph
PLATFORM      -- Linux/x86_64 ceph-0005 6.17.0-rc1+ #29 SMP PREEMPT_DYNAMIC Mon
Aug 25 13:06:10 PDT 2025
MKFS_OPTIONS  -- 192.168.1.213:6789:/scratch
MOUNT_OPTIONS -- -o name=admin 192.168.1.213:6789:/scratch /mnt/cephfs/scratch

generic/604 10s ... - output mismatch (see
XFSTESTS/xfstests-dev/results//generic/604.out.bad)
    --- tests/generic/604.out	2025-02-25 13:05:32.515668548 -0800
    +++ XFSTESTS/xfstests-dev/results//generic/604.out.bad	2025-08-25
14:25:49.256780397 -0700
    @@ -1,2 +1,3 @@
     QA output created by 604
    +umount: /mnt/cephfs/scratch: target is busy.
     Silence is golden
    ...
    (Run 'diff -u XFSTESTS/xfstests-dev/tests/generic/604.out XFSTESTS/xfstests-
dev/results//generic/604.out.bad'  to see the entire diff)
Ran: generic/604
Failures: generic/604
Failed 1 of 1 tests

As far as I can see, generic/604 intentionally runs the unmount in the
background and starts the mount operation before the unmount has finished:

# For overlayfs, avoid unmounting the base fs after _scratch_mount tries to
# mount the base fs.  Delay the mount attempt by a small amount in the hope
# that the mount() call will try to lock s_umount /after/ umount has already
# taken it.
$UMOUNT_PROG $SCRATCH_MNT &
sleep 0.01s ; _scratch_mount
wait
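
For what it's worth, the same pattern could be reproduced outside of xfstests
with a small standalone program along these lines (only a sketch: the ceph
source string, the options and the mount point are placeholders, and a real
ceph mount would also need the secret that mount.ceph normally adds):

/*
 * Sketch of the generic/604 race. Assumes the scratch fs is already
 * mounted at "target"; source/opts/target are placeholders.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
	const char *target = "/mnt/cephfs/scratch";         /* placeholder */
	const char *source = "192.168.1.213:6789:/scratch"; /* placeholder */
	const char *opts   = "name=admin";                   /* placeholder */

	pid_t pid = fork();
	if (pid == 0) {
		/* child: start the unmount, like "$UMOUNT_PROG $SCRATCH_MNT &" */
		if (umount2(target, 0))
			perror("umount2");
		_exit(0);
	}

	/* parent: small delay, like "sleep 0.01s"... */
	usleep(10000);

	/* ...then race a fresh mount of the same fs, like "_scratch_mount" */
	if (mount(source, target, "ceph", 0, opts))
		perror("mount");

	waitpid(pid, NULL, 0);
	return 0;
}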

As a result, we hit this issue because the mnt_count is bigger than the
value expected by propagate_mount_busy() [1]:

	} else {
		smp_mb(); // paired with __legitimize_mnt()
		shrink_submounts(mnt);
		retval = -EBUSY;
		if (!propagate_mount_busy(mnt, 2)) {
			umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
			retval = 0;
		}
	}


[   71.347372] pid 3762 do_umount():2022 finished:  mnt_get_count(mnt) 3
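
To make the numbers concrete: as I understand it, the 2 passed to
propagate_mount_busy() is the number of references the umount path is
willing to tolerate (the reference held by the mount tree itself plus the
one held by the umount caller), and the check behind it is essentially the
following (paraphrased from fs/pnode.c, not copied verbatim):

	/*
	 * With count == 2, any extra reference on the mount - like the 3
	 * in the log line above - makes the check report "busy" and
	 * do_umount() return -EBUSY.
	 */
	static inline int do_refcount_check(struct mount *mnt, int count)
	{
		return mnt_get_count(mnt) > count;
	}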

But when I try to understand what is going on during mount, I can see that
the same file system instance can be mounted multiple times, even on the
same mount point:

192.168.1.195:6789,192.168.1.212:6789,192.168.1.213:6789:/ on /mnt/cephfs type
ceph (rw,relatime,name=admin,secret=<hidden>,fsid=31977b06-8cdb-42a9-97ad-
d6a7d59a42dd,acl,mds_namespace=cephfs)
192.168.1.195:6789,192.168.1.212:6789,192.168.1.213:6789:/ on /mnt/TestCephFS
type ceph (rw,relatime,name=admin,secret=<hidden>,fsid=31977b06-8cdb-42a9-97ad-
d6a7d59a42dd,acl,mds_namespace=cephfs)
192.168.1.195:6789,192.168.1.212:6789,192.168.1.213:6789:/ on /mnt/cephfs type
ceph (rw,relatime,name=admin,secret=<hidden>,fsid=31977b06-8cdb-42a9-97ad-
d6a7d59a42dd,acl,mds_namespace=cephfs)
192.168.1.195:6789,192.168.1.212:6789,192.168.1.213:6789:/ on /mnt/cephfs type
ceph (rw,relatime,name=admin,secret=<hidden>,fsid=31977b06-8cdb-42a9-97ad-
d6a7d59a42dd,acl,mds_namespace=cephfs)
192.168.1.195:6789,192.168.1.212:6789,192.168.1.213:6789:/ on /mnt/cephfs type
ceph (rw,relatime,name=admin,secret=<hidden>,fsid=31977b06-8cdb-42a9-97ad-
d6a7d59a42dd,acl,mds_namespace=cephfs)

And this looks really confusing to me. OK, mounting the same file system
instance on different folders (for example, /mnt/TestCephFS and /mnt/cephfs)
could make sense, although I am not sure it is correct behavior. But
mounting the same file system instance on the same folder doesn't make
sense to me. Maybe I am missing something important here.

Am I correct here? Is this the expected behavior? Or does CephFS have
incorrect mount logic that creates this issue during the umount operation?
Any thoughts?

Thanks,
Slava.

[1] https://elixir.bootlin.com/linux/v6.17-rc1/source/fs/namespace.c#L2002

* Re: [RFC] ceph: strange mount/unmount behavior
@ 2025-08-26  9:10 ` Christian Brauner
From: Christian Brauner @ 2025-08-26  9:10 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	ceph-devel@vger.kernel.org, idryomov@gmail.com, Patrick Donnelly,
	Alex Markuze, Pavan Rallabhandi, Greg Farnum

On Mon, Aug 25, 2025 at 09:53:48PM +0000, Viacheslav Dubeyko wrote:
> [...]
> 
> But when I try to understand what is going on during mount, I can see that
> the same file system instance can be mounted multiple times, even on the
> same mount point:

The new mount API has always allowed this, whereas the old mount(2) API
doesn't. There's no reason not to allow it.
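
For illustration, this is roughly the sequence mount(8) performs through the
new API; just a sketch with placeholder ceph options and paths (a real ceph
mount also needs the secret), error handling omitted for brevity:

/* Attach one ceph instance at a directory via the new mount API. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/mount.h>

static void attach_ceph(const char *dir)
{
	int fsfd, mntfd;

	/* create (or reuse) the ceph filesystem instance */
	fsfd = syscall(SYS_fsopen, "ceph", 0);
	syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING, "source",
		"192.168.1.213:6789:/", 0);
	syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_STRING, "name", "admin", 0);
	syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

	/* turn it into a detached mount and attach it at "dir" */
	mntfd = syscall(SYS_fsmount, fsfd, 0, 0);
	syscall(SYS_move_mount, mntfd, "", AT_FDCWD, dir,
		MOVE_MOUNT_F_EMPTY_PATH);

	close(mntfd);
	close(fsfd);
}

int main(void)
{
	/*
	 * Both calls succeed and stack two mounts of the same instance
	 * (same fsid) on the same directory; the classic mount(2) path
	 * refuses the second one with EBUSY.
	 */
	attach_ceph("/mnt/cephfs");
	attach_ceph("/mnt/cephfs");
	return 0;
}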

* RE: [RFC] ceph: strange mount/unmount behavior
@ 2025-08-26 18:58   ` Viacheslav Dubeyko
From: Viacheslav Dubeyko @ 2025-08-26 18:58 UTC (permalink / raw)
  To: brauner@kernel.org
  Cc: linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org,
	Patrick Donnelly, Alex Markuze, Greg Farnum, idryomov@gmail.com,
	Pavan Rallabhandi, viro@zeniv.linux.org.uk

On Tue, 2025-08-26 at 11:10 +0200, Christian Brauner wrote:
> On Mon, Aug 25, 2025 at 09:53:48PM +0000, Viacheslav Dubeyko wrote:
> > [...]
> > 
> > But when I try to understand what is going on during mount, I can see
> > that the same file system instance can be mounted multiple times, even
> > on the same mount point:
> 
> The new mount API has always allowed this, whereas the old mount(2) API
> doesn't. There's no reason not to allow it.

OK. I see.

So, finally, the main problem behind the current generic/604 failure is
incorrect interaction between the mount and unmount logic in the CephFS
code. Somehow, the mount logic makes the mnt_count bigger than what the
unmount path expects. What would be a clean/correct solution to this issue
from the VFS point of view?

Thanks,
Slava.
