From: Al Viro <viro@zeniv.linux.org.uk>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH][RFC] ->mnt_devname is never NULL
Date: Tue, 22 Apr 2025 13:25:14 +0100 [thread overview]
Message-ID: <20250422122514.GZ2023217@ZenIV> (raw)
In-Reply-To: <20250422-erbeten-ambiente-f6b13eab8a29@brauner>
On Tue, Apr 22, 2025 at 09:31:14AM +0200, Christian Brauner wrote:
> On Mon, Apr 21, 2025 at 05:29:47PM +0100, Al Viro wrote:
> > On Mon, Apr 21, 2025 at 09:56:20AM +0200, Christian Brauner wrote:
> > > On Mon, Apr 21, 2025 at 04:35:09AM +0100, Al Viro wrote:
> > > > Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()"
> > > > back in 2018. Get rid of the dead checks...
> > > >
> > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > ---
> > >
> > > Good idea. Fwiw, I've put this into vfs-6.16.mount with some other minor
> > > stuff. If you're keeping it yourself let me know.
> >
> > Not sure... I'm going through documenting the struct mount lifecycle/locking/etc.
> > and it already looks like there will be more patches, but then some are going
> > to be #fixes fodder.
> >
> > Example caught just a couple of minutes ago: do_lock_mount()
> > if (beneath) {
> > m = real_mount(mnt);
> > read_seqlock_excl(&mount_lock);
> > dentry = dget(m->mnt_mountpoint);
> > read_sequnlock_excl(&mount_lock);
> > } else {
> > dentry = path->dentry;
> > }
> >
> > inode_lock(dentry->d_inode);
> > What's to prevent the 'beneath' case from getting mnt mount --move'd
> > away *AND* the ex-parent from getting unmounted while we are blocked
> > in inode_lock? At this point we are not holding any locks whatsoever
> > (and all mount-related locks nest inside inode_lock(), so we couldn't
> > hold them there anyway).
> >
> > Hit that race and watch a very unhappy umount...
>
> If it gets unmounted or moved we immediately detect this in the next line:
>
> if (beneath && (!is_mounted(mnt) || m->mnt_mountpoint != dentry)) {
Sure, we would - *AFTER* we get through that inode_lock().
Consider the following setup:
mkdir foo
mkdir bar
mkdir splat
mount -t tmpfs none foo # mount 1
mount -t tmpfs none bar # mount 2
mkdir bar/baz
mount -t tmpfs none bar/baz # mount 3
then
A: move_mount(AT_FDCWD, "foo", AT_FDCWD, "bar/baz", MOVE_MOUNT_BENEATH)
gets to do_move_mount() and into do_lock_mount() called by it.
path->mnt points to mount 3, path->dentry - to its root. Both are pinned.
do_lock_mount() goes into the first iteration of loop. beneath is true,
so it picks dentry - that of #3 mountpoint, i.e. "/baz" on #2 tmpfs instance.
At that point refcount of that dentry is 3 - one from being a positive on
tmpfs, one from being a mountpoint and one more just grabbed by do_lock_mount().
Now we enter inode_lock(dentry->d_inode). Note that at that point A is not
holding any locks. Suppose it gets preempted at this moment for whatever reason.
B: mount --move bar/baz splat
Proceeds without any problems, mount #3 gets moved to "splat". Now refcount
of mount #2 is not pinned by anything and refcount of "/baz" on it is 2, since
it's no longer a mountpoint.
B: umount bar
... and now it hits the fan, since the refcount of mount #2 is not elevated by
anything, so we do not hit -EBUSY and proceed through umount(2) all the way to
kill_litter_super(), which drops the refcount of "/baz" to 1 and calls kill_anon_super().
Which gets to shrink_dcache_for_umount() and from there - to umount_check() on
that dentry. You get yelled at, then you get yelled at again for busy inodes
after umount (that dentry is pinning the inode down), etc. Superblock of #2
is freed.
A: regains CPU. is_mounted() is true (now at splat instead of bar/baz, but still
mounted), ->mnt_mountpoint does not match.
All right, inode_unlock(dentry->d_inode), then dput(dentry) and now the refcount
of that dentry finally hits zero. We get iput() on its inode, followed by
shmem_evict_inode() which is where we finally oops.
As for the second issue... Normal callers of unlock_mount() do have a struct path
somewhere that pins the location we are dealing with. However, 'beneath' case
of do_move_mount() does not - it relies upon the sucker being a mountpoint all
along. Which is fine until you drop namespace_sem. As soon as namespace_unlock()
has been called, there's no warranty that it will _stay_ a mountpoint. Moving
that inode_unlock() before the namespace_unlock() avoids that scenario.
next prev parent reply other threads:[~2025-04-22 12:25 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-21 3:35 [PATCH][RFC] ->mnt_devname is never NULL Al Viro
2025-04-21 7:56 ` Christian Brauner
2025-04-21 16:29 ` Al Viro
2025-04-21 17:03 ` Al Viro
2025-04-22 3:14 ` [PATCH][RFC] do_lock_mount() races in 'beneath' case Al Viro
2025-04-22 7:47 ` Christian Brauner
2025-04-22 7:43 ` [PATCH][RFC] ->mnt_devname is never NULL Christian Brauner
2025-04-22 7:31 ` Christian Brauner
2025-04-22 12:25 ` Al Viro [this message]
2025-04-22 13:40 ` Christian Brauner
2025-04-23 1:30 ` Al Viro
2025-04-23 22:20 ` Al Viro
2025-04-24 8:56 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250422122514.GZ2023217@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=brauner@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.