linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH][RFC] ->mnt_devname is never NULL
Date: Tue, 22 Apr 2025 13:25:14 +0100	[thread overview]
Message-ID: <20250422122514.GZ2023217@ZenIV> (raw)
In-Reply-To: <20250422-erbeten-ambiente-f6b13eab8a29@brauner>

On Tue, Apr 22, 2025 at 09:31:14AM +0200, Christian Brauner wrote:
> On Mon, Apr 21, 2025 at 05:29:47PM +0100, Al Viro wrote:
> > On Mon, Apr 21, 2025 at 09:56:20AM +0200, Christian Brauner wrote:
> > > On Mon, Apr 21, 2025 at 04:35:09AM +0100, Al Viro wrote:
> > > > Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()"
> > > > back in 2018.  Get rid of the dead checks...
> > > >     
> > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > ---
> > > 
> > > Good idea. Fwiw, I've put this into vfs-6.16.mount with some other minor
> > > stuff. If you're keeping it yourself let me know.
> > 
> > Not sure...  I'm going through documenting the struct mount lifecycle/locking/etc.
> > and it already looks like there will be more patches, but then some are going
> > to be #fixes fodder.
> > 
> > Example caught just a couple of minutes ago: do_lock_mount()
> >                 if (beneath) {
> >                         m = real_mount(mnt);
> >                         read_seqlock_excl(&mount_lock);
> >                         dentry = dget(m->mnt_mountpoint);
> >                         read_sequnlock_excl(&mount_lock);
> >                 } else {
> >                         dentry = path->dentry;
> >                 }
> > 
> >                 inode_lock(dentry->d_inode);
> > What's to prevent the 'beneath' case from getting mnt mount --move'd
> > away *AND* the ex-parent from getting unmounted while we are blocked
> > in inode_lock?  At this point we are not holding any locks whatsoever
> > (and all mount-related locks nest inside inode_lock(), so we couldn't
> > hold them there anyway).
> > 
> > Hit that race and watch a very unhappy umount...
> 
> If it gets unmounted or moved we immediately detect this in the next line:
> 
> if (beneath && (!is_mounted(mnt) || m->mnt_mountpoint != dentry)) {

Sure, we would - *AFTER* we get through that inode_lock().

Consider the following setup:

mkdir foo
mkdir bar
mkdir splat
mount -t tmpfs none foo		# mount 1
mount -t tmpfs none bar		# mount 2
mkdir bar/baz
mount -t tmpfs none bar/baz	# mount 3

then

A: move_mount(AT_FDCWD, "foo", AT_FDCWD, "bar/baz", MOVE_MOUNT_BENEATH)
gets to do_move_mount() and into do_lock_mount() called by it.

path->mnt points to mount 3, path->dentry - to its root.  Both are pinned.
do_lock_mount() goes into the first iteration of loop.  beneath is true,
so it picks dentry - that of #3 mountpoint, i.e. "/baz" on #2 tmpfs instance.

At that point refcount of that dentry is 3 - one from being a positive on
tmpfs, one from being a mountpoint and one more just grabbed by do_lock_mount().

Now we enter inode_lock(dentry->d_inode).  Note that at that point A is not
holding any locks.  Suppose it gets preempted at this moment for whatever reason.

B: mount --move bar/baz splat
Proceeds without any problems, mount #3 gets moved to "splat".  Now refcount
of mount #2 is not pinned by anything and refcount of "/baz" on it is 2, since
it's no longer a mountpoint.

B: umount bar
... and now it hits the fan, since the refcount of mount #2 is not elevated by
anything, so we do not hit -EBUSY and proceed through umount(2) all the way to
kill_litter_super(), which drops the refcount of "/baz" to 1 and calls kill_anon_super().
Which gets to shrink_dcache_for_umount() and from there - to umount_check() on
that dentry.  You get yelled at, then you get yelled at again for busy inodes
after umount (that dentry is pinning the inode down), etc.  Superblock of #2
is freed.

A: regains CPU.  is_mounted() is true (now at splat instead of bar/baz, but still
mounted), ->mnt_mountpoint does not match.
All right, inode_unlock(dentry->d_inode), then dput(dentry) and now the refcount
of that dentry finally hits zero.  We get iput() on its inode, followed by
shmem_evict_inode() which is where we finally oops.

As for the second issue...  Normal callers of unlock_mount() do have a struct path
somewhere that pins the location we are dealing with.  However, 'beneath' case
of do_move_mount() does not - it relies upon the sucker being a mountpoint all
along.  Which is fine until you drop namespace_sem.  As soon as namespace_unlock()
has been called, there's no warranty that it will _stay_ a mountpoint.  Moving
that inode_unlock() before the namespace_unlock() avoids that scenario.

  reply	other threads:[~2025-04-22 12:25 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-21  3:35 [PATCH][RFC] ->mnt_devname is never NULL Al Viro
2025-04-21  7:56 ` Christian Brauner
2025-04-21 16:29   ` Al Viro
2025-04-21 17:03     ` Al Viro
2025-04-22  3:14       ` [PATCH][RFC] do_lock_mount() races in 'beneath' case Al Viro
2025-04-22  7:47         ` Christian Brauner
2025-04-22  7:43       ` [PATCH][RFC] ->mnt_devname is never NULL Christian Brauner
2025-04-22  7:31     ` Christian Brauner
2025-04-22 12:25       ` Al Viro [this message]
2025-04-22 13:40         ` Christian Brauner
2025-04-23  1:30         ` Al Viro
2025-04-23 22:20     ` Al Viro
2025-04-24  8:56       ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250422122514.GZ2023217@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).