From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from zeniv.linux.org.uk (zeniv.linux.org.uk [62.89.141.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 14B461F09B3 for ; Tue, 22 Apr 2025 12:25:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=62.89.141.173 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745324718; cv=none; b=pj9Hu/5LQuvmzeQcU9OGwhYBMhbYyDM5WVzxB+iqWg40v96HTMiVAfU+VwBsOZ6uu+0DSJZZ5a4j82o92/FxgwrTPm9dKlz0uffLOa9OWrkSagbJHrq+sBp/WzuuLz+hC9pmXcNVLOWfWVOkPJKWY5F8ZqtYLLuzskDwrDVTN/U= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1745324718; c=relaxed/simple; bh=CWoyTh+Q/6W8Nv8gddun+uEr0UT8coD1FmH1Wj/0rko=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=UOEGSlTTwN925eYNDbmR9V850fGNvqyirbgBachraz8Vn/w2hatgGEk1Sn7dzg3TCJe96pXpBbNjuFabRX6UXVBEdmQPR7Fx8Da1PICl+sH2U1JIL6E3PUzIykIHxlbLBoygis+YRbwOvnW7b6ALIsk6fW3UMrqvdLUgVx+NdRs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=zeniv.linux.org.uk; spf=none smtp.mailfrom=ftp.linux.org.uk; dkim=pass (2048-bit key) header.d=linux.org.uk header.i=@linux.org.uk header.b=A3buqNfp; arc=none smtp.client-ip=62.89.141.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=zeniv.linux.org.uk Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=ftp.linux.org.uk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linux.org.uk header.i=@linux.org.uk header.b="A3buqNfp" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=linux.org.uk; s=zeniv-20220401; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=YtHQfcU0ujBCHdSccsLsNycAHDtNC5m4+Ufos9psphw=; b=A3buqNfpqr91yH4pCEoe04vU7q uu79vgiWtkqRMYnzOj+3uFz/HQZUAm/CN/WQvuFrrR8q410PjCq3VnOUol8pNOvwOEDp1nCjdRzsk H1Je5VIBELlbWxcWSCdQuVTK8Ay+Ee/E5lPhPp4dHUhLlXz1M32/7D4LP0OrEVlPXaOIcTSQBQ3u9 MXBY8zFHQOjWRMtbap3PPSPX7M6hdESi/bxcjzttcscneyt4hibK4V2Rjh0Fhf43N1c0sod6StQCc Iiej39MQfsgUzA4bAc51HCyJAg9zRm0IX0IHclObpbXVRwZ+qQsQCsCgbzVxFAPY9YVM80g13up3N vPj5ZNGQ==; Received: from viro by zeniv.linux.org.uk with local (Exim 4.98.2 #2 (Red Hat Linux)) id 1u7Cgc-00000005tqA-141B; Tue, 22 Apr 2025 12:25:14 +0000 Date: Tue, 22 Apr 2025 13:25:14 +0100 From: Al Viro To: Christian Brauner Cc: linux-fsdevel@vger.kernel.org Subject: Re: [PATCH][RFC] ->mnt_devname is never NULL Message-ID: <20250422122514.GZ2023217@ZenIV> References: <20250421033509.GV2023217@ZenIV> <20250421-annehmbar-fotoband-eb32f31f6124@brauner> <20250421162947.GW2023217@ZenIV> <20250422-erbeten-ambiente-f6b13eab8a29@brauner> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20250422-erbeten-ambiente-f6b13eab8a29@brauner> Sender: Al Viro On Tue, Apr 22, 2025 at 09:31:14AM +0200, Christian Brauner wrote: > On Mon, Apr 21, 2025 at 05:29:47PM +0100, Al Viro wrote: > > On Mon, Apr 21, 2025 at 09:56:20AM +0200, Christian Brauner wrote: > > > On Mon, Apr 21, 2025 at 04:35:09AM +0100, Al Viro wrote: > > > > Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()" > > > > back in 2018. Get rid of the dead checks... > > > > > > > > Signed-off-by: Al Viro > > > > --- > > > > > > Good idea. Fwiw, I've put this into vfs-6.16.mount with some other minor > > > stuff. If you're keeping it yourself let me know. > > > > Not sure... I'm going through documenting the struct mount lifecycle/locking/etc. > > and it already looks like there will be more patches, but then some are going > > to be #fixes fodder. > > > > Example caught just a couple of minutes ago: do_lock_mount() > > if (beneath) { > > m = real_mount(mnt); > > read_seqlock_excl(&mount_lock); > > dentry = dget(m->mnt_mountpoint); > > read_sequnlock_excl(&mount_lock); > > } else { > > dentry = path->dentry; > > } > > > > inode_lock(dentry->d_inode); > > What's to prevent the 'beneath' case from getting mnt mount --move'd > > away *AND* the ex-parent from getting unmounted while we are blocked > > in inode_lock? At this point we are not holding any locks whatsoever > > (and all mount-related locks nest inside inode_lock(), so we couldn't > > hold them there anyway). > > > > Hit that race and watch a very unhappy umount... > > If it gets unmounted or moved we immediately detect this in the next line: > > if (beneath && (!is_mounted(mnt) || m->mnt_mountpoint != dentry)) { Sure, we would - *AFTER* we get through that inode_lock(). Consider the following setup: mkdir foo mkdir bar mkdir splat mount -t tmpfs none foo # mount 1 mount -t tmpfs none bar # mount 2 mkdir bar/baz mount -t tmpfs none bar/baz # mount 3 then A: move_mount(AT_FDCWD, "foo", AT_FDCWD, "bar/baz", MOVE_MOUNT_BENEATH) gets to do_move_mount() and into do_lock_mount() called by it. path->mnt points to mount 3, path->dentry - to its root. Both are pinned. do_lock_mount() goes into the first iteration of loop. beneath is true, so it picks dentry - that of #3 mountpoint, i.e. "/baz" on #2 tmpfs instance. At that point refcount of that dentry is 3 - one from being a positive on tmpfs, one from being a mountpoint and one more just grabbed by do_lock_mount(). Now we enter inode_lock(dentry->d_inode). Note that at that point A is not holding any locks. Suppose it gets preempted at this moment for whatever reason. B: mount --move bar/baz splat Proceeds without any problems, mount #3 gets moved to "splat". Now refcount of mount #2 is not pinned by anything and refcount of "/baz" on it is 2, since it's no longer a mountpoint. B: umount bar ... and now it hits the fan, since the refcount of mount #2 is not elevated by anything, so we do not hit -EBUSY and proceed through umount(2) all the way to kill_litter_super(), which drops the refcount of "/baz" to 1 and calls kill_anon_super(). Which gets to shrink_dcache_for_umount() and from there - to umount_check() on that dentry. You get yelled at, then you get yelled at again for busy inodes after umount (that dentry is pinning the inode down), etc. Superblock of #2 is freed. A: regains CPU. is_mounted() is true (now at splat instead of bar/baz, but still mounted), ->mnt_mountpoint does not match. All right, inode_unlock(dentry->d_inode), then dput(dentry) and now the refcount of that dentry finally hits zero. We get iput() on its inode, followed by shmem_evict_inode() which is where we finally oops. As for the second issue... Normal callers of unlock_mount() do have a struct path somewhere that pins the location we are dealing with. However, 'beneath' case of do_move_mount() does not - it relies upon the sucker being a mountpoint all along. Which is fine until you drop namespace_sem. As soon as namespace_unlock() has been called, there's no warranty that it will _stay_ a mountpoint. Moving that inode_unlock() before the namespace_unlock() avoids that scenario.