From: Christian Brauner <brauner@kernel.org>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: tytso@mit.edu, Miklos Szeredi <miklos@szeredi.hu>,
linux-f2fs-devel@lists.sourceforge.net, ebiggers@kernel.org,
"Eric W. Biederman" <ebiederm@xmission.com>,
linux-fsdevel@vger.kernel.org, jaegeuk@kernel.org,
linux-ext4@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Gabriel Krisman Bertazi <gabriel@krisman.be>
Subject: Re: [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was Re: [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs
Date: Mon, 27 Nov 2023 17:33:36 +0100 [thread overview]
Message-ID: <20231127-lahmlegen-redeverbot-43de4b46063c@brauner> (raw)
In-Reply-To: <20231127063842.GG38156@ZenIV>
On Mon, Nov 27, 2023 at 06:38:42AM +0000, Al Viro wrote:
> On Sun, Nov 26, 2023 at 06:41:41PM +0000, Al Viro wrote:
>
> > d_invalidate() situation is more subtle - we need to sort out its interplay
> > with d_splice_alias().
> >
> > More concise variant of the scenario in question:
> > * we have /mnt/foo/bar and a lot of its descendents in dcache on client
> > * server does a rename, after which what used to be /mnt/foo/bar is /mnt/foo/baz
> > * somebody on the client does a lookup of /mnt/foo/bar and gets told by
> > the server that there's no directory with that name anymore.
> > * that somebody hits d_invalidate(), unhashes /mnt/foo/bar and starts
> > evicting its descendents
> > * We try to mount something on /mnt/foo/baz/blah. We look up baz, get
> > an fhandle and notice that there's a directory inode for it (/mnt/foo/bar).
> > d_splice_alias() picks the bugger and moves it to /mnt/foo/baz, rehashing
> > it in process, as it ought to. Then we find /mnt/foo/baz/blah in dcache and
> > mount on top of it.
> > * d_invalidate() finishes shrink_dcache_parent() and starts hunting for
> > submounts to dissolve. And finds the mount we'd done. Which mount quietly
> > disappears.
> >
> > Note that from the server POV the thing had been moved quite a while ago.
> > No server-side races involved - all it seeem is a couple of LOOKUP in the
> > same directory, one for the old name, one for the new.
> >
> > On the client on the mounter side we have an uneventful mount on /mnt/foo/baz,
> > which had been there on server at the time we started and which remains in
> > place after mount we'd created suddenly disappears.
> >
> > For the thread that ended up calling d_invalidate(), they'd been doing e.g.
> > stat on a pathname that used to be there a while ago, but currently isn't.
> > They get -ENOENT and no indication that something odd might have happened.
> >
> > >From ->d_revalidate() point of view there's also nothing odd happening -
> > dentry is not a mountpoint, it stays in place until we return and there's
> > no directory entry with that name on in its parent. It's as clear-cut
> > as it gets - dentry is stale.
> >
> > The only overlap happening there is d_splice_alias() hitting in the middle
> > of already started d_invalidate().
> >
> > For a while I thought that ff17fa561a04 "d_invalidate(): unhash immediately"
> > and 3a8e3611e0ba "d_walk(): kill 'finish' callback" might have something
> > to do with it, but the same problem existed prior to that.
> >
> > FWIW, I suspect that the right answer would be along the lines of
> > * if d_splice_alias() does move an exsiting (attached) alias in
> > place, it ought to dissolve all mountpoints in subtree being moved.
> > There might be subtleties, but in case when that __d_unalias() happens
> > due to rename on server this is definitely the right thing to do.
> > * d_invalidate() should *NOT* do anything with dentry that
> > got moved (including moved by d_splice_alias()) from the place we'd
> > found it in dcache. At least d_invalidate() done due to having
> > ->d_revalidate() return 0.
> > * d_invalidate() should dissolve all mountpoints in the
> > subtree that existed when it got started (and found the victim
> > still unmoved, that is). It should (as it does) prevent any
> > new mountpoints added in that subtree, unless the mountpoint
> > to be had been moved (spliced) out. What it really shouldn't
> > do is touch the mountpoints that are currently outside of it
> > due to moves.
> >
> > I'm going to look around and see if we have any weird cases where
> > d_splice_alias() is used for things like "correct the case of
> > dentry name on a case-mangled filesystem" - that would presumably
> > not want to dissolve any submounts. I seem to recall seeing
> > some shite of that sort, but that was a long time ago.
> >
> > Eric, Miklos - it might be a good idea if you at least took a
> > look at whatever comes out of that (sub)thread; I'm trying to
> > reconstruct the picture, but the last round of serious reworking
> > of that area had been almost 10 years ago and your recollections
> > of the considerations back then might help. I realize that they
> > are probably rather fragmentary (mine definitely are) and any
> > analysis will need to be redone on the current tree, but...
>
> TBH, I wonder if we ought to have d_invalidate() variant that would
> unhash the dentry in question, do a variant of shrink_dcache_parent()
> that would report if there had been any mountpoints and if there
> had been any, do namespace_lock() and go hunting for mounts in that
> subtree, moving corresponding struct mountpoint to a private list
> as we go (removing them from mountpoint hash chains, that it). Then
> have them all evicted after we'd finished walking the subtree...
That sounds reasonable.
>
> The tricky part will be lock ordering - right now we have the
> mountpoint hash protected by mount_lock (same as mount hash, probably
> worth splitting anyway) and that nests outside of ->d_lock.
>
> Note that we don't do mountpoint hash lookups on mountpoint crossing
> - it's nowhere near the really hot paths. What we have is
> lookup_mountpoint() - plain hash lookup. Always
> under namespace_lock() and mount_lock.
> get_mountpoint() - there's an insertion into hash chain,
> with dentry passed through the d_set_mounted(), which would
> fail if we have d_invalidate() on the subtree.
> Also always under namespace_lock() and mount_lock.
> __put_mountpoint() - removal from the hash chain.
> We remove from hash chain after having cleared DCACHE_MOUNTED.
> _That_ can happen under mount_lock alone (taking out the stuck
> submounts on final mntput()).
>
> So convert the mountpoint hash chains to hlist_bl, bitlocks nesting under
> ->d_lock. Introduce a new dentry flag (DCHACE_MOUNT_INVALIDATION?)
> In d_walk() callback we would
> * do nothing if DCACHE_MOUNT is not set or DCACHE_MOUNT_INVALIDATION
> is.
> * otherwise set DCACHE_MOUNT_INVALIDATION, grab the bitlock on the
> mountpoint hash chain matching that dentry, find struct mountpoint in it,
> remove it from the chain and insert into a separate "collector" chain, all
> without messing with refcount.
Ok.
> In lookup_mountpoint() and get_mountpoint() take the bitlock on chain.
> In __put_mountpoint(), once it has grabbed ->d_lock
> * check if it has DCACHE_MOUNT_INVALIDATION, use that to
> decide which chain we are locking - the normal one or the collector
> * clear both DCACHE_MOUNT and DCACHE_MOUNT_INVALIDATION
> * remove from chain
> * unlock the chain
> * drop ->d_lock.
>
> Once we are finished walking the tree, go over the collector list
> and do what __detach_mount() guts do. We are no longer under
> any ->d_lock, so locking is not a problem. namespace_unlock() will
> flush them all, same as it does for __detach_mount().
Ok.
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
next prev parent reply other threads:[~2023-11-27 16:33 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-16 5:07 [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs Gabriel Krisman Bertazi
2023-08-16 5:07 ` [f2fs-dev] [PATCH v6 1/9] ecryptfs: Reject casefold directory inodes Gabriel Krisman Bertazi
2023-08-16 5:07 ` [f2fs-dev] [PATCH v6 2/9] 9p: Split ->weak_revalidate from ->revalidate Gabriel Krisman Bertazi
2023-08-16 5:07 ` [f2fs-dev] [PATCH v6 3/9] fs: Expose name under lookup to d_revalidate hooks Gabriel Krisman Bertazi
2023-11-22 20:59 ` Al Viro
2023-08-16 5:07 ` [f2fs-dev] [PATCH v6 4/9] fs: Add DCACHE_CASEFOLDED_NAME flag Gabriel Krisman Bertazi
2023-11-22 20:32 ` Al Viro
2023-08-16 5:07 ` [f2fs-dev] [PATCH v6 5/9] libfs: Validate negative dentries in case-insensitive directories Gabriel Krisman Bertazi
2023-11-22 20:20 ` Al Viro
2023-08-16 5:08 ` [f2fs-dev] [PATCH v6 6/9] libfs: Chain encryption checks after case-insensitive revalidation Gabriel Krisman Bertazi
2023-08-16 5:08 ` [f2fs-dev] [PATCH v6 7/9] libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops Gabriel Krisman Bertazi
2023-08-16 5:08 ` [f2fs-dev] [PATCH v6 8/9] ext4: Enable negative dentries on case-insensitive lookup Gabriel Krisman Bertazi
2023-08-16 5:08 ` [f2fs-dev] [PATCH v6 9/9] f2fs: " Gabriel Krisman Bertazi
2023-08-17 17:06 ` [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs Eric Biggers
2023-08-21 15:52 ` Christian Brauner
2023-08-21 18:53 ` Gabriel Krisman Bertazi
2023-08-22 9:03 ` Christian Brauner
2023-10-24 22:20 ` Gabriel Krisman Bertazi
2023-10-25 13:32 ` Christian Brauner
2023-10-25 15:19 ` Gabriel Krisman Bertazi
2023-11-19 23:11 ` Gabriel Krisman Bertazi
[not found] ` <655a9634.630a0220.d50d7.5063SMTPIN_ADDED_BROKEN@mx.google.com>
2023-11-20 15:06 ` Christian Brauner
2023-11-20 16:59 ` Gabriel Krisman Bertazi
2023-11-20 18:07 ` Linus Torvalds
2023-11-21 2:02 ` Theodore Ts'o
2023-11-21 2:29 ` Linus Torvalds
2023-11-21 3:03 ` Linus Torvalds
2023-11-21 5:12 ` Theodore Ts'o
2023-11-22 21:04 ` Al Viro
2023-11-21 2:27 ` Al Viro
2023-11-22 21:19 ` Al Viro
2023-11-23 0:18 ` Linus Torvalds
2023-11-23 5:09 ` Al Viro
2023-11-23 15:57 ` Gabriel Krisman Bertazi
2023-11-23 17:12 ` Al Viro
2023-11-23 17:37 ` Gabriel Krisman Bertazi
2023-11-23 18:24 ` Al Viro
2023-11-23 19:06 ` Gabriel Krisman Bertazi
2023-11-23 19:53 ` Al Viro
2023-11-23 20:15 ` Al Viro
2023-11-24 15:20 ` Gabriel Krisman Bertazi
2023-11-28 0:02 ` Al Viro
2023-11-23 21:52 ` Al Viro
2023-11-24 15:22 ` Gabriel Krisman Bertazi
2023-11-25 22:01 ` Al Viro
2023-11-26 4:52 ` Al Viro
2023-11-26 18:41 ` [f2fs-dev] fun with d_invalidate() vs. d_splice_alias() was " Al Viro
2023-11-27 6:38 ` Al Viro
2023-11-27 15:47 ` Eric W. Biederman
2023-11-27 16:01 ` Eric W. Biederman
2023-11-27 17:25 ` Al Viro
2023-11-27 18:26 ` Al Viro
2023-11-27 16:03 ` Al Viro
2023-11-27 16:14 ` Al Viro
2023-11-27 18:19 ` Eric W. Biederman
2023-11-27 18:43 ` Al Viro
2023-11-27 16:33 ` Christian Brauner [this message]
2023-11-29 4:53 ` Al Viro
2023-11-29 10:21 ` Christian Brauner
2023-11-29 15:19 ` Eric W. Biederman
[not found] ` <655f7665.df0a0220.58a21.e84fSMTPIN_ADDED_BROKEN@mx.google.com>
2023-11-23 16:41 ` [f2fs-dev] " Linus Torvalds
2023-11-23 1:12 ` Al Viro
2023-11-23 1:22 ` Al Viro
2023-11-22 3:30 ` Gabriel Krisman Bertazi
2024-01-16 19:02 ` patchwork-bot+f2fs
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231127-lahmlegen-redeverbot-43de4b46063c@brauner \
--to=brauner@kernel.org \
--cc=ebiederm@xmission.com \
--cc=ebiggers@kernel.org \
--cc=gabriel@krisman.be \
--cc=jaegeuk@kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).