From: Brian Foster <bfoster@redhat.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Kent <raven@themaw.net>,
"Darrick J. Wong" <djwong@kernel.org>,
Christoph Hellwig <hch@lst.de>,
Miklos Szeredi <miklos@szeredi.hu>,
David Howells <dhowells@redhat.com>,
Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
xfs <linux-xfs@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] vfs: check dentry is still valid in get_link()
Date: Tue, 18 Jan 2022 15:58:04 -0500
Message-ID: <Yecp3DspJOkhaDGV@bfoster>
In-Reply-To: <YecTA9nclOowdDEu@zeniv-ca.linux.org.uk>
On Tue, Jan 18, 2022 at 07:20:35PM +0000, Al Viro wrote:
> On Tue, Jan 18, 2022 at 01:25:15PM -0500, Brian Foster wrote:
>
> > If I go back to the inactive -> reclaimable grace period variant and
> > also insert a start_poll_synchronize_rcu() and
> > poll_state_synchronize_rcu() pair across the inactive processing
> > sequence, I start seeing numbers closer to ~36k cycles. IOW, the
> > xfs_inodegc_inactivate() helper looks something like this:
> >
> > if (poll_state_synchronize_rcu(ip->i_destroy_gp))
> > xfs_inodegc_set_reclaimable(ip);
> > else
> > call_rcu(&VFS_I(ip)->i_rcu, xfs_inodegc_set_reclaimable_callback);
> >
> > ... to skip the rcu grace period if one had already passed while the
> > inode sat on the inactivation queue and was processed.
> >
> > However my box went haywire shortly after with rcu stall reports or
> > something if I let that continue to run longer, so I'll probably have to
> > look into that lockdep splat (complaining about &pag->pag_ici_lock in
> > rcu context, perhaps needs to become irq safe?) or see if something else
> > is busted..
>
> Umm... Could you (or Dave) describe where the mainline does the
> RCU delay mentioned several times in these threads, in case of
> * unlink()
> * overwriting rename()
> * open() + unlink() + close() (that one, obviously, for regular files)
>
If you're referring to the existing rcu delay in XFS, I suspect that
means xfs_reclaim_inode(). The last bits of that function remove the
inode from the perag tree and then call __xfs_inode_free(), which frees
the inode via rcu callback.
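Paraphrasing the tail of that path from memory (a rough sketch, not a
verbatim excerpt of the code), it looks something like:

	spin_lock(&pag->pag_ici_lock);
	/* drop the inode from the perag radix tree */
	if (!radix_tree_delete(&pag->pag_ici_root,
			       XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino)))
		ASSERT(0);
	spin_unlock(&pag->pag_ici_lock);
	...
	/* queues xfs_inode_free_callback() via call_rcu() */
	__xfs_inode_free(ip);

... so the actual freeing of the inode memory is always deferred by at
least one grace period from that point.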
BTW, I think the experiment above is going to require either an audit
to make the various _set_reclaimable() locks irq safe, or something a
bit uglier like bouncing the inode back to a workqueue after the rcu
delay to make the state transition. Given the incremental improvement
from using start_poll_synchronize_rcu(), I replaced the above
destroy-side code with a cond_synchronize_rcu(ip->i_destroy_gp) call on
the iget/recycle side and see similar results (~36k cycles per 60s, but
so far without any explosions).
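IOW, the variant I'm running now is essentially the following (just a
sketch; the exact hook points are approximate and error/teardown paths
are elided):

	/* destroy side: sample the current grace period state */
	ip->i_destroy_gp = start_poll_synchronize_rcu();
	...
	/*
	 * iget/recycle side: block only if that grace period hasn't
	 * completed yet, otherwise proceed immediately.
	 */
	cond_synchronize_rcu(ip->i_destroy_gp);

That moves the (conditional) synchronous rcu wait over to the recycle
path instead of deferring the reclaimable state transition from rcu
callback context, which is why the irq safety problem above goes away.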
Brian
> The thing is, if we already do have an RCU delay in there, there might be
> a better solution - making sure it happens downstream of d_drop() (in case
> when dentry has other references) or dentry going negative (in case we are
> holding the sole reference to it).
>
> If we can do that, RCU'd dcache lookups won't run into inode reuse at all.
> Sure, right now d_delete() comes last *and* we want the target inode to stay
> around past the call of ->unlink(). But if you look at do_unlinkat() you'll
> see an interesting-looking wart with ihold/iput around vfs_unlink(). Not
> sure if you remember the story on that one; basically, it's "we'd rather
> have possible IO on inode freeing to happen outside of the lock on parent".
>
> nfsd and mqueue do the same thing; ksmbd does not. OTOH, ksmbd appears to
> force the "make victim go unhashed, sole reference or not". ecryptfs
> definitely does that forcing (deliberately so).
>
> That area could use some rethinking, and if we can deal with the inode reuse
> delay while we are at it...
>
> Overwriting rename() is also interesting in that respect, of course.
>
> I can go and try to RTFS my way through xfs iget-related code, but I'd
> obviously prefer to do so with at least some overview of that thing
> from the folks familiar with it. Seeing that it's a lockless search
> structure, missing something subtle there is all too easy, especially
> with the lookup-by-fhandle stuff in the mix...
>