From: Brian Foster <bfoster@redhat.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ian Kent <raven@themaw.net>,
"Darrick J. Wong" <djwong@kernel.org>,
Christoph Hellwig <hch@lst.de>,
Miklos Szeredi <miklos@szeredi.hu>,
David Howells <dhowells@redhat.com>,
Kernel Mailing List <linux-kernel@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
xfs <linux-xfs@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] vfs: check dentry is still valid in get_link()
Date: Tue, 18 Jan 2022 15:58:04 -0500 [thread overview]
Message-ID: <Yecp3DspJOkhaDGV@bfoster> (raw)
In-Reply-To: <YecTA9nclOowdDEu@zeniv-ca.linux.org.uk>
On Tue, Jan 18, 2022 at 07:20:35PM +0000, Al Viro wrote:
> On Tue, Jan 18, 2022 at 01:25:15PM -0500, Brian Foster wrote:
>
> > If I go back to the inactive -> reclaimable grace period variant and
> > also insert a start_poll_synchronize_rcu() and
> > poll_state_synchronize_rcu() pair across the inactive processing
> > sequence, I start seeing numbers closer to ~36k cycles. IOW, the
> > xfs_inodegc_inactivate() helper looks something like this:
> >
> > if (poll_state_synchronize_rcu(ip->i_destroy_gp))
> > xfs_inodegc_set_reclaimable(ip);
> > else
> > call_rcu(&VFS_I(ip)->i_rcu, xfs_inodegc_set_reclaimable_callback);
> >
> > ... to skip the rcu grace period if one had already passed while the
> > inode sat on the inactivation queue and was processed.
> >
> > However my box went haywire shortly after with rcu stall reports or
> > something if I let that continue to run longer, so I'll probably have to
> > look into that lockdep splat (complaining about &pag->pag_ici_lock in
> > rcu context, perhaps needs to become irq safe?) or see if something else
> > is busted..
>
> Umm... Could you (or Dave) describe where does the mainline do the
> RCU delay mentioned several times in these threads, in case of
> * unlink()
> * overwriting rename()
> * open() + unlink() + close() (that one, obviously, for regular files)
>
If you're referring to the existing rcu delay in XFS, I suspect that
refers to xfs_reclaim_inode(). The last bits of that function remove the
inode from the perag tree and then calls __xfs_inode_free(), which frees
the inode via rcu callback.
BTW, I think the experiment above is either going to require an audit to
make the various _set_reclaimable() locks irq safe or something a bit
more ugly like putting the inode back on a workqueue after the rcu delay
to make the state transition. Given the incremental improvement from
using start_poll_synchronize_rcu(), I replaced the above destroy side
code with a cond_synchronize_rcu(ip->i_destroy_gp) call on the
iget/recycle side and see similar results (~36k cycles per 60s, but so
far without any explosions).
Brian
> The thing is, if we already do have an RCU delay in there, there might be
> a better solution - making sure it happens downstream of d_drop() (in case
> when dentry has other references) or dentry going negative (in case we are
> holding the sole reference to it).
>
> If we can do that, RCU'd dcache lookups won't run into inode reuse at all.
> Sure, right now d_delete() comes last *and* we want the target inode to stay
> around past the call of ->unlink(). But if you look at do_unlinkat() you'll
> see an interesting-looking wart with ihold/iput around vfs_unlink(). Not
> sure if you remember the story on that one; basically, it's "we'd rather
> have possible IO on inode freeing to happen outside of the lock on parent".
>
> nfsd and mqueue do the same thing; ksmbd does not. OTOH, ksmbd appears to
> force the "make victim go unhashed, sole reference or not". ecryptfs
> definitely does that forcing (deliberately so).
>
> That area could use some rethinking, and if we can deal with the inode reuse
> delay while we are at it...
>
> Overwriting rename() is also interesting in that respect, of course.
>
> I can go and try to RTFS my way through xfs iget-related code, but I'd
> obviously prefer to do so with at least some overview of that thing
> from the folks familiar with it. Seeing that it's a lockless search
> structure, missing something subtle there is all too easy, especially
> with the lookup-by-fhandle stuff in the mix...
>
next prev parent reply other threads:[~2022-01-18 20:58 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-10 9:11 [PATCH] vfs: check dentry is still valid in get_link() Ian Kent
2022-01-15 6:38 ` Al Viro
2022-01-17 2:55 ` Ian Kent
2022-01-17 14:35 ` Brian Foster
2022-01-17 16:28 ` Al Viro
2022-01-17 18:10 ` Al Viro
2022-01-17 19:48 ` Al Viro
2022-01-18 1:32 ` Al Viro
2022-01-18 2:31 ` Ian Kent
2022-01-18 3:03 ` Al Viro
2022-01-18 13:47 ` Brian Foster
2022-01-18 18:25 ` Brian Foster
2022-01-18 19:20 ` Al Viro
2022-01-18 20:58 ` Brian Foster [this message]
2022-01-18 8:29 ` Christian Brauner
2022-01-18 16:04 ` Al Viro
2022-01-19 9:05 ` Christian Brauner
2022-01-17 18:42 ` Brian Foster
2022-01-18 3:00 ` Dave Chinner
2022-01-18 3:17 ` Al Viro
2022-01-18 4:12 ` Dave Chinner
2022-01-18 5:58 ` Al Viro
2022-01-18 23:25 ` Dave Chinner
2022-01-19 14:08 ` Brian Foster
2022-01-19 22:07 ` Dave Chinner
2022-01-20 16:03 ` Brian Foster
2022-01-20 16:34 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yecp3DspJOkhaDGV@bfoster \
--to=bfoster@redhat.com \
--cc=dhowells@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=raven@themaw.net \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.