Re: [PATCH] vfs: check dentry is still valid in get_link()

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Brian Foster <bfoster@redhat.com>, Ian Kent <raven@themaw.net>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	Miklos Szeredi <miklos@szeredi.hu>,
	David Howells <dhowells@redhat.com>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] vfs: check dentry is still valid in get_link()
Date: Tue, 18 Jan 2022 14:00:41 +1100	[thread overview]
Message-ID: <20220118030041.GB59729@dread.disaster.area> (raw)
In-Reply-To: <YeWZRL88KPtLWlkI@zeniv-ca.linux.org.uk>

On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote:
> On Mon, Jan 17, 2022 at 09:35:58AM -0500, Brian Foster wrote:
> 
> > To Al's question, at the end of the day there is no rcu delay involved
> > with inode reuse in XFS. We do use call_rcu() for eventual freeing of
> > inodes (see __xfs_inode_free()), but inode reuse occurs for inodes that
> > have been put into a "reclaim" state before getting to the point of
> > freeing the struct inode memory. This lead to the long discussion [1]
> > Ian references around ways to potentially deal with that. I think the
> > TLDR of that thread is there are various potential options for
> > improvement, such as to rcu wait on inode creation/reuse (either
> > explicitly or via more open coded grace period cookie tracking), to rcu
> > wait somewhere in the destroy sequence before inodes become reuse
> > candidates, etc., but none of them seemingly agreeable for varying
> > reasons (IIRC mostly stemming from either performance or compexity) [2].
> > 
> > The change that has been made so far in XFS is to turn rcuwalk for
> > symlinks off once again, which looks like landed in Linus' tree as
> > commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata
> > buffers to the vfs"). The hope is that between that patch and this
> > prospective vfs tweak, we can have a couple incremental fixes that at
> > least address the practical problem users have been running into (which
> > is a crash due to a NULL ->get_link() callback pointer due to inode
> > reuse). The inode reuse vs. rcu thing might still be a broader problem,
> > but AFAIA that mechanism has been in place in XFS on Linux pretty much
> > forever.
> 
> My problem with that is that pathname resolution very much relies upon
> the assumption that any inode it observes will *not* change its nature
> until the final rcu_read_unlock().  Papering over ->i_op->get_link reads
> in symlink case might be sufficient at the moment (I'm still not certain
> about that, though), but that's rather brittle.  E.g. if some XFS change
> down the road adds ->permission() on some inodes, you'll get the same
> problem in do_inode_permission().  We also have places where we rely upon
> 	sample ->d_seq
> 	fetch ->d_flags
> 	fetch ->d_inode
> 	validate ->d_seq
> 	...
> 	assume that inode type matches the information in flags
> 
> How painful would it be to make xfs_destroy_inode() a ->free_inode() instance?
> IOW, how far is xfs_inode_mark_reclaimable() from being callable in RCU
> callback context?

AIUI, not very close at all,

I'm pretty sure we can't put it under RCU callback context at all
because xfs_fs_destroy_inode() can take sleeping locks, perform
transactions, do IO, run rcu_read_lock() critical sections, etc.
This means that needs to run an a full task context and so can't run
from RCU callback context at all.

> IOW, ->free_inode() is RCU-delayed part of ->destroy_inode().  If both
> are present, ->destroy_inode() will be called synchronously, followed
> by ->free_inode() from RCU callback, so you can have both - moving just
> the "finally mark for reuse" part into ->free_inode() would be OK.
> Any blocking stuff (if any) can be left in ->destroy_inode()...

Yup, that's pretty much what we already do, except that we run the
RCU-delayed part of freeing the inode once XFS has finished with the
inode internally and the background inode GC reclaims it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2022-01-18  3:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-10  9:11 [PATCH] vfs: check dentry is still valid in get_link() Ian Kent
2022-01-15  6:38 ` Al Viro
2022-01-17  2:55   ` Ian Kent
2022-01-17 14:35     ` Brian Foster
2022-01-17 16:28       ` Al Viro
2022-01-17 18:10         ` Al Viro
2022-01-17 19:48           ` Al Viro
2022-01-18  1:32             ` Al Viro
2022-01-18  2:31               ` Ian Kent
2022-01-18  3:03                 ` Al Viro
2022-01-18 13:47               ` Brian Foster
2022-01-18 18:25                 ` Brian Foster
2022-01-18 19:20                   ` Al Viro
2022-01-18 20:58                     ` Brian Foster
2022-01-18  8:29           ` Christian Brauner
2022-01-18 16:04             ` Al Viro
2022-01-19  9:05               ` Christian Brauner
2022-01-17 18:42         ` Brian Foster
2022-01-18  3:00         ` Dave Chinner [this message]
2022-01-18  3:17           ` Al Viro
2022-01-18  4:12             ` Dave Chinner
2022-01-18  5:58               ` Al Viro
2022-01-18 23:25                 ` Dave Chinner
2022-01-19 14:08                   ` Brian Foster
2022-01-19 22:07                     ` Dave Chinner
2022-01-20 16:03                       ` Brian Foster
2022-01-20 16:34                         ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220118030041.GB59729@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=raven@themaw.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).