All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Brian Foster <bfoster@redhat.com>, Ian Kent <raven@themaw.net>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	Miklos Szeredi <miklos@szeredi.hu>,
	David Howells <dhowells@redhat.com>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] vfs: check dentry is still valid in get_link()
Date: Tue, 18 Jan 2022 14:00:41 +1100	[thread overview]
Message-ID: <20220118030041.GB59729@dread.disaster.area> (raw)
In-Reply-To: <YeWZRL88KPtLWlkI@zeniv-ca.linux.org.uk>

On Mon, Jan 17, 2022 at 04:28:52PM +0000, Al Viro wrote:
> On Mon, Jan 17, 2022 at 09:35:58AM -0500, Brian Foster wrote:
> 
> > To Al's question, at the end of the day there is no rcu delay involved
> > with inode reuse in XFS. We do use call_rcu() for eventual freeing of
> > inodes (see __xfs_inode_free()), but inode reuse occurs for inodes that
> > have been put into a "reclaim" state before getting to the point of
> > freeing the struct inode memory. This lead to the long discussion [1]
> > Ian references around ways to potentially deal with that. I think the
> > TLDR of that thread is there are various potential options for
> > improvement, such as to rcu wait on inode creation/reuse (either
> > explicitly or via more open coded grace period cookie tracking), to rcu
> > wait somewhere in the destroy sequence before inodes become reuse
> > candidates, etc., but none of them seemingly agreeable for varying
> > reasons (IIRC mostly stemming from either performance or compexity) [2].
> > 
> > The change that has been made so far in XFS is to turn rcuwalk for
> > symlinks off once again, which looks like landed in Linus' tree as
> > commit 7b7820b83f23 ("xfs: don't expose internal symlink metadata
> > buffers to the vfs"). The hope is that between that patch and this
> > prospective vfs tweak, we can have a couple incremental fixes that at
> > least address the practical problem users have been running into (which
> > is a crash due to a NULL ->get_link() callback pointer due to inode
> > reuse). The inode reuse vs. rcu thing might still be a broader problem,
> > but AFAIA that mechanism has been in place in XFS on Linux pretty much
> > forever.
> 
> My problem with that is that pathname resolution very much relies upon
> the assumption that any inode it observes will *not* change its nature
> until the final rcu_read_unlock().  Papering over ->i_op->get_link reads
> in symlink case might be sufficient at the moment (I'm still not certain
> about that, though), but that's rather brittle.  E.g. if some XFS change
> down the road adds ->permission() on some inodes, you'll get the same
> problem in do_inode_permission().  We also have places where we rely upon
> 	sample ->d_seq
> 	fetch ->d_flags
> 	fetch ->d_inode
> 	validate ->d_seq
> 	...
> 	assume that inode type matches the information in flags
> 
> How painful would it be to make xfs_destroy_inode() a ->free_inode() instance?
> IOW, how far is xfs_inode_mark_reclaimable() from being callable in RCU
> callback context?

AIUI, not very close at all,

I'm pretty sure we can't put it under RCU callback context at all
because xfs_fs_destroy_inode() can take sleeping locks, perform
transactions, do IO, run rcu_read_lock() critical sections, etc.
This means that needs to run an a full task context and so can't run
from RCU callback context at all.

> IOW, ->free_inode() is RCU-delayed part of ->destroy_inode().  If both
> are present, ->destroy_inode() will be called synchronously, followed
> by ->free_inode() from RCU callback, so you can have both - moving just
> the "finally mark for reuse" part into ->free_inode() would be OK.
> Any blocking stuff (if any) can be left in ->destroy_inode()...

Yup, that's pretty much what we already do, except that we run the
RCU-delayed part of freeing the inode once XFS has finished with the
inode internally and the background inode GC reclaims it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2022-01-18  3:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-10  9:11 [PATCH] vfs: check dentry is still valid in get_link() Ian Kent
2022-01-15  6:38 ` Al Viro
2022-01-17  2:55   ` Ian Kent
2022-01-17 14:35     ` Brian Foster
2022-01-17 16:28       ` Al Viro
2022-01-17 18:10         ` Al Viro
2022-01-17 19:48           ` Al Viro
2022-01-18  1:32             ` Al Viro
2022-01-18  2:31               ` Ian Kent
2022-01-18  3:03                 ` Al Viro
2022-01-18 13:47               ` Brian Foster
2022-01-18 18:25                 ` Brian Foster
2022-01-18 19:20                   ` Al Viro
2022-01-18 20:58                     ` Brian Foster
2022-01-18  8:29           ` Christian Brauner
2022-01-18 16:04             ` Al Viro
2022-01-19  9:05               ` Christian Brauner
2022-01-17 18:42         ` Brian Foster
2022-01-18  3:00         ` Dave Chinner [this message]
2022-01-18  3:17           ` Al Viro
2022-01-18  4:12             ` Dave Chinner
2022-01-18  5:58               ` Al Viro
2022-01-18 23:25                 ` Dave Chinner
2022-01-19 14:08                   ` Brian Foster
2022-01-19 22:07                     ` Dave Chinner
2022-01-20 16:03                       ` Brian Foster
2022-01-20 16:34                         ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220118030041.GB59729@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=raven@themaw.net \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.