From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 1/7] xfs: take the ILOCK when accessing the inode core
Date: Tue, 4 Jan 2022 17:38:01 -0800
Message-ID: <20220105013801.GG656707@magnolia>
In-Reply-To: <20220105000947.GK945095@dread.disaster.area>
On Wed, Jan 05, 2022 at 11:09:47AM +1100, Dave Chinner wrote:
> On Thu, Dec 16, 2021 at 03:56:09PM +1100, Dave Chinner wrote:
> > On Wed, Dec 15, 2021 at 05:09:21PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > I was poking around in the directory code while diagnosing online fsck
> > > bugs, and noticed that xfs_readdir doesn't actually take the directory
> > > ILOCK when it calls xfs_dir2_isblock. xfs_dir_open most probably loaded
> > > the data fork mappings
> >
> > Yup, that is pretty much guaranteed if the inode is in extent or
> > btree form, as the extent count will be non-zero. Hence we can only
> > get to the xfs_dir2_isblock() check without the mappings loaded if
> > the inode has moved from local to block form between the open and
> > the xfs_dir2_isblock() call in the getdents code.
> >
> > > and the VFS took i_rwsem (aka IOLOCK_SHARED) so
> > > we're protected against writer threads, but we really need to follow the
> > > locking model like we do in other places. The same applies to the
> > > shortform getdents function.
> >
> > Locking rules should be the same as xfs_dir_lookup().....
>
> Ok, I assumed the locking in xfs_dir_lookup() is optimal....
>
> .... which it turns out it isn't. All calls to xfs_dir_lookup()
> occur with the directory locked at the VFS level, so the internal
> contents of the directory can never change during a lookup. Hence
> holding the ILOCK across the entire lookup is both unnecessary and
> excessive.
>
> What xfs_dir_lookup() should be doing is what xfs_readdir() is
> largely already doing - just locking the ILOCK around buffer read
> operations when we are mapping directory offsets to physical disk
> locations and reading them from disk. Changing this is a
> significant set of changes, so it's not something that needs to be
> done right now.
>
> However, we still need to protect the xfs_dir2_isblock() lookup call
> in xfs_readdir().
>
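
To make the narrower scope concrete, here is a minimal userspace sketch of
"take the lock only while mapping a directory offset to a disk location and
reading it". Everything named mock_* is made up for illustration and is not
the XFS code: the "ILOCK" is a plain flag, the "disk" is an array of strings,
and each block holds a single name. It assumes, as described above, that the
VFS-level directory lock keeps the contents stable for the whole lookup.

```c
#include <assert.h>
#include <string.h>

#define MOCK_NBLOCKS	3

/* Hypothetical model of a directory; not the real XFS structures. */
struct mock_dir {
	int	map[MOCK_NBLOCKS];	/* dir block -> disk block; needs the ILOCK */
	int	ilock_held;		/* nonzero while the mock ILOCK is held */
	int	scans_under_ilock;	/* entry scans done with the ILOCK held */
};

/* Mock disk: each disk block holds one name to keep the model small. */
static const char *mock_disk[] = { "c", "a", "b" };

/* Map a directory block to its disk block and "read" it under the ILOCK. */
static const char *mock_read_block(struct mock_dir *dp, int dirblk)
{
	const char *buf;

	dp->ilock_held = 1;
	buf = mock_disk[dp->map[dirblk]];
	dp->ilock_held = 0;
	return buf;
}

/* Look up a name; returns the directory block holding it, or -1. */
static int mock_lookup(struct mock_dir *dp, const char *name)
{
	for (int blk = 0; blk < MOCK_NBLOCKS; blk++) {
		const char *buf = mock_read_block(dp, blk);

		/* contents are stable under the VFS lock, so scan unlocked */
		if (dp->ilock_held)
			dp->scans_under_ilock++;
		if (strcmp(buf, name) == 0)
			return blk;
	}
	return -1;
}
```

The point of the counter is only to show that the entry scan never runs with
the mock ILOCK held; the lock covers the map-and-read step and nothing else.
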
> > > While we're at it, clean up the somewhat strange structure of this
> > > function.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > ---
> > > fs/xfs/xfs_dir2_readdir.c | 28 +++++++++++++++++-----------
> > > 1 file changed, 17 insertions(+), 11 deletions(-)
> > >
> > >
> > > diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
> > > index 8310005af00f..25560151c273 100644
> > > --- a/fs/xfs/xfs_dir2_readdir.c
> > > +++ b/fs/xfs/xfs_dir2_readdir.c
> > > @@ -507,8 +507,9 @@ xfs_readdir(
> > >  	size_t			bufsize)
> > >  {
> > >  	struct xfs_da_args	args = { NULL };
> > > -	int			rval;
> > > -	int			v;
> > > +	unsigned int		lock_mode;
> > > +	int			error;
> > > +	int			isblock;
> > >
> > >  	trace_xfs_readdir(dp);
> > >
> > > @@ -522,14 +523,19 @@ xfs_readdir(
> > >  	args.geo = dp->i_mount->m_dir_geo;
> > >  	args.trans = tp;
> > >
> > > -	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
> > > -		rval = xfs_dir2_sf_getdents(&args, ctx);
> > > -	else if ((rval = xfs_dir2_isblock(&args, &v)))
> > > -		;
> > > -	else if (v)
> > > -		rval = xfs_dir2_block_getdents(&args, ctx);
> > > -	else
> > > -		rval = xfs_dir2_leaf_getdents(&args, ctx, bufsize);
> > > +	lock_mode = xfs_ilock_data_map_shared(dp);
> > > +	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
> > > +		xfs_iunlock(dp, lock_mode);
> > > +		return xfs_dir2_sf_getdents(&args, ctx);
> > > +	}
>
> Directory inode format cannot change here, so we don't need to
> hold the ILOCK at all to do shortform checks.

Ok.

> > >
> > > -	return rval;
> > > +	error = xfs_dir2_isblock(&args, &isblock);
> > > +	xfs_iunlock(dp, lock_mode);
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	if (isblock)
> > > +		return xfs_dir2_block_getdents(&args, ctx);
>
> Can the xfs_dir2_isblock() call be moved into
> xfs_dir2_block_getdents() where it already takes the ILOCK to read
> the block?

Yeah.

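
A rough userspace sketch of the shape being discussed, under loudly stated
assumptions: the shortform check runs with no ILOCK because the format cannot
change under the VFS lock, and the block-form check is folded into the block
reader, which takes the lock itself. All mock_* names are hypothetical
stand-ins rather than the real XFS functions, the lock is a simple counter,
and "one block means block form" is a deliberate simplification of what
xfs_dir2_isblock() checks. Having the block reader return false to mean "not
block form, use the leaf path" is just one possible convention, not
necessarily what a revised patch would do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the real XFS definitions. */
enum mock_fmt { MOCK_FMT_LOCAL, MOCK_FMT_EXTENTS };
enum mock_path { PATH_SF, PATH_BLOCK, PATH_LEAF };

struct mock_dir {
	enum mock_fmt	format;		/* stable: VFS holds the directory lock */
	int		nblocks;	/* stands in for the data fork mappings */
	int		ilock_depth;	/* > 0 while the mock ILOCK is held */
	int		ilock_cycles;	/* completed lock/unlock pairs */
};

static void mock_ilock(struct mock_dir *dp)
{
	dp->ilock_depth++;
}

static void mock_iunlock(struct mock_dir *dp)
{
	assert(dp->ilock_depth > 0);
	dp->ilock_depth--;
	dp->ilock_cycles++;
}

/* Reads the fork mappings, so the caller must hold the mock ILOCK. */
static bool mock_isblock(const struct mock_dir *dp)
{
	assert(dp->ilock_depth > 0);
	return dp->nblocks == 1;	/* simplification, see lead-in */
}

/*
 * Block-form reader. Returns true if it handled the directory, false if
 * the directory is not in block form and the caller should use the leaf path.
 */
static bool mock_block_getdents(struct mock_dir *dp)
{
	bool isblock;

	mock_ilock(dp);
	isblock = mock_isblock(dp);
	/* a real reader would also read the directory block here */
	mock_iunlock(dp);
	return isblock;
}

static enum mock_path mock_readdir(struct mock_dir *dp)
{
	/* format cannot change under the VFS lock, so no ILOCK here */
	if (dp->format == MOCK_FMT_LOCAL)
		return PATH_SF;
	if (mock_block_getdents(dp))
		return PATH_BLOCK;
	return PATH_LEAF;
}
```

In this model a shortform directory never touches the lock at all, and the
other two forms take it exactly once, around the block-form check.
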
> > > +
> > > +	return xfs_dir2_leaf_getdents(&args, ctx, bufsize);
>
> Otherwise this patch is correct and this is where we should start
> fixing the directory locking mess...

<nod>

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com