From: Dave Chinner <david@fromorbit.com>
To: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: reinstate the iolock in xfs_readdir
Date: Wed, 4 Dec 2013 09:55:20 +1100 [thread overview]
Message-ID: <20131203225520.GZ10988@dastard> (raw)
In-Reply-To: <20131203212951.GP1935@sgi.com>
On Tue, Dec 03, 2013 at 03:29:51PM -0600, Ben Myers wrote:
> Although it was removed in commit 051e7cd44ab8, ilock needs to be taken in
> xfs_readdir because we might have to read the extent list in from disk. This
> keeps other threads from reading from or writing to the extent list while it is
> being read in and is still in a transitional state.
>
> This has been associated with "Access to block zero" messages on directories
> with large numbers of extents resulting from excessive filesytem fragmentation,
> as well as extent list corruption. Unfortunately no test case at this point.
That commit was from 2007, so this is not a result of a recent
change. As it is, access to directories is completely
serialised by the VFS i_mutex. We don't need the iolock to read in
the extent list for a directory, because by the time we get to
readdir it is already guaranteed to be read in by this code:
STATIC int
xfs_dir_open(
struct inode *inode,
struct file *file)
{
struct xfs_inode *ip = XFS_I(inode);
int mode;
int error;
error = xfs_file_open(inode, file);
if (error)
return error;
/*
* If there are any blocks, read-ahead block 0 as we're almost
* certain to have the next operation be a read there.
*/
mode = xfs_ilock_map_shared(ip);
if (ip->i_d.di_nextents > 0)
xfs_dir3_data_readahead(NULL, ip, 0, -1);
xfs_iunlock(ip, mode);
return 0;
}
Which means that for a directory in block/leaf/node form the open of
it prior to the first readdir() call will read the extent tree into
memory and it will be pinned in memory for at least the life of the
file descriptor that is open on the directory.
If the directory is not in block form at open time, then the
first modification that takes it to block form will correctly lock
and initialise the extent tree, and concurrent readers will block on
the i_mutex at the VFS, not on the XFS iolock.
Now, if something is accessing the directory without going through
the VFS, there might be an issue, but all read accesses are via a
file descriptor of some kind and so should always have the extent
list initialised and read in correctly before reads occur due to the
code in xfs_dir_open() and serialisation via the i_mutex at the VFS
level.
So I don't see why holding the iolock during readdir is necessary,
nor how not holding it could lead to corrupt extent trees being seen
during directory operations. I need more information to understand
the issue being seen....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2013-12-03 22:55 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-03 21:29 [PATCH] xfs: reinstate the iolock in xfs_readdir Ben Myers
2013-12-03 22:55 ` Dave Chinner [this message]
2013-12-03 23:14 ` Ben Myers
2013-12-04 0:10 ` Dave Chinner
2013-12-04 13:08 ` Christoph Hellwig
2013-12-05 0:12 ` Ben Myers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131203225520.GZ10988@dastard \
--to=david@fromorbit.com \
--cc=bpm@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox