Date: Fri, 7 Dec 2012 20:53:26 +1100
From: Dave Chinner
To: Jeff Mahoney
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH] xfs: Fix re-use of EWOULDBLOCK during read on dm-mirror
Message-ID: <20121207095326.GI27172@dastard>
References: <50C11E95.4050502@suse.com>
In-Reply-To: <50C11E95.4050502@suse.com>

On Thu, Dec 06, 2012 at 05:39:17PM -0500, Jeff Mahoney wrote:
> When using lvconvert to convert a linear mapping to a dm-raid1 mirror,
> we encountered issues where the log would be flooded with messages like:
>
> metadata I/O error: block 0xee7060 ("xfs_trans_read_buf") error 11 numblks 8
>
> The cause is that dm-mirror (and striping, and others) will return
> -EWOULDBLOCK for readahead requests while the mirror is rebuilding.

That's nasty - since when has DM been doing this? I doubt anything
handles an EAGAIN error from the storage layer properly - it's not an
error the filesystem expects from the lower layers at all.

> XFS's
> end_io routine caches the errno and then xfs_buf_iowait bails out early
> when it encounters it after issuing the i/o request.

That doesn't sound right. When XFS issues buffer readahead, it does
not wait for it to complete, i.e. we never get to xfs_buf_iowait()
on readahead buffers.

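To make the sequence concrete, here is a toy user-space model in C of
the two paths involved. It is a sketch only: toy_buf, toy_end_io,
toy_readahead, toy_iowait, toy_read and toy_demo are hypothetical
names invented for illustration and are not the XFS code. It assumes
the completion handler stores the error on the buffer, that readahead
returns without waiting, and that the wait step is skipped when an
error is already recorded. EAGAIN and EWOULDBLOCK are the same value
on Linux, which is the "error 11" in the log message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a buffer: only the error field under discussion. */
struct toy_buf {
	int b_error;	/* last error recorded by the completion handler */
};

/* Completion handler: record the I/O result on the buffer. */
static void toy_end_io(struct toy_buf *bp, int error)
{
	bp->b_error = error;
}

/*
 * Readahead: the I/O is submitted and nobody waits for it. Here the
 * device rejects it with EAGAIN, which is left behind in b_error.
 */
static void toy_readahead(struct toy_buf *bp)
{
	toy_end_io(bp, EAGAIN);
}

/*
 * Wait for I/O: if an error is already recorded, return it without
 * waiting. Otherwise "wait", modelled as running the completion
 * handler with io_result, the result of the I/O in flight.
 */
static int toy_iowait(struct toy_buf *bp, int io_result)
{
	if (!bp->b_error)
		toy_end_io(bp, io_result);
	return bp->b_error;
}

/*
 * Blocking read: reissue the I/O (which succeeds in this model) and
 * wait for it. clear_on_submit selects whether b_error is zeroed
 * before the I/O is started.
 */
static int toy_read(struct toy_buf *bp, bool clear_on_submit)
{
	assert(bp != NULL);
	if (clear_on_submit)
		bp->b_error = 0;
	return toy_iowait(bp, 0);
}

/* Failed readahead followed by a blocking read of the same buffer. */
int toy_demo(bool clear_on_submit)
{
	struct toy_buf bp = { .b_error = 0 };

	toy_readahead(&bp);
	return toy_read(&bp, clear_on_submit);
}
```

In this model toy_demo(false) returns EAGAIN even though the reissued
read would have succeeded, because the wait is skipped on the stale
error. toy_demo(true) returns 0.
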
If something then issues a read on the buffer that failed the
readahead, then we enter xfs_buf_iowait() after reissuing the IO.
If it's aborting because of a stale EWOULDBLOCK as a result of
readahead, then the problem is either:

	- failed readahead should not be leaving an error in b_error; or
	- the read IO did not zero b_error before starting the IO

> The I/O eventually
> succeeds and the endio routine resets bp->b_error,

AFAICT, it's a different IO that succeeds (i.e. the resubmitted one
that is being waited for), not the same one.

> but the original read
> request has already returned -EWOULDBLOCK to the user and added the log
> message above to the kernel log, freaking everyone out.
>
> This patch ignores EWOULDBLOCK when deciding whether to wait for the I/O
> to complete and tries again, allowing the read to succeed as expected.

Which does not appear to be the correct fix - preventing failed
readahead from leaving a stale error on the buffer seems like the
right thing to do here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com