From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 09/14] xfs: inode recovery readahead can race with inode buffer creation
Date: Mon, 15 Feb 2016 17:18:20 +1100
Message-ID: <1455517105-20033-10-git-send-email-david@fromorbit.com>
In-Reply-To: <1455517105-20033-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6

When we do inode readahead in log recovery, we can do the
readahead before we've replayed the icreate transaction that stamps
the buffer with inode cores. The inode readahead verifier catches
this and marks the buffer as !done to indicate that it doesn't yet
contain valid inodes.

In adding buffer error notification (i.e. setting b_error = -EIO at
the same time as we clear the done flag) to such a readahead
verifier failure, we can then get subsequent inode recovery failing
with this error:

XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32

This occurs when readahead completion races with icreate item replay
such as:

	inode readahead
		find buffer
		lock buffer
		submit RA io
	....
	icreate recovery
	    xfs_trans_get_buffer
		find buffer
		lock buffer
		<blocks on RA completion>
	.....
	<ra completion>
		fails verifier
		clear XBF_DONE
		set bp->b_error = -EIO
		release and unlock buffer
	<icreate gains lock>
	icreate initialises buffer
	marks buffer as done
	adds buffer to delayed write queue
	releases buffer

At this point, we have an initialised inode buffer that is up to
date but has an -EIO state registered against it. When we finally
get to recovering an inode in that buffer:

	inode item recovery
	    xfs_trans_read_buffer
		find buffer
		lock buffer
		sees XBF_DONE is set, returns buffer
	    sees bp->b_error is set
		fail log recovery!
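
As a rough illustration only, the two sequences above can be modelled
in a userspace toy. This is not XFS code: struct toy_buf, TOY_DONE and
the toy_* helpers are made up for this sketch, and locking is left out
because the interleaving is simply replayed in order.

```c
#include <errno.h>

/* Toy stand-ins for the real buffer state; none of this is XFS code. */
#define TOY_DONE 0x1u

struct toy_buf {
	unsigned int	flags;	/* TOY_DONE models XBF_DONE */
	int		error;	/* models bp->b_error */
};

/* Readahead completion: verifier fails, so clear DONE and record -EIO. */
static void toy_ra_complete_failed(struct toy_buf *bp)
{
	bp->flags &= ~TOY_DONE;
	bp->error = -EIO;
}

/* icreate replay: initialises the buffer but leaves the error untouched. */
static void toy_icreate_init(struct toy_buf *bp)
{
	bp->flags |= TOY_DONE;
}

/* Inode item recovery: a read that trusts DONE, then checks the error. */
static int toy_inode_item_read(const struct toy_buf *bp)
{
	if (!(bp->flags & TOY_DONE))
		return -EAGAIN;	/* toy marker for "would re-read from disk" */
	return bp->error;	/* the stale -EIO leaks through here */
}

/* Replay the interleaving and return what inode item recovery sees. */
int toy_replay_race(void)
{
	struct toy_buf bp = { 0, 0 };

	toy_ra_complete_failed(&bp);
	toy_icreate_init(&bp);
	return toy_inode_item_read(&bp);
}
```

In this model toy_replay_race() returns -EIO even though the buffer
contents are valid, which is the failure described above.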

Essentially, we need xfs_trans_get_buf_map() to clear the error status of
the buffer when doing a lookup. This function returns uninitialised
buffers, so the buffer returned can not be in an error state and
none of the code that uses this function expects b_error to be set
on return. Indeed, there is an ASSERT(!bp->b_error); in the
transaction case in xfs_trans_get_buf_map() that would have caught
this if log recovery used transactions....

This patch firstly changes the inode readahead failure to set -EIO
on the buffer, and secondly changes xfs_buf_get_map() to never
return a buffer with an error state set so this first change doesn't
cause unexpected log recovery failures.
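
A minimal sketch of how the two changes fit together, again as a
userspace toy rather than the real code. struct fix_buf, FIX_DONE and
the fix_* helpers are hypothetical names for this example; the only
point is that the "get" path resets the error before returning.

```c
#include <errno.h>

/* Toy stand-ins only; names are invented for this sketch. */
#define FIX_DONE 0x1u

struct fix_buf {
	unsigned int	flags;	/* FIX_DONE models XBF_DONE */
	int		error;	/* models bp->b_error */
};

/* First change: a failed readahead verifier also records -EIO. */
static void fix_ra_verify_failed(struct fix_buf *bp)
{
	bp->flags &= ~FIX_DONE;
	bp->error = -EIO;
}

/* Second change: "get" never returns a buffer with an error set. */
static struct fix_buf *fix_buf_get(struct fix_buf *bp)
{
	bp->error = 0;
	return bp;
}

/* Replay the same interleaving with both changes applied. */
int fix_replay(void)
{
	struct fix_buf bp = { 0, 0 };
	struct fix_buf *ibp;

	fix_ra_verify_failed(&bp);	/* <ra completion> */
	ibp = fix_buf_get(&bp);		/* icreate replay gets the buffer */
	ibp->flags |= FIX_DONE;		/* icreate initialises it */

	/* inode item recovery: DONE is set, so check the error state */
	return (ibp->flags & FIX_DONE) ? ibp->error : -EAGAIN;
}
```

Here fix_replay() returns 0, i.e. the later read no longer sees the
readahead failure.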

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_inode_buf.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 324715e..546af74 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -71,11 +71,12 @@ xfs_dinode_good_version(
  * has not had the inode cores stamped into it. Hence for readahead, the buffer
  * may be potentially invalid.
  *
- * If the readahead buffer is invalid, we don't want to mark it with an error,
- * but we do want to clear the DONE status of the buffer so that a followup read
- * will re-read it from disk. This will ensure that we don't get an unnecessary
- * warnings during log recovery and we don't get unnecssary panics on debug
- * kernels.
+ * If the readahead buffer is invalid, we need to mark it with an error and
+ * clear the DONE status of the buffer so that a followup read will re-read it
+ * from disk. We don't report the error otherwise to avoid warnings during log
+ * recovery and we don't get unnecssary panics on debug kernels. We use EIO here
+ * because all we want to do is say readahead failed; there is no-one to report
+ * the error to, so this will distinguish it from a non-ra verifier failure.
  */
 static void
 xfs_inode_buf_verify(
@@ -102,6 +103,7 @@ xfs_inode_buf_verify(
 						XFS_RANDOM_ITOBP_INOTOBP))) {
 			if (readahead) {
 				bp->b_flags &= ~XBF_DONE;
+				xfs_buf_ioerror(bp, -EIO);
 				return;
 			}
 
-- 
2.5.0

