[PATCH 0/2 v3] xfs: handle dquot buffer readahead in log recovery correctly

public inbox for linux-xfs@vger.kernel.org

* [PATCH 0/2 v3] xfs: handle dquot buffer readahead in log recovery correctly
From: Dave Chinner @ 2016-01-11  3:24 UTC
  To: xfs

Hi folks,

Version 3 of this patchset. Version 2 of the patchset added a fix
for the inode readahead error setting in log recovery, which turned
out to be problematic.

I've split that change out into its own patch, which includes the
fix it requires to prevent a log recovery race with inode buffer
creation recovery. This is a generic fix to xfs_buf_get_map(): if we
are returning an initialised buffer for the caller to use, it
shouldn't have an error set on it from a previous operation. If we
don't clear the error before returning the buffer, it causes
unexpected failures further down the line. This caused log recovery
failures in generic/073 and a couple of other tests on slow disks
(i.e. disks with enough readahead IO latency to open the race
window).

The second patch is essentially the original patch with just the
inode buffer changes removed. There are no other changes to that
patch.

Cheers,

Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 1/2] xfs: inode recovery readahead can race with inode buffer creation
From: Dave Chinner @ 2016-01-11  3:24 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we do inode readahead in log recovery, we can do the
readahead before we've replayed the icreate transaction that stamps
the buffer with inode cores. The inode readahead verifier catches
this and marks the buffer as !done to indicate that it doesn't yet
contain valid inodes.

When we add buffer error notification (i.e. setting b_error = -EIO at
the same time as we clear the done flag) to such a readahead
verifier failure, subsequent inode recovery can then fail with this
error:

XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32

This occurs when readahead completion races with icreate item replay,
as in this sequence:

	inode readahead
		find buffer
		lock buffer
		submit RA io
	....
	icreate recovery
	    xfs_trans_get_buffer
		find buffer
		lock buffer
		<blocks on RA completion>
	.....
	<ra completion>
		fails verifier
		clear XBF_DONE
		set bp->b_error = -EIO
		release and unlock buffer
	<icreate gains lock>
	icreate initialises buffer
	marks buffer as done
	adds buffer to delayed write queue
	releases buffer

At this point, we have an initialised inode buffer that is up to
date but has an -EIO state registered against it. When we finally
get to recovering an inode in that buffer:

	inode item recovery
	    xfs_trans_read_buffer
		find buffer
		lock buffer
		sees XBF_DONE is set, returns buffer
	    sees bp->b_error is set
		fail log recovery!
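
To make the interleaving above concrete, here is a minimal
single-threaded userspace sketch in C. It is a hypothetical model, not
XFS code: struct model_buf and the model_* helpers are invented for
illustration, only the XBF_DONE flag and b_error are represented, and
the steps are replayed in the order shown, without the
xfs_buf_get_map() fix.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the race, not kernel code. Only the
 * two pieces of xfs_buf state involved are kept.
 */
struct model_buf {
	bool done;	/* models XBF_DONE */
	int  error;	/* models bp->b_error */
};

/* RA completion: the verifier finds no inode cores yet and, with the
 * error notification added, clears XBF_DONE and sets -EIO. */
static void model_ra_complete_fail(struct model_buf *bp)
{
	bp->done = false;
	bp->error = -EIO;
}

/* icreate replay through a "get" lookup that does NOT clear b_error:
 * initialise the inode cores and mark the buffer done. */
static void model_icreate_replay(struct model_buf *bp)
{
	/* ... inode cores would be stamped into the buffer here ... */
	bp->done = true;
}

/* Inode item recovery through a "read" lookup. */
static int model_inode_item_recover(struct model_buf *bp)
{
	if (!bp->done) {
		/* Not up to date: a real read would re-issue the IO.
		 * This model assumes that IO succeeds. */
		bp->done = true;
		bp->error = 0;
	}
	/* Done buffers are returned as-is, so any error left on them
	 * (here, the stale -EIO from readahead) fails recovery. */
	return bp->error;
}
```

Calling model_ra_complete_fail(), model_icreate_replay() and
model_inode_item_recover() in that order on a fresh buffer leaves
done set and error at -EIO, so the final call returns -EIO even though
the buffer contents are valid.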

Essentially, we need xfs_trans_get_buf_map() to clear the error status of
the buffer when doing a lookup. This function returns uninitialised
buffers, so the buffer returned must not be in an error state, and
none of the code that uses this function expects b_error to be set
on return. Indeed, there is an ASSERT(!bp->b_error); in the
transaction case in xfs_trans_get_buf_map() that would have caught
this if log recovery used transactions.

This patch first changes the inode readahead failure to set -EIO on
the buffer, and then changes xfs_buf_get_map() to never return a
buffer with an error state set, so that the first change doesn't
cause unexpected log recovery failures.
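
The effect of the xfs_buf_get_map() change can be sketched with the
same kind of hypothetical userspace model. The model_* names and
MODEL_XBF_READ (a stand-in for XBF_READ) are invented for this
illustration; the assert() plays the role of the
ASSERT(!bp->b_error) in xfs_trans_get_buf_map() mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_XBF_READ	(1u << 0)	/* hypothetical stand-in for XBF_READ */

struct model_buf {
	bool done;	/* models XBF_DONE */
	int  error;	/* models bp->b_error */
};

/*
 * Models the tail of xfs_buf_get_map() after this patch: a caller that
 * did not ask for a read does not expect valid contents, so an error
 * left over from an earlier operation (e.g. failed readahead) is
 * cleared, as xfs_buf_ioerror(bp, 0) does in the patch.
 */
static struct model_buf *model_buf_get(struct model_buf *bp, unsigned flags)
{
	if (!(flags & MODEL_XBF_READ))
		bp->error = 0;
	return bp;
}

/* icreate replay: get (not read) the buffer, initialise it, mark done. */
static void model_icreate_replay(struct model_buf *bp)
{
	bp = model_buf_get(bp, 0);
	assert(bp->error == 0);	/* would have fired before the fix */
	bp->done = true;
}
```

A lookup that passes MODEL_XBF_READ keeps its error, mirroring the
patch's `!(flags & XBF_READ)` condition, so callers that do expect
valid data still see real read failures.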

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 12 +++++++-----
 fs/xfs/xfs_buf.c              |  7 +++++++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 1b8d98a..ff17c48 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -62,11 +62,12 @@ xfs_inobp_check(
  * has not had the inode cores stamped into it. Hence for readahead, the buffer
  * may be potentially invalid.
  *
- * If the readahead buffer is invalid, we don't want to mark it with an error,
- * but we do want to clear the DONE status of the buffer so that a followup read
- * will re-read it from disk. This will ensure that we don't get an unnecessary
- * warnings during log recovery and we don't get unnecssary panics on debug
- * kernels.
+ * If the readahead buffer is invalid, we need to mark it with an error and
+ * clear the DONE status of the buffer so that a followup read will re-read it
+ * from disk. We don't report the error otherwise to avoid warnings during log
+ * recovery and we don't get unnecssary panics on debug kernels. We use EIO here
+ * because all we want to do is say readahead failed; there is no-one to report
+ * the error to, so this will distinguish it from a non-ra verifier failure.
  */
 static void
 xfs_inode_buf_verify(
@@ -93,6 +94,7 @@ xfs_inode_buf_verify(
 						XFS_RANDOM_ITOBP_INOTOBP))) {
 			if (readahead) {
 				bp->b_flags &= ~XBF_DONE;
+				xfs_buf_ioerror(bp, -EIO);
 				return;
 			}
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 45a8ea7..ae86b16 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -604,6 +604,13 @@ found:
 		}
 	}
 
+	/*
+	 * Clear b_error if this is a lookup from a caller that doesn't expect
+	 * valid data to be found in the buffer.
+	 */
+	if (!(flags & XBF_READ))
+		xfs_buf_ioerror(bp, 0);
+
 	XFS_STATS_INC(target->bt_mount, xb_get);
 	trace_xfs_buf_get(bp, flags, _RET_IP_);
 	return bp;
-- 
2.5.0

* Re: [PATCH 1/2] xfs: inode recovery readahead can race with inode buffer creation
From: Brian Foster @ 2016-01-11 16:03 UTC
  To: Dave Chinner; +Cc: xfs

On Mon, Jan 11, 2016 at 02:24:53PM +1100, Dave Chinner wrote:

Reviewed-by: Brian Foster <bfoster@redhat.com>

* [PATCH 2/2] xfs: handle dquot buffer readahead in log recovery correctly
From: Dave Chinner @ 2016-01-11  3:24 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we do dquot readahead in log recovery, we do not use a verifier
as the underlying buffer may not have dquots in it yet (e.g. the
allocation operation hasn't been replayed). Hence we do not want
to fail recovery because we detect that an operation to be replayed
has not been run yet. This problem was addressed for inodes in commit
d891400 ("xfs: inode buffers may not be valid during recovery
readahead"), but it was not recognised to exist for dquots and their
buffers, as the dquot readahead did not have a verifier.

The result of not using a verifier is that when the buffer is next
read to replay a dquot modification, the dquot buffer verifier
will only be attached to the buffer if *readahead is not complete*.
Hence we can read the buffer, replay the dquot changes and then add
it to the delwri submission list without it having a verifier
attached to it. xfs_buf_ioapply() then catches this case and emits
warnings.
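
The missing-verifier problem can be sketched the same way. This is a
hypothetical userspace model, not kernel code: only XBF_DONE and the
b_ops pointer are represented, readahead completes immediately, and
model_write_would_warn() merely stands in for the check in
xfs_buf_ioapply() without modelling its exact conditions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_buf_ops. */
struct model_buf_ops {
	const char *name;
};

struct model_buf {
	bool done;				/* models XBF_DONE */
	const struct model_buf_ops *ops;	/* models bp->b_ops */
};

static const struct model_buf_ops model_dquot_ops = { .name = "xfs_dquot" };

/* Readahead: attach whatever ops were passed (NULL before this patch)
 * and complete the IO; with NULL ops no verifier runs. */
static void model_readahead(struct model_buf *bp,
			    const struct model_buf_ops *ops)
{
	if (bp->done)
		return;
	bp->ops = ops;
	bp->done = true;
}

/* Read: ops are only attached when IO is actually issued, i.e. when
 * the buffer is not already done. */
static void model_read(struct model_buf *bp, const struct model_buf_ops *ops)
{
	if (!bp->done) {
		bp->ops = ops;
		bp->done = true;
	}
}

/* Delwri submission: models the warning about writes without ops. */
static bool model_write_would_warn(const struct model_buf *bp)
{
	return bp->ops == NULL;
}
```

If readahead with NULL ops completes first, the later model_read()
never attaches model_dquot_ops and the write would warn; without the
readahead, the same read attaches the ops and no warning occurs.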

Fix this, and handle the same readahead verifier error cases as for
inode buffers, by adding a new readahead verifier that has a write
operation as well as a read operation that marks the buffer as not
done if any corruption is detected. Also make sure we don't run
readahead if the dquot buffer has been marked as cancelled by
recovery.

This will result in readahead either succeeding and the buffer
having a valid write verifier, or readahead failing and the buffer
state requiring the subsequent read to resubmit the IO with the new
verifier.  In either case, this will result in the buffer always
ending up with a valid write verifier on it.
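
The two outcomes described above can be checked with a small model of
the new readahead verifier. As before, every model_* name is
hypothetical: IO completes synchronously, a dquots_valid flag replaces
the real CRC and xfs_dqcheck() tests, MODEL_EFSCORRUPTED is a
placeholder error value, and submission is assumed to clear any stale
error before the verifier runs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_buf;

/* Hypothetical stand-in for struct xfs_buf_ops. */
struct model_buf_ops {
	const char *name;
	void (*verify_read)(struct model_buf *bp);
	void (*verify_write)(struct model_buf *bp);
};

/* Hypothetical stand-in for struct xfs_buf. */
struct model_buf {
	bool done;				/* models XBF_DONE */
	int error;				/* models bp->b_error */
	bool dquots_valid;			/* would the dquot checks pass? */
	const struct model_buf_ops *ops;	/* models bp->b_ops */
};

#define MODEL_EFSCORRUPTED	117	/* placeholder error value */

/* Write verification itself is not modelled. */
static void model_write_verify(struct model_buf *bp) { (void)bp; }

/* Full read verifier: failures are reported as corruption. */
static void model_read_verify(struct model_buf *bp)
{
	if (!bp->dquots_valid)
		bp->error = -MODEL_EFSCORRUPTED;
}

/* Readahead verifier: fail silently with -EIO and leave the buffer
 * !done so the next real read resubmits the IO. */
static void model_ra_verify(struct model_buf *bp)
{
	if (!bp->dquots_valid) {
		bp->error = -EIO;
		bp->done = false;
	}
}

static const struct model_buf_ops model_dquot_ops = {
	.name = "xfs_dquot", .verify_read = model_read_verify,
	.verify_write = model_write_verify };
static const struct model_buf_ops model_dquot_ra_ops = {
	.name = "xfs_dquot_ra", .verify_read = model_ra_verify,
	.verify_write = model_write_verify };

/* Issue read IO with the given ops; DONE is only set on success. */
static void model_submit_read(struct model_buf *bp,
			      const struct model_buf_ops *ops)
{
	bp->ops = ops;
	bp->error = 0;		/* assumed: stale errors cleared on submit */
	ops->verify_read(bp);
	if (!bp->error)
		bp->done = true;
}

static void model_readahead(struct model_buf *bp,
			    const struct model_buf_ops *ops)
{
	if (!bp->done)
		model_submit_read(bp, ops);
}

static int model_read(struct model_buf *bp, const struct model_buf_ops *ops)
{
	if (!bp->done)
		model_submit_read(bp, ops);
	return bp->error;
}
```

When the dquots are already on disk, the buffer keeps the readahead
ops, whose write verifier is the normal dquot one. When they are not,
readahead fails silently, and the later model_read() resubmits the IO
with the full ops. Either way ops->verify_write is set before the
buffer could reach the delwri list.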

Note: the inode buffer readahead error handling also needed fixing
to mark the buffer with EIO; Brian noticed during review that the
code I copied from there was wrong. That fix is now in the preceding
patch, so here we just add comments linking the two functions that
handle readahead verifier errors so we don't forget this behavioural
link in future.

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/libxfs/xfs_dquot_buf.c  | 36 ++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_inode_buf.c  |  2 ++
 fs/xfs/libxfs/xfs_quota_defs.h |  2 +-
 fs/xfs/libxfs/xfs_shared.h     |  1 +
 fs/xfs/xfs_log_recover.c       |  9 +++++++--
 5 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dquot_buf.c b/fs/xfs/libxfs/xfs_dquot_buf.c
index 11cefb2..3cc3cf7 100644
--- a/fs/xfs/libxfs/xfs_dquot_buf.c
+++ b/fs/xfs/libxfs/xfs_dquot_buf.c
@@ -54,7 +54,7 @@ xfs_dqcheck(
 	xfs_dqid_t	 id,
 	uint		 type,	  /* used only when IO_dorepair is true */
 	uint		 flags,
-	char		 *str)
+	const char	 *str)
 {
 	xfs_dqblk_t	 *d = (xfs_dqblk_t *)ddq;
 	int		errs = 0;
@@ -207,7 +207,8 @@ xfs_dquot_buf_verify_crc(
 STATIC bool
 xfs_dquot_buf_verify(
 	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
+	struct xfs_buf		*bp,
+	int			warn)
 {
 	struct xfs_dqblk	*d = (struct xfs_dqblk *)bp->b_addr;
 	xfs_dqid_t		id = 0;
@@ -240,8 +241,7 @@ xfs_dquot_buf_verify(
 		if (i == 0)
 			id = be32_to_cpu(ddq->d_id);
 
-		error = xfs_dqcheck(mp, ddq, id + i, 0, XFS_QMOPT_DOWARN,
-				       "xfs_dquot_buf_verify");
+		error = xfs_dqcheck(mp, ddq, id + i, 0, warn, __func__);
 		if (error)
 			return false;
 	}
@@ -256,7 +256,7 @@ xfs_dquot_buf_read_verify(
 
 	if (!xfs_dquot_buf_verify_crc(mp, bp))
 		xfs_buf_ioerror(bp, -EFSBADCRC);
-	else if (!xfs_dquot_buf_verify(mp, bp))
+	else if (!xfs_dquot_buf_verify(mp, bp, XFS_QMOPT_DOWARN))
 		xfs_buf_ioerror(bp, -EFSCORRUPTED);
 
 	if (bp->b_error)
@@ -264,6 +264,25 @@ xfs_dquot_buf_read_verify(
 }
 
 /*
+ * readahead errors are silent and simply leave the buffer as !done so a real
+ * read will then be run with the xfs_dquot_buf_ops verifier. See
+ * xfs_inode_buf_verify() for why we use EIO and ~XBF_DONE here rather than
+ * reporting the failure.
+ */
+static void
+xfs_dquot_buf_readahead_verify(
+	struct xfs_buf	*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+
+	if (!xfs_dquot_buf_verify_crc(mp, bp) ||
+	    !xfs_dquot_buf_verify(mp, bp, 0)) {
+		xfs_buf_ioerror(bp, -EIO);
+		bp->b_flags &= ~XBF_DONE;
+	}
+}
+
+/*
  * we don't calculate the CRC here as that is done when the dquot is flushed to
  * the buffer after the update is done. This ensures that the dquot in the
  * buffer always has an up-to-date CRC value.
@@ -274,7 +293,7 @@ xfs_dquot_buf_write_verify(
 {
 	struct xfs_mount	*mp = bp->b_target->bt_mount;
 
-	if (!xfs_dquot_buf_verify(mp, bp)) {
+	if (!xfs_dquot_buf_verify(mp, bp, XFS_QMOPT_DOWARN)) {
 		xfs_buf_ioerror(bp, -EFSCORRUPTED);
 		xfs_verifier_error(bp);
 		return;
@@ -287,3 +306,8 @@ const struct xfs_buf_ops xfs_dquot_buf_ops = {
 	.verify_write = xfs_dquot_buf_write_verify,
 };
 
+const struct xfs_buf_ops xfs_dquot_buf_ra_ops = {
+	.name = "xfs_dquot_ra",
+	.verify_read = xfs_dquot_buf_readahead_verify,
+	.verify_write = xfs_dquot_buf_write_verify,
+};
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index ff17c48..1aabfda 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -68,6 +68,8 @@ xfs_inobp_check(
  * recovery and we don't get unnecssary panics on debug kernels. We use EIO here
  * because all we want to do is say readahead failed; there is no-one to report
  * the error to, so this will distinguish it from a non-ra verifier failure.
+ * Changes to this readahead error behaviour also need to be reflected in
+ * xfs_dquot_buf_readahead_verify().
  */
 static void
 xfs_inode_buf_verify(
diff --git a/fs/xfs/libxfs/xfs_quota_defs.h b/fs/xfs/libxfs/xfs_quota_defs.h
index 1b0a083..f51078f 100644
--- a/fs/xfs/libxfs/xfs_quota_defs.h
+++ b/fs/xfs/libxfs/xfs_quota_defs.h
@@ -153,7 +153,7 @@ typedef __uint16_t	xfs_qwarncnt_t;
 #define XFS_QMOPT_RESBLK_MASK	(XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_RES_RTBLKS)
 
 extern int xfs_dqcheck(struct xfs_mount *mp, xfs_disk_dquot_t *ddq,
-		       xfs_dqid_t id, uint type, uint flags, char *str);
+		       xfs_dqid_t id, uint type, uint flags, const char *str);
 extern int xfs_calc_dquots_per_chunk(unsigned int nbblks);
 
 #endif	/* __XFS_QUOTA_H__ */
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index 5be5297..15c3ceb 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -49,6 +49,7 @@ extern const struct xfs_buf_ops xfs_inobt_buf_ops;
 extern const struct xfs_buf_ops xfs_inode_buf_ops;
 extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
 extern const struct xfs_buf_ops xfs_dquot_buf_ops;
+extern const struct xfs_buf_ops xfs_dquot_buf_ra_ops;
 extern const struct xfs_buf_ops xfs_sb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
 extern const struct xfs_buf_ops xfs_symlink_buf_ops;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index c5ecaac..5991cdc 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3204,6 +3204,7 @@ xlog_recover_dquot_ra_pass2(
 	struct xfs_disk_dquot	*recddq;
 	struct xfs_dq_logformat	*dq_f;
 	uint			type;
+	int			len;
 
 
 	if (mp->m_qflags == 0)
@@ -3224,8 +3225,12 @@ xlog_recover_dquot_ra_pass2(
 	ASSERT(dq_f);
 	ASSERT(dq_f->qlf_len == 1);
 
-	xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno,
-			  XFS_FSB_TO_BB(mp, dq_f->qlf_len), NULL);
+	len = XFS_FSB_TO_BB(mp, dq_f->qlf_len);
+	if (xlog_peek_buffer_cancelled(log, dq_f->qlf_blkno, len, 0))
+		return;
+
+	xfs_buf_readahead(mp->m_ddev_targp, dq_f->qlf_blkno, len,
+			  &xfs_dquot_buf_ra_ops);
 }
 
 STATIC void
-- 
2.5.0
