* [PATCH] xfs: inode buffers may not be valid during recovery readahead
@ 2013-08-27 1:39 Dave Chinner
2013-08-30 18:15 ` Ben Myers
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2013-08-27 1:39 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
CRC enabled filesystems fail log recovery with 100% reliability on
xfstests xfs/085 with the following failure:
XFS (vdb): Mounting Filesystem
XFS (vdb): Starting recovery (logdev: internal)
XFS (vdb): Corruption detected. Unmount and run xfs_repair
XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0)
XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95
The problem is that the inode buffer has not been recovered before
the readahead on the inode buffer is issued. The checkpoint being
recovered actually allocates the inode chunk we are doing readahead
from, so what comes from disk during readahead is essentially
random and the verifier barfs on it.
This inode buffer readahead problem affects non-crc filesystems,
too, but xfstests does not trigger it at all on such
configurations....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_inode_buf.c | 36 +++++++++++++++++++++++++++++++++---
fs/xfs/xfs_inode_buf.h | 1 +
fs/xfs/xfs_log_recover.c | 2 +-
3 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_inode_buf.c b/fs/xfs/xfs_inode_buf.c
index 38fe509..e011d59 100644
--- a/fs/xfs/xfs_inode_buf.c
+++ b/fs/xfs/xfs_inode_buf.c
@@ -61,9 +61,22 @@ xfs_inobp_check(
}
#endif
+/*
+ * If we are doing readahead on an inode buffer, we might be in log recovery
+ * reading an inode allocation buffer that hasn't yet been replayed, and hence
+ * has not had the inode cores stamped into it. Hence for readahead, the buffer
+ * may be potentially invalid.
+ *
+ * If the readahead buffer is invalid, we don't want to mark it with an error,
+ * but we do want to clear the DONE status of the buffer so that a followup read
+ * will re-read it from disk. This will ensure that we don't get an unnecessary
+ * warnings during log recovery and we don't get unnecssary panics on debug
+ * kernels.
+ */
static void
xfs_inode_buf_verify(
- struct xfs_buf *bp)
+ struct xfs_buf *bp,
+ bool readahead)
{
struct xfs_mount *mp = bp->b_target->bt_mount;
int i;
@@ -84,6 +97,11 @@ xfs_inode_buf_verify(
if (unlikely(XFS_TEST_ERROR(!di_ok, mp,
XFS_ERRTAG_ITOBP_INOTOBP,
XFS_RANDOM_ITOBP_INOTOBP))) {
+ if (readahead) {
+ bp->b_flags &= ~XBF_DONE;
+ return;
+ }
+
xfs_buf_ioerror(bp, EFSCORRUPTED);
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_HIGH,
mp, dip);
@@ -104,14 +122,21 @@ static void
xfs_inode_buf_read_verify(
struct xfs_buf *bp)
{
- xfs_inode_buf_verify(bp);
+ xfs_inode_buf_verify(bp, false);
+}
+
+static void
+xfs_inode_buf_readahead_verify(
+ struct xfs_buf *bp)
+{
+ xfs_inode_buf_verify(bp, true);
}
static void
xfs_inode_buf_write_verify(
struct xfs_buf *bp)
{
- xfs_inode_buf_verify(bp);
+ xfs_inode_buf_verify(bp, false);
}
const struct xfs_buf_ops xfs_inode_buf_ops = {
@@ -119,6 +144,11 @@ const struct xfs_buf_ops xfs_inode_buf_ops = {
.verify_write = xfs_inode_buf_write_verify,
};
+const struct xfs_buf_ops xfs_inode_buf_ra_ops = {
+ .verify_read = xfs_inode_buf_readahead_verify,
+ .verify_write = xfs_inode_buf_write_verify,
+};
+
/*
* This routine is called to map an inode to the buffer containing the on-disk
diff --git a/fs/xfs/xfs_inode_buf.h b/fs/xfs/xfs_inode_buf.h
index aae9fc4..599e6c0 100644
--- a/fs/xfs/xfs_inode_buf.h
+++ b/fs/xfs/xfs_inode_buf.h
@@ -48,5 +48,6 @@ void xfs_inobp_check(struct xfs_mount *, struct xfs_buf *);
#endif /* DEBUG */
extern const struct xfs_buf_ops xfs_inode_buf_ops;
+extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
#endif /* __XFS_INODE_BUF_H__ */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 90b756f..ac9c18b 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3169,7 +3169,7 @@ xlog_recover_inode_ra_pass2(
return;
xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
- ilfp->ilf_len, &xfs_inode_buf_ops);
+ ilfp->ilf_len, &xfs_inode_buf_ra_ops);
}
STATIC void
--
1.8.3.2
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: inode buffers may not be valid during recovery readahead
2013-08-27 1:39 [PATCH] xfs: inode buffers may not be valid during recovery readahead Dave Chinner
@ 2013-08-30 18:15 ` Ben Myers
2013-08-31 6:14 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Ben Myers @ 2013-08-30 18:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Dave,
On Tue, Aug 27, 2013 at 11:39:37AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> CRC enabled filesystems fail log recovery with 100% reliability on
> xfstests xfs/085 with the following failure:
Unfortunately I have not been able to hit this one... not sure why.
> XFS (vdb): Mounting Filesystem
> XFS (vdb): Starting recovery (logdev: internal)
> XFS (vdb): Corruption detected. Unmount and run xfs_repair
> XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0)
> XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95
>
> The problem is that the inode buffer has not been recovered before
> the readahead on the inode buffer is issued. The checkpoint being
> recovered actually allocates the inode chunk we are doing readahead
> from, so what comes from disk during readahead is essentially
> random and the verifier barfs on it.
>
> This inode buffer readahead problem affects non-crc filesystems,
> too, but xfstests does not trigger it at all on such
> configurations....
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
I've been mulling this one over for a bit, and I'm not quite sure this
is correct:
My feeling is that in light of commit 9222a9cf, if we do take part of a
buffer back in time, the write verifier should fail. I think for a v2
inode the read and write verifiers should both be disabled for the
duration of recovery. For v3 inodes, I suspect the current situation
where we do use write verifiers is broken in the same way, at least
until we pull in 'xfs: prevent transient corrupt states during log
recovery', which, as you say, won't fix the problem for the v2 inode.
I'll pull this in and send a patch to that effect.
Reviewed-by: Ben Myers <bpm@sgi.com>
Regards,
Ben
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: inode buffers may not be valid during recovery readahead
2013-08-30 18:15 ` Ben Myers
@ 2013-08-31 6:14 ` Dave Chinner
2013-09-03 22:17 ` Ben Myers
0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2013-08-31 6:14 UTC (permalink / raw)
To: Ben Myers; +Cc: xfs
On Fri, Aug 30, 2013 at 01:15:20PM -0500, Ben Myers wrote:
> Dave,
>
> On Tue, Aug 27, 2013 at 11:39:37AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > CRC enabled filesystems fail log recovery with 100% reliability on
> > xfstests xfs/085 with the following failure:
>
> Unfortunately I have not been able to hit this one... not sure why.
>
> > XFS (vdb): Mounting Filesystem
> > XFS (vdb): Starting recovery (logdev: internal)
> > XFS (vdb): Corruption detected. Unmount and run xfs_repair
> > XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0)
> > XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95
> >
> > The problem is that the inode buffer has not been recovered before
> > the readahead on the inode buffer is issued. The checkpoint being
> > recovered actually allocates the inode chunk we are doing readahead
> > from, so what comes from disk during readahead is essentially
> > random and the verifier barfs on it.
> >
> > This inode buffer readahead problem affects non-crc filesystems,
> > too, but xfstests does not trigger it at all on such
> > configurations....
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> I've been mulling this one over for a bit, and I'm not quite sure this
> is correct:
>
> My feeling is that in light of commit 9222a9cf, if we do take part of a
> buffer back in time, the write verifier should fail.
I don't see the connection between 9222a9cf ("xfs: don't shutdown
log recovery on validation errors") and this issue. 9222a9cf works
around are a longstanding architectural deficiency of log
recovery, while this is a completely new problem introduced by the
inode buffer readahead in log recovery.
> I think for a v2
> inode the read and write verifiers should both be disabled for the
> duration of recovery.
Why? It's an architectural requirement that the underlying buffer we
are replaying inodes into already contains inodes. I mean, what's
the first thing xlog_recover_inode_pass2() does?
/*
* Make sure the place we're flushing out to really looks
* like an inode!
*/
if (unlikely(dip->di_magic != cpu_to_be16(XFS_DINODE_MAGIC))) {
xfs_alert(mp,
"%s: Bad inode magic number, dip = 0x%p, dino bp = 0x%p, ino = %Ld",
__func__, dip, bp, in_f->ilf_ino);
XFS_ERROR_REPORT("xlog_recover_inode_pass2(1)",
XFS_ERRLEVEL_LOW, mp);
error = EFSCORRUPTED;
goto out_release;
}
i.e. running the verifier on inode buffer reads during log recovery
is exactly the right thing to be doing....
Readahead doesn't change this, either. It just introduces a new
ordering issue we need to handle.
> For v3 inodes, I suspect the current situation
> where we do use write verifiers is broken in the same way, at least
> until we pull in 'xfs: prevent transient corrupt states during log
> recovery', which, as you say, won't fix the problem for the v2 inode.
Again - that fix has no connection to this readahead bug. It's the
fix for v5 filesystems that was mentioned in 9222a9cf, and is needed
regardless of whether we are doing readahead or not.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: inode buffers may not be valid during recovery readahead
2013-08-31 6:14 ` Dave Chinner
@ 2013-09-03 22:17 ` Ben Myers
2013-09-03 23:50 ` Dave Chinner
0 siblings, 1 reply; 5+ messages in thread
From: Ben Myers @ 2013-09-03 22:17 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi Dave,
On Sat, Aug 31, 2013 at 04:14:20PM +1000, Dave Chinner wrote:
> On Fri, Aug 30, 2013 at 01:15:20PM -0500, Ben Myers wrote:
> > Dave,
> >
> > On Tue, Aug 27, 2013 at 11:39:37AM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > CRC enabled filesystems fail log recovery with 100% reliability on
> > > xfstests xfs/085 with the following failure:
> >
> > Unfortunately I have not been able to hit this one... not sure why.
> >
> > > XFS (vdb): Mounting Filesystem
> > > XFS (vdb): Starting recovery (logdev: internal)
> > > XFS (vdb): Corruption detected. Unmount and run xfs_repair
> > > XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0)
> > > XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95
> > >
> > > The problem is that the inode buffer has not been recovered before
> > > the readahead on the inode buffer is issued. The checkpoint being
> > > recovered actually allocates the inode chunk we are doing readahead
> > > from, so what comes from disk during readahead is essentially
> > > random and the verifier barfs on it.
> > >
> > > This inode buffer readahead problem affects non-crc filesystems,
> > > too, but xfstests does not trigger it at all on such
> > > configurations....
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> >
> > I've been mulling this one over for a bit, and I'm not quite sure this
> > is correct:
> >
> > My feeling is that in light of commit 9222a9cf, if we do take part of a
> > buffer back in time, the write verifier should fail.
>
> I don't see the connection between 9222a9cf ("xfs: don't shutdown
> log recovery on validation errors") and this issue. 9222a9cf works
> around are a longstanding architectural deficiency of log
> recovery, while this is a completely new problem introduced by the
> inode buffer readahead in log recovery.
Commit 9222a9cf left buffer operations for inodes clear in the v2 inode case:
@@ -1845,7 +1845,13 @@ xlog_recover_do_inode_buffer(
xfs_agino_t *buffer_nextp;
trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
- bp->b_ops = &xfs_inode_buf_ops;
+
+ /*
+ * Post recovery validation only works properly on CRC enabled
+ * filesystems.
+ */
+ if (xfs_sb_version_hascrc(&mp->m_sb))
+ bp->b_ops = &xfs_inode_buf_ops;
xlog_recover_commit_trans
xlog_recover_items_pass2
xlog_recover_buffer_pass2
xlog_recover_do_inode_buffer
if (xfs_sb_version_hascrc(&mp->m_sb))
bp->b_ops = &xfs_inode_buf_ops;
My concern is that with the readahead we have:
xlog_recover_commit_trans
. xlog_recover_ra_pass2
. xlog_recover_inode_ra_pass2
. xfs_buf_readahead
. xfs_buf_readahead_map
. xfs_buf_read_map
. if (!XFS_BUF_ISDONE(bp))
. bp->b_ops = ops;
xlog_recover_items_pass2
xlog_recover_buffer_pass2
xlog_recover_do_inode_buffer
if (xfs_sb_version_hascrc(&mp->m_sb))
bp->b_ops = &xfs_inode_buf_ops;
Looks like we can set b_ops in xfs_buf_read_map in the v2 inode case and it
would remain set through recovery when we intend it to be clear. If we needed
to b_ops to be clear in commit 9222a9cf, I think it should also be clear in the
readahead case.
Here's what I suggest:
---
fs/xfs/xfs_log_recover.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Index: b/fs/xfs/xfs_log_recover.c
===================================================================
--- a/fs/xfs/xfs_log_recover.c 2013-09-03 16:57:51.534388540 -0500
+++ b/fs/xfs/xfs_log_recover.c 2013-09-03 16:59:13.784398092 -0500
@@ -3309,7 +3309,9 @@ xlog_recover_inode_ra_pass2(
return;
xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
- ilfp->ilf_len, &xfs_inode_buf_ra_ops);
+ ilfp->ilf_len,
+ xfs_sb_version_hascrc(&mp->m_sb) ?
+ &xfs_inode_buf_ra_ops : NULL);
}
STATIC void
Thanks,
Ben
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs: inode buffers may not be valid during recovery readahead
2013-09-03 22:17 ` Ben Myers
@ 2013-09-03 23:50 ` Dave Chinner
0 siblings, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2013-09-03 23:50 UTC (permalink / raw)
To: Ben Myers; +Cc: xfs
On Tue, Sep 03, 2013 at 05:17:12PM -0500, Ben Myers wrote:
> Hi Dave,
>
> On Sat, Aug 31, 2013 at 04:14:20PM +1000, Dave Chinner wrote:
> > On Fri, Aug 30, 2013 at 01:15:20PM -0500, Ben Myers wrote:
> > > Dave,
> > >
> > > On Tue, Aug 27, 2013 at 11:39:37AM +1000, Dave Chinner wrote:
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > CRC enabled filesystems fail log recovery with 100% reliability on
> > > > xfstests xfs/085 with the following failure:
> > >
> > > Unfortunately I have not been able to hit this one... not sure why.
> > >
> > > > XFS (vdb): Mounting Filesystem
> > > > XFS (vdb): Starting recovery (logdev: internal)
> > > > XFS (vdb): Corruption detected. Unmount and run xfs_repair
> > > > XFS (vdb): bad inode magic/vsn daddr 144 #0 (magic=0)
> > > > XFS: Assertion failed: 0, file: fs/xfs/xfs_inode_buf.c, line: 95
> > > >
> > > > The problem is that the inode buffer has not been recovered before
> > > > the readahead on the inode buffer is issued. The checkpoint being
> > > > recovered actually allocates the inode chunk we are doing readahead
> > > > from, so what comes from disk during readahead is essentially
> > > > random and the verifier barfs on it.
> > > >
> > > > This inode buffer readahead problem affects non-crc filesystems,
> > > > too, but xfstests does not trigger it at all on such
> > > > configurations....
> > > >
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > >
> > > I've been mulling this one over for a bit, and I'm not quite sure this
> > > is correct:
> > >
> > > My feeling is that in light of commit 9222a9cf, if we do take part of a
> > > buffer back in time, the write verifier should fail.
> >
> > I don't see the connection between 9222a9cf ("xfs: don't shutdown
> > log recovery on validation errors") and this issue. 9222a9cf works
> > around are a longstanding architectural deficiency of log
> > recovery, while this is a completely new problem introduced by the
> > inode buffer readahead in log recovery.
>
> Commit 9222a9cf left buffer operations for inodes clear in the v2 inode case:
....
Ok, you need to make your review comments about the code directly in
the context of the code you are commenting on. i.e. something like
this:
> return;
>
> xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
> - ilfp->ilf_len, &xfs_inode_buf_ops);
> + ilfp->ilf_len, &xfs_inode_buf_ra_ops);
"verifiers shouldn't be set for v4 filesystems, see
xlog_recover_do_inode_buffer()".
That would have made it obvious what you are commenting on. Indeed,
that's a flaw in the original log readahead patch, and one that I
didn't pick up when fixing the problem I sawi on v5 filesystems.
---
As a process issue, Ben, you shouldn't be committing patches that
haven't finished the review cycles. Both author and reviewer need to
be satisfied the patch that is committed is good. This patch has a
(now) obvious (to me) bug in it, but you committed it even though
you thought it was wrong and commented as such. If I hadn't of said
"I don't understand your review" even though you committed the patch
this probably would have got dropped on the ground.
So, please don't commit patches while the review process is still
running...
> Here's what I suggest:
>
> ---
> fs/xfs/xfs_log_recover.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> Index: b/fs/xfs/xfs_log_recover.c
> ===================================================================
> --- a/fs/xfs/xfs_log_recover.c 2013-09-03 16:57:51.534388540 -0500
> +++ b/fs/xfs/xfs_log_recover.c 2013-09-03 16:59:13.784398092 -0500
> @@ -3309,7 +3309,9 @@ xlog_recover_inode_ra_pass2(
> return;
>
> xfs_buf_readahead(mp->m_ddev_targp, ilfp->ilf_blkno,
> - ilfp->ilf_len, &xfs_inode_buf_ra_ops);
> + ilfp->ilf_len,
> + xfs_sb_version_hascrc(&mp->m_sb) ?
> + &xfs_inode_buf_ra_ops : NULL);
That's fine. Please post it as a full patch with a commit message
and a SOB....
Cheers,
Dave
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-09-03 23:50 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-27 1:39 [PATCH] xfs: inode buffers may not be valid during recovery readahead Dave Chinner
2013-08-30 18:15 ` Ben Myers
2013-08-31 6:14 ` Dave Chinner
2013-09-03 22:17 ` Ben Myers
2013-09-03 23:50 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox