* [PATCH 0/2] xfs_repair: fix xfs/033 issues.
@ 2014-02-20 5:55 Dave Chinner
2014-02-20 5:55 ` [PATCH 1/2] libxfs: contiguous buffers are not discontigous Dave Chinner
2014-02-20 5:55 ` [PATCH 2/2] libxfs: clear stale buffer errors on write Dave Chinner
0 siblings, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2014-02-20 5:55 UTC (permalink / raw)
To: xfs
Hi folks,
Eric reported that there was a regression in v3.2.0-alpha2 where
xfs/033 was failing to repair the corrupted root inode. He also
discovered that the discontiguous buffer support in xfs_repair
fixed it. Turns out that was a regression in xfs_repair covering up
another regression in xfs_repair.
Two bugs don't make broken code better.
The first patch fixes the regression that discontiguous buffer
support introduced, re-exposing the original regression so it can be
debugged. The second patch fixes the original regression.
Cheers,
Dave.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/2] libxfs: contiguous buffers are not discontigous
2014-02-20 5:55 [PATCH 0/2] xfs_repair: fix xfs/033 issues Dave Chinner
@ 2014-02-20 5:55 ` Dave Chinner
2014-02-20 15:38 ` Christoph Hellwig
2014-02-20 19:06 ` Eric Sandeen
2014-02-20 5:55 ` [PATCH 2/2] libxfs: clear stale buffer errors on write Dave Chinner
1 sibling, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2014-02-20 5:55 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
When discontiguous directory buffer support was fixed in xfs_repair,
(dd9093d xfs_repair: fix discontiguous directory block support)
it changed to using libxfs_getbuf_map() to support mapping
discontiguous blocks, and the prefetch code special cased such
discontiguous buffers.
The issue is that libxfs_getbuf_map() marks all buffers, even
contiguous ones - as LIBXFS_B_DISCONTIG, and so the prefetch code
was treating every buffer as discontiguous. This causes the prefetch
code to completely bypass the large IO optimisations for dense areas
of metadata. Because there was no obvious change in performance or
IO patterns, this wasn't noticed during performance testing.
However, this change mysteriously fixed a regression in xfs/033 in
the v3.2.0-alpha release, and this change in behaviour was
discovered as part of triaging why it "fixed" the regression.
Anyway, restoring the large IO prefetch optimisation results
a reapiron a 10 million inode filesystem dropping from 197s to 173s,
and the peak IOPS rate in phase 3 dropping from 25,000 to roughly
2,000 by trading off a bandwidth increase of roughly 100% (i.e.
200MB/s to 400MB/s). Phase 4 saw similar changes in IO profile and
speed increases.
This, however, re-introduces the regression in xfs/033, which will
now be fixed in a separate patch.
Reported-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
libxfs/rdwr.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index ac7739f..78a9b37 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -590,6 +590,10 @@ libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
struct xfs_bufkey key = {0};
int i;
+ if (nmaps == 1)
+ return libxfs_getbuf_flags(btp, map[0].bm_bn, map[0].bm_len,
+ flags);
+
key.buftarg = btp;
key.blkno = map[0].bm_bn;
for (i = 0; i < nmaps; i++) {
--
1.8.4.rc3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/2] libxfs: clear stale buffer errors on write
2014-02-20 5:55 [PATCH 0/2] xfs_repair: fix xfs/033 issues Dave Chinner
2014-02-20 5:55 ` [PATCH 1/2] libxfs: contiguous buffers are not discontigous Dave Chinner
@ 2014-02-20 5:55 ` Dave Chinner
2014-02-20 15:38 ` Christoph Hellwig
2014-02-20 19:09 ` Eric Sandeen
1 sibling, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2014-02-20 5:55 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
If we've read a buffer and it's had an error (e.g a bad CRC) and the
caller corrects the problem with the buffer and writes it via
libxfs_writebuf() without clearing the error on the buffer,
subsequent reads of the buffer while it is still in cache can see
that error and fail inappropriately.
xfs/033 demonstrates this error, where phase 3 detects the corrupted
root inode and clears, but doesn't clear the b_error field. Later in
phase 6, the code that rebuilds the root directory tries to read the
root inode and sees a buffer with an error on it, thereby triggering
a fatal repair failure:
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
xfs_inode_buf_verify: XFS_CORRUPTION_ERROR
bad magic number 0x0 on inode 64
....
cleared root inode 64
....
Phase 6 - check inode connectivity...
reinitializing root directory
xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
fatal error -- could not iget root inode -- error - 117
#
Fix this by assuming buffers that are written are clean and correct
and hence we can zero the b_error field before retiring the buffer
to the cache.
Reported-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
libxfs/rdwr.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 78a9b37..d0ff15b 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -890,6 +890,11 @@ libxfs_writebufr(xfs_buf_t *bp)
int
libxfs_writebuf_int(xfs_buf_t *bp, int flags)
{
+ /*
+ * Clear any error hanging over from reading the buffer. This prevents
+ * subsequent reads after this write from seeing stale errors.
+ */
+ bp->b_error = 0;
bp->b_flags |= (LIBXFS_B_DIRTY | flags);
return 0;
}
@@ -903,6 +908,11 @@ libxfs_writebuf(xfs_buf_t *bp, int flags)
(long long)LIBXFS_BBTOOFF64(bp->b_bn),
(long long)bp->b_bn);
#endif
+ /*
+ * Clear any error hanging over from reading the buffer. This prevents
+ * subsequent reads after this write from seeing stale errors.
+ */
+ bp->b_error = 0;
bp->b_flags |= (LIBXFS_B_DIRTY | flags);
libxfs_putbuf(bp);
return 0;
--
1.8.4.rc3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] libxfs: contiguous buffers are not discontigous
2014-02-20 5:55 ` [PATCH 1/2] libxfs: contiguous buffers are not discontigous Dave Chinner
@ 2014-02-20 15:38 ` Christoph Hellwig
2014-02-20 19:06 ` Eric Sandeen
1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2014-02-20 15:38 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] libxfs: clear stale buffer errors on write
2014-02-20 5:55 ` [PATCH 2/2] libxfs: clear stale buffer errors on write Dave Chinner
@ 2014-02-20 15:38 ` Christoph Hellwig
2014-02-20 19:09 ` Eric Sandeen
1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2014-02-20 15:38 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] libxfs: contiguous buffers are not discontigous
2014-02-20 5:55 ` [PATCH 1/2] libxfs: contiguous buffers are not discontigous Dave Chinner
2014-02-20 15:38 ` Christoph Hellwig
@ 2014-02-20 19:06 ` Eric Sandeen
2014-02-20 21:39 ` Dave Chinner
1 sibling, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2014-02-20 19:06 UTC (permalink / raw)
To: Dave Chinner, xfs
On 2/19/14, 11:55 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When discontiguous directory buffer support was fixed in xfs_repair,
> (dd9093d xfs_repair: fix discontiguous directory block support)
> it changed to using libxfs_getbuf_map() to support mapping
> discontiguous blocks, and the prefetch code special cased such
> discontiguous buffers.
>
> The issue is that libxfs_getbuf_map() marks all buffers, even
> contiguous ones - as LIBXFS_B_DISCONTIG, and so the prefetch code
> was treating every buffer as discontiguous. This causes the prefetch
> code to completely bypass the large IO optimisations for dense areas
> of metadata. Because there was no obvious change in performance or
> IO patterns, this wasn't noticed during performance testing.
>
> However, this change mysteriously fixed a regression in xfs/033 in
> the v3.2.0-alpha release, and this change in behaviour was
> discovered as part of triaging why it "fixed" the regression.
> Anyway, restoring the large IO prefetch optimisation results
> a reapiron a 10 million inode filesystem dropping from 197s to 173s,
> and the peak IOPS rate in phase 3 dropping from 25,000 to roughly
> 2,000 by trading off a bandwidth increase of roughly 100% (i.e.
> 200MB/s to 400MB/s). Phase 4 saw similar changes in IO profile and
> speed increases.
>
> This, however, re-introduces the regression in xfs/033, which will
> now be fixed in a separate patch.
Thanks for finding this. I was getting close. ;)
It seems fine, although a little unexpected; why do we ever
create a map of 1? It feels a little odd to call getbuf_map
with only 1 item, and then short-circuit it. Should this
be something more obvious in the callers?
Wel, I guess it's pretty much consistent w/ the same behavior
in libxfs_readbuf_map()... *shrug*
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Reported-by: Eric Sandeen <esandeen@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> libxfs/rdwr.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index ac7739f..78a9b37 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -590,6 +590,10 @@ libxfs_getbuf_map(struct xfs_buftarg *btp, struct xfs_buf_map *map,
> struct xfs_bufkey key = {0};
> int i;
>
> + if (nmaps == 1)
> + return libxfs_getbuf_flags(btp, map[0].bm_bn, map[0].bm_len,
> + flags);
> +
> key.buftarg = btp;
> key.blkno = map[0].bm_bn;
> for (i = 0; i < nmaps; i++) {
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] libxfs: clear stale buffer errors on write
2014-02-20 5:55 ` [PATCH 2/2] libxfs: clear stale buffer errors on write Dave Chinner
2014-02-20 15:38 ` Christoph Hellwig
@ 2014-02-20 19:09 ` Eric Sandeen
1 sibling, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2014-02-20 19:09 UTC (permalink / raw)
To: Dave Chinner, xfs
On 2/19/14, 11:55 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> If we've read a buffer and it's had an error (e.g a bad CRC) and the
> caller corrects the problem with the buffer and writes it via
> libxfs_writebuf() without clearing the error on the buffer,
> subsequent reads of the buffer while it is still in cache can see
> that error and fail inappropriately.
>
> xfs/033 demonstrates this error, where phase 3 detects the corrupted
> root inode and clears, but doesn't clear the b_error field. Later in
> phase 6, the code that rebuilds the root directory tries to read the
> root inode and sees a buffer with an error on it, thereby triggering
> a fatal repair failure:
>
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> xfs_inode_buf_verify: XFS_CORRUPTION_ERROR
> bad magic number 0x0 on inode 64
> ....
> cleared root inode 64
> ....
> Phase 6 - check inode connectivity...
> reinitializing root directory
> xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
>
> fatal error -- could not iget root inode -- error - 117
> #
>
> Fix this by assuming buffers that are written are clean and correct
> and hence we can zero the b_error field before retiring the buffer
> to the cache.
Thanks;
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> Reported-by: Eric Sandeen <esandeen@redhat.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> libxfs/rdwr.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index 78a9b37..d0ff15b 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -890,6 +890,11 @@ libxfs_writebufr(xfs_buf_t *bp)
> int
> libxfs_writebuf_int(xfs_buf_t *bp, int flags)
> {
> + /*
> + * Clear any error hanging over from reading the buffer. This prevents
> + * subsequent reads after this write from seeing stale errors.
> + */
> + bp->b_error = 0;
> bp->b_flags |= (LIBXFS_B_DIRTY | flags);
> return 0;
> }
> @@ -903,6 +908,11 @@ libxfs_writebuf(xfs_buf_t *bp, int flags)
> (long long)LIBXFS_BBTOOFF64(bp->b_bn),
> (long long)bp->b_bn);
> #endif
> + /*
> + * Clear any error hanging over from reading the buffer. This prevents
> + * subsequent reads after this write from seeing stale errors.
> + */
> + bp->b_error = 0;
> bp->b_flags |= (LIBXFS_B_DIRTY | flags);
> libxfs_putbuf(bp);
> return 0;
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] libxfs: contiguous buffers are not discontigous
2014-02-20 19:06 ` Eric Sandeen
@ 2014-02-20 21:39 ` Dave Chinner
0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2014-02-20 21:39 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On Thu, Feb 20, 2014 at 01:06:02PM -0600, Eric Sandeen wrote:
> On 2/19/14, 11:55 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > When discontiguous directory buffer support was fixed in xfs_repair,
> > (dd9093d xfs_repair: fix discontiguous directory block support)
> > it changed to using libxfs_getbuf_map() to support mapping
> > discontiguous blocks, and the prefetch code special cased such
> > discontiguous buffers.
> >
> > The issue is that libxfs_getbuf_map() marks all buffers, even
> > contiguous ones - as LIBXFS_B_DISCONTIG, and so the prefetch code
> > was treating every buffer as discontiguous. This causes the prefetch
> > code to completely bypass the large IO optimisations for dense areas
> > of metadata. Because there was no obvious change in performance or
> > IO patterns, this wasn't noticed during performance testing.
> >
> > However, this change mysteriously fixed a regression in xfs/033 in
> > the v3.2.0-alpha release, and this change in behaviour was
> > discovered as part of triaging why it "fixed" the regression.
> > Anyway, restoring the large IO prefetch optimisation results
> > a reapiron a 10 million inode filesystem dropping from 197s to 173s,
> > and the peak IOPS rate in phase 3 dropping from 25,000 to roughly
> > 2,000 by trading off a bandwidth increase of roughly 100% (i.e.
> > 200MB/s to 400MB/s). Phase 4 saw similar changes in IO profile and
> > speed increases.
> >
> > This, however, re-introduces the regression in xfs/033, which will
> > now be fixed in a separate patch.
>
> Thanks for finding this. I was getting close. ;)
>
> It seems fine, although a little unexpected; why do we ever
> create a map of 1? It feels a little odd to call getbuf_map
> with only 1 item, and then short-circuit it. Should this
> be something more obvious in the callers?
Because we have application code that is building buffer maps from
extent maps, and so having a API for both
contigous and discontiguous buffers makes sense. That's the way we
do it in the kernel - everything now uses the _map paths - but the
userspace code is a much larger surface area to change over and so
it's only being done in the places that matter first...
> Wel, I guess it's pretty much consistent w/ the same behavior
> in libxfs_readbuf_map()... *shrug*
Right. And like I said, it's also how the kernel does stuff, which
is why the libxfs_readbuf_map code functions like it does. ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-02-20 21:39 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-20 5:55 [PATCH 0/2] xfs_repair: fix xfs/033 issues Dave Chinner
2014-02-20 5:55 ` [PATCH 1/2] libxfs: contiguous buffers are not discontigous Dave Chinner
2014-02-20 15:38 ` Christoph Hellwig
2014-02-20 19:06 ` Eric Sandeen
2014-02-20 21:39 ` Dave Chinner
2014-02-20 5:55 ` [PATCH 2/2] libxfs: clear stale buffer errors on write Dave Chinner
2014-02-20 15:38 ` Christoph Hellwig
2014-02-20 19:09 ` Eric Sandeen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox