[PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more

linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more
@ 2018-01-02 20:36 Liu Bo
  2018-01-02 20:36 ` [PATCH 2/2] Btrfs: fix scrub to repair raid6 corruption Liu Bo
  2018-01-05 12:51 ` [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more David Sterba
  0 siblings, 2 replies; 3+ messages in thread
From: Liu Bo @ 2018-01-02 20:36 UTC (permalink / raw)
  To: linux-btrfs

There is a scenario that can end up with rebuild process failing to
return good content, i.e.
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,

- the 1st retry is to rebuild with all other stripes, it'll eventually
  be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
  that it will do raid6 style rebuild,

however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content, and users will think of this as data loss.
More seriouly, if the loss happens on some important internal btree
roots, it could refuse to mount.

This extends btrfs to do more retries and each retry fails only one
stripe.  Since raid6 can tolerate 2 disk failures, if there is one
more failure besides the failure on which we're recovering, this can
always work.

The worst case is to retry as many times as the number of raid6 disks,
but given the fact that such a scenario is really rare in practice,
it's still acceptable.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/raid56.c  | 18 ++++++++++++++----
 fs/btrfs/volumes.c |  9 ++++++++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index b078429..c1ed7e8 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2187,11 +2187,21 @@ int raid56_parity_recover(struct btrfs_fs_info *fs_info, struct bio *bio,
 	}
 
 	/*
-	 * reconstruct from the q stripe if they are
-	 * asking for mirror 3
+	 * Loop retry:
+	 * for 'mirror == 2', reconstruct from all other stripes.
+	 * for 'mirror_num > 2', select a stripe to fail on every retry.
 	 */
-	if (mirror_num == 3)
-		rbio->failb = rbio->real_stripes - 2;
+	if (mirror_num > 2) {
+		/*
+		 * 'mirror == 3' is to fail the p stripe and
+		 * reconstruct from the q stripe.  'mirror > 3' is to
+		 * fail a data stripe and reconstruct from p+q stripe.
+		 */
+		rbio->failb = rbio->real_stripes - (mirror_num - 1);
+		ASSERT(rbio->failb > 0);
+		if (rbio->failb <= rbio->faila)
+			rbio->failb--;
+	}
 
 	ret = lock_stripe_add(rbio);
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 49810b7..558c6c0 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5104,7 +5104,14 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID5)
 		ret = 2;
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID6)
-		ret = 3;
+		/*
+		 * There could be two corrupted data stripes, we need
+		 * to loop retry in order to rebuild the correct data.
+		 * 
+		 * Fail a stripe at a time on every retry except the
+		 * stripe under reconstruction.
+		 */
+		ret = map->num_stripes;
 	else
 		ret = 1;
 	free_extent_map(em);
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] Btrfs: fix scrub to repair raid6 corruption
  2018-01-02 20:36 [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more Liu Bo
@ 2018-01-02 20:36 ` Liu Bo
  2018-01-05 12:51 ` [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: Liu Bo @ 2018-01-02 20:36 UTC (permalink / raw)
  To: linux-btrfs

The raid6 corruption is that,
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,

- the 1st retry is to rebuild with all other stripes, it'll eventually
  be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
  that it will do raid6 style rebuild,

however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content.

We've fixed normal reads to rebuild raid6 correctly with more retries
in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
scrub to do the exactly same rebuild process.

[1]: https://patchwork.kernel.org/patch/10091755/

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/scrub.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 951e881..1bc3ee4 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -301,6 +301,11 @@ static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_put_ctx(struct scrub_ctx *sctx);
 
+static inline int scrub_is_page_on_raid56(struct scrub_page *page)
+{
+	return page->recover &&
+	       (page->recover->bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK);
+}
 
 static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
 {
@@ -1323,15 +1328,34 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	 * could happen otherwise that a correct page would be
 	 * overwritten by a bad one).
 	 */
-	for (mirror_index = 0;
-	     mirror_index < BTRFS_MAX_MIRRORS &&
-	     sblocks_for_recheck[mirror_index].page_count > 0;
-	     mirror_index++) {
+	for (mirror_index = 0; ;mirror_index++) {
 		struct scrub_block *sblock_other;
 
 		if (mirror_index == failed_mirror_index)
 			continue;
-		sblock_other = sblocks_for_recheck + mirror_index;
+
+		/* raid56's mirror can be more than BTRFS_MAX_MIRRORS */
+		if (!scrub_is_page_on_raid56(sblock_bad->pagev[0])) {
+			if (mirror_index >= BTRFS_MAX_MIRRORS)
+				break;
+			if (!sblocks_for_recheck[mirror_index].page_count)
+				break;
+
+			sblock_other = sblocks_for_recheck + mirror_index;
+		} else {
+			struct scrub_recover *r = sblock_bad->pagev[0]->recover;
+			int max_allowed = r->bbio->num_stripes -
+						r->bbio->num_tgtdevs;
+
+			if (mirror_index >= max_allowed)
+				break;
+			if (!sblocks_for_recheck[1].page_count)
+				break;
+
+			ASSERT(failed_mirror_index == 0);
+			sblock_other = sblocks_for_recheck + 1;
+			sblock_other->pagev[0]->mirror_num = 1 + mirror_index;
+		}
 
 		/* build and submit the bios, check checksums */
 		scrub_recheck_block(fs_info, sblock_other, 0);
@@ -1671,26 +1695,22 @@ static void scrub_bio_wait_endio(struct bio *bio)
 	complete(bio->bi_private);
 }
 
-static inline int scrub_is_page_on_raid56(struct scrub_page *page)
-{
-	return page->recover &&
-	       (page->recover->bbio->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK);
-}
-
 static int scrub_submit_raid56_bio_wait(struct btrfs_fs_info *fs_info,
 					struct bio *bio,
 					struct scrub_page *page)
 {
 	DECLARE_COMPLETION_ONSTACK(done);
 	int ret;
+	int mirror_num;
 
 	bio->bi_iter.bi_sector = page->logical >> 9;
 	bio->bi_private = &done;
 	bio->bi_end_io = scrub_bio_wait_endio;
 
+	mirror_num = page->sblock->pagev[0]->mirror_num;
 	ret = raid56_parity_recover(fs_info, bio, page->recover->bbio,
 				    page->recover->map_length,
-				    page->mirror_num, 0);
+				    mirror_num, 0);
 	if (ret)
 		return ret;
 
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more
  2018-01-02 20:36 [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more Liu Bo
  2018-01-02 20:36 ` [PATCH 2/2] Btrfs: fix scrub to repair raid6 corruption Liu Bo
@ 2018-01-05 12:51 ` David Sterba
  1 sibling, 0 replies; 3+ messages in thread
From: David Sterba @ 2018-01-05 12:51 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Tue, Jan 02, 2018 at 01:36:41PM -0700, Liu Bo wrote:
> There is a scenario that can end up with rebuild process failing to
> return good content, i.e.
> suppose that all disks can be read without problems and if the content
> that was read out doesn't match its checksum, currently for raid6
> btrfs at most retries twice,
> 
> - the 1st retry is to rebuild with all other stripes, it'll eventually
>   be a raid5 xor rebuild,
> - if the 1st fails, the 2nd retry will deliberately fail parity p so
>   that it will do raid6 style rebuild,
> 
> however, the chances are that another non-parity stripe content also
> has something corrupted, so that the above retries are not able to
> return correct content, and users will think of this as data loss.
> More seriouly, if the loss happens on some important internal btree
> roots, it could refuse to mount.
> 
> This extends btrfs to do more retries and each retry fails only one
> stripe.  Since raid6 can tolerate 2 disk failures, if there is one
> more failure besides the failure on which we're recovering, this can
> always work.
> 
> The worst case is to retry as many times as the number of raid6 disks,
> but given the fact that such a scenario is really rare in practice,
> it's still acceptable.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>

1 and added to for-next.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-01-05 12:53 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-02 20:36 [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more Liu Bo
2018-01-02 20:36 ` [PATCH 2/2] Btrfs: fix scrub to repair raid6 corruption Liu Bo
2018-01-05 12:51 ` [PATCH 1/2 RESEND] Btrfs: make raid6 rebuild retry more David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).