From: Liu Bo <bo.li.liu@oracle.com>
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/4] Btrfs: fix data corruption in raid6
Date: Wed, 1 Nov 2017 18:54:03 -0600 [thread overview]
Message-ID: <20171102005405.20420-3-bo.li.liu@oracle.com> (raw)
In-Reply-To: <20171102005405.20420-1-bo.li.liu@oracle.com>
With raid6 profile, btrfs can end up with data corruption by the
following steps.
Say we have a 5 disks that are set up with raid6 profile,
1) mount this btrfs
2) one disk gets pulled out
3) write something to btrfs and sync
4) another disk gets pulled out
5) write something to btrfs and sync
6) umount this btrfs
7) bring the two disk back
8) reboot
9) mount this btrfs
Chances are mount will fail (even with -odegraded) because of failure
on reading metadata blocks, IOW, our raid6 setup is not able to
protect two disk failures in some cases.
So it turns out that there is a bug in raid6's recover code, that is,
if we have one of stripes in the raid6 layout as shown here,
| D1(*) | D2(*) | D3 | D4 | D5 |
-------------------------------------
| data1 | data2 | P | Q | data0 |
D1 and D2 are the two disks which got pulled out and brought back.
When mount reads data1 and finds out that data1 doesn't match its crc,
btrfs goes to recover data1 from other stripes, what it'll be doing is
1) read data2, parity P, parity Q and data0
2) as we have a valid parity P and two data stripes, it goes to
recover in raid5 style.
(However, since disk D2 was pulled out and data2 on D2 could be stale,
we still get the wrong data1 from this reconstruction.)
3) btrfs continues to try to reconstruct data1 from parity Q, data2
and data0, we still get the wrong one for the same reason.
The fix here is to take advantage of the device flag, ie. 'In_sync',
all data on a device might be stale if 'In_sync' has not been set.
With this, we can build the correct data2 from parity P, Q and data1.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
fs/btrfs/raid56.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 67262f8..3c0ce61 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1076,7 +1076,8 @@ static int rbio_add_io_page(struct btrfs_raid_bio *rbio,
disk_start = stripe->physical + (page_index << PAGE_SHIFT);
/* if the device is missing, just fail this stripe */
- if (!stripe->dev->bdev)
+ if (!stripe->dev->bdev ||
+ !test_bit(In_sync, &stripe->dev->flags))
return fail_rbio_index(rbio, stripe_nr);
/* see if we can add this page onto our existing bio */
--
2.9.4
next prev parent reply other threads:[~2017-11-02 1:56 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-02 0:54 [PATCH 0/4] Fix raid6 reconstruction bug Liu Bo
2017-11-02 0:54 ` [PATCH 1/4] Btrfs: introduce device flags Liu Bo
2017-11-06 16:40 ` David Sterba
2017-11-08 19:46 ` Liu Bo
2017-11-13 17:13 ` David Sterba
2017-11-07 9:30 ` Anand Jain
2017-11-02 0:54 ` Liu Bo [this message]
2017-11-07 8:32 ` [PATCH 2/4] Btrfs: fix data corruption in raid6 Anand Jain
2017-11-08 19:53 ` Liu Bo
2017-11-09 9:29 ` Anand Jain
2017-11-09 9:59 ` Qu Wenruo
2017-11-10 0:12 ` Liu Bo
2017-11-10 0:54 ` Qu Wenruo
2017-11-10 10:15 ` Qu Wenruo
2017-11-02 0:54 ` [PATCH 3/4] Btrfs: make raid1 and raid10 be aware of device flag In_sync Liu Bo
2017-11-02 0:54 ` [PATCH 4/4] Btrfs: change how we set In_sync Liu Bo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171102005405.20420-3-bo.li.liu@oracle.com \
--to=bo.li.liu@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).